Due by 12pm Thursday, Sep 1
In our first class, we used a pretty silly, non-real-life example to explore Git. (This is actually good programming practice—using a sandbox to try out individual skills or test small bits of code, separate from the real file(s) you’re working on!) Soon, however, you’ll be using Git for your own data science projects. Do the following:
Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?
Example answer
When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc.
Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version.
Then I don’t touch the old file, and I only work on the latest version.
Each new version is basically like a Git commit, and the little comment is like a commit message.
In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the comments.
Git can do that more easily by letting me see commits and diffs at a glance.
Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless.
Git prevents that from happening because once changes are committed, they’re there.
“… no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
“… .gitignore?”
Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown.
Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md.
Share it to the #to-dos channel on our Slack workspace.
Due by 12pm Tuesday, September 6
Time for some hands-on practice! Do the following:
Install/update tidyverse by running install.packages("tidyverse") in the R console. If you’ve done it correctly, then running packageVersion("tidyverse") should return ‘1.3.2’.
Learn about ggplot2! Go through the data visualization chapter in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)
It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing—not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.
Create your own study notes as ggplot2_notes_YOURNAME.txt or ggplot2_notes_YOURNAME.Rmd (if you do an Rmd, no need to knit). Include examples, explanations, etc. You are essentially creating your own reference material.
Submission: Share your notes on the #to-dos channel on Slack.
Due by 12pm Thursday, September 8
Pull the upstream changes into your local Class-Exercise-Repo: In the Git tab, click the gear icon, then Shell. In the shell, type git pull upstream main. After class, I modified the upstream so that it should pull cleanly into your local Class-Exercise-Repo; you’ll know it worked if the todo3/ folder shows up in your local repo. If it doesn’t and you’re still getting the error "Your local changes to the following files would be overwritten by merge", let me know on Slack.
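Here’s a minimal sketch of that pull, run in a throwaway sandbox rather than your real repo. The folder names (upstream-repo, local-repo), the placeholder instructor identity, and the contents of todo3/ are all made up for illustration; only git pull upstream main comes from the step above.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: every path and name below is made up for this demo.
sandbox=$(mktemp -d)
cd "$sandbox"

# Stand-in for the instructor's upstream repository.
git init -q upstream-repo
cd upstream-repo
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main
git config user.name "Instructor"
git config user.email "instructor@example.com"
echo "Class exercises" > README.md
git add README.md
git commit -q -m "Initial commit"
cd ..

# Stand-in for your local Class-Exercise-Repo, with the remote named upstream.
git clone -q upstream-repo local-repo
cd local-repo
git remote rename origin upstream
cd ..

# After class, a todo3/ folder is added upstream.
cd upstream-repo
mkdir todo3
echo "To-do 3 files go here" > todo3/README.md
git add todo3
git commit -q -m "Add todo3 folder"
cd ..

# Back in the local repo: list any uncommitted changes, then pull.
cd local-repo
git status --short
git pull -q upstream main

# The pull worked if todo3/ now exists locally.
if [ -d todo3 ]; then
  pull_result="todo3 is here"
else
  pull_result="todo3 is missing"
fi
echo "$pull_result"
```

If git status --short lists modified files before the pull, those uncommitted changes are the usual cause of the "would be overwritten by merge" error.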
Learn about dplyr! Like you did for To-do 2, go through the data transformation chapter in our class version of R for data science, and create your own study notes as dplyr_notes_YOURNAME.txt (or use .md or .Rmd if you prefer).
What’s your muddiest point for dplyr? Create a file called dplyr_muddiest_YOURNAME.txt (or .md) that has your muddiest point: After going through the dplyr chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?
Submission: You should have 2 new files: notes and muddiest point.
If you successfully pulled the upstream changes into your local repo, you’ll have a todo3/ folder; stage, commit, push to origin, and create a pull request.
If git pull upstream main was unsuccessful, share the files on the Slack #to-dos channel instead and we’ll troubleshoot on Thursday.
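The stage, commit, and push steps can be sketched like this, again in a sandbox. A bare repository (fork.git) stands in for your GitHub fork, and the identity and file contents are placeholders; the pull request itself is created on GitHub afterwards, so it isn’t shown.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: paths, names, and file contents are made up for this demo.
sandbox=$(mktemp -d)
cd "$sandbox"

# A bare repo plays the role of your GitHub fork.
git init -q --bare fork.git

# Stand-in for your local repo, with the fork registered as origin.
git init -q local-repo
cd local-repo
git symbolic-ref HEAD refs/heads/main   # make sure the branch is called main
git config user.name "Your Name"
git config user.email "you@example.com"
git remote add origin "$sandbox/fork.git"

# The two new files for this to-do: notes and muddiest point.
mkdir todo3
echo "My dplyr notes" > todo3/dplyr_notes_YOURNAME.txt
echo "My muddiest point" > todo3/dplyr_muddiest_YOURNAME.txt

git add todo3                                      # stage
git commit -q -m "Add To-do 3 notes and muddiest point"   # commit
git push -q origin main                            # push to origin

# Count the files that origin/main now contains.
pushed_files=$(git ls-tree -r --name-only origin/main | wc -l | tr -d ' ')
echo "Files on origin/main: $pushed_files"
```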
Due by 12pm Tuesday, September 13
It’s time to flex our newfound skills with Markdown and R Markdown!
Attempt to pull Class-Exercise-Repo from upstream. If you get an error message, ask me about it in Slack #q-and-a.
Clean up your notes from To-dos 2 & 3. Convert .txt files without R code into .md, and .txt files with R code into .Rmd. For both, use text formatting and headings to make your documents clearer. For Rmds, use a YAML header (like this one), R code chunks, and session info.
Try knitting your R Markdown files to GitHub-flavored markdown files. If it takes you more than ~30 minutes to debug, leave it as-is without knitting, and we’ll troubleshoot on Tuesday.
You can now safely delete the .txt versions; they’re still in your repo history if you ever need to access them again.
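To see why deleting is safe, here’s a small sandbox sketch: a .txt file is committed, then deleted in a later commit, and its contents are read back from history with git show. The repo name, identity, and note contents are invented for the demo, and HEAD~1 works here only because the deletion is the most recent commit; in a real repo you’d look up the right commit with git log first.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: paths, names, and file contents are made up for this demo.
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q notes-repo
cd notes-repo
git config user.name "Your Name"
git config user.email "you@example.com"

# First commit: the original .txt notes.
echo "filter() keeps rows" > dplyr_notes_YOURNAME.txt
git add dplyr_notes_YOURNAME.txt
git commit -q -m "Add dplyr notes as txt"

# Second commit: add the Markdown version and delete the .txt.
printf '# dplyr notes\n\nfilter() keeps rows\n' > dplyr_notes_YOURNAME.md
git add dplyr_notes_YOURNAME.md
git rm -q dplyr_notes_YOURNAME.txt
git commit -q -m "Convert dplyr notes to Markdown"

# The .txt is gone from the working folder, but the previous commit still has it.
recovered=$(git show HEAD~1:dplyr_notes_YOURNAME.txt)
echo "$recovered"
```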
Submission: Put your files in the todo2/ and todo3/ directories of the Class-Exercise-Repo. Commit your changes, push to your GitHub fork, and create a pull request for me. As always, ask questions if you run into any difficulties!
Due by 12pm Thursday, September 15
If you haven’t done so already, resolve any lingering merge conflicts and push those changes to your remote fork so I can merge your previous to-dos into the upstream remote.
Learn about tidyr in R4DS chapter 12! You know the drill by now: create tidyr_notes_YOURNAME.Rmd and tidyr_muddiest_YOURNAME.md, knitting the former to a github_document.
Submission: Put your files in the todo5/ directory of the Class-Exercise-Repo. Commit your changes, push to your GitHub fork, check that the md files are formatted like you expect, and create a pull request for me.
Due by 12pm Tuesday, September 20
… relational_notes_YOURNAME.Rmd, relational_notes_YOURNAME.md, relational_muddiest_YOURNAME.md.
Submission: Put your files in the todo6/ directory of the Class-Exercise-Repo. After you commit, push, and create a pull request, look at the “Files changed” tab (or add /files to the end of the URL on the page for your PR). Does it look like you expect? That is, are the changes only the ones that you expect? If so, great! If not, create new commit(s) and push to your fork; remember that any commits you push while a PR is still open will be added to the PR.
Due by 12pm Thursday, September 22
Same as usual, with the vectors chapter.
In the data import chapter, please also read sections 11.1 & 11.2, skip 11.3, skim 11.4, and read 11.5. No need for notes/MPs on this chapter, just on the vectors chapter.
You know the drill from here!
Due by 12pm Tuesday, September 27
We’ll use a Wickham family tag-team to learn about iteration.
If you’ve encountered so-called for-loops in other programming languages, then you can skip to #2! If not, it’s good to be aware of what they are, even though you probably won’t use them much in R. Read the R4DS iteration chapter, but just sections 21.1 & 21.2.
Install the repurrrsive package, which has datasets that are useful for explaining & exploring iteration. Then watch this video, which is the first half of Charlotte Wickham’s tutorial on purrr. (Her brother, Hadley Wickham, co-wrote R4DS!) You can download the slides from the GitHub repo for the tutorial if you like. There’s a separate video that’s the second half of the tutorial, but that’s optional.
Submission: Write up your notes, a muddiest point, and at least one exercise that you design. Then the usual: commit, push to your fork, open a PR for me.
Due by 12pm Thursday, September 29
Since we read the data import chapter already, just give it a quick re-skim. Add your muddiest points to readr_muddiest_YOURNAME.md.
Check out some projects from previous versions of this class (Spring 2022, Spring 2021). Again, these projects used Python, which we don’t cover in this class; read them not for the code but for their scope. Choose two projects and read their project_plan.md and progress_report.md. In a paragraph or so, link to the projects and summarize how students’ projects evolved; add it to the file proj-observations_YOURNAME.md.
Submission: Class-Exercise-Repo/todo9/
Due by 12pm Thursday, October 6
Explore the stringr package by going through the R4DS strings chapter. Give yourself time to go through this chapter, as it’s on the longer side. Feel free to consult the regular expressions learning resources!
You know the drill from here: notes & muddiest point(s) in todo10/.
Due by 12pm Thursday, October 20
Skim today’s readings on language models and pose a muddiest point or follow-up question for each (skipping the article(s) you’re presenting, of course).
Add your MP or follow-up to Class-Exercise-Repo/data-ethics/ as (e.g.) bender_muddiest_Dan.md.
Due by 12pm Tuesday, November 1
Skim today’s readings on accountability to communities and pose a muddiest point or follow-up question for each (skipping the article(s) you’re presenting, of course).
Add your MP or follow-up to Class-Exercise-Repo/data-ethics/ as (e.g.) bender_muddiest_Dan.md.
Due by 12pm Tuesday, November 8
Data sharing Q&A! Once you’ve watched Thursday’s lesson video by Dr. Lauren Collister, think of at least 1–2 questions on the topics Lauren covers, and add them to this Google doc.
Submission: The Google doc is your submission; don’t forget to add your name. Lauren will answer in class any questions that are asked by 12pm on Monday. She’ll address later questions in class too, but she might not be able to give you a full answer if a question is asked closer to class.
Due by 12pm Thursday, November 10
What has everyone been up to? Let’s take a look – it’s a “visit your classmates” day!
I’ve set up a directory with project guestbooks in our Class-Exercise-Repo.
… project-guestbooks directory’s README.
Once you’re ready to submit, push to your fork and create a pull request for me.
Last to-do of the semester!!
Due by 12pm Thursday, November 17
Same as to-do 14, but with different projects!
Once you’re ready to submit, push to your fork and create a pull request for me.