To-do assignments
Submission guidelines
- If the To-do includes submitting a Quarto document:
  - Render the file to GitHub-flavored Markdown (format: gfm in the YAML header)
    - If errors, fix and try rendering again
  - Make sure the rendered file doesn’t have super-long outputs
    - To see what I mean, create a qmd that just contains as.data.frame(ggplot2::diamonds)
  - Submit both the source .qmd file and the rendered .md file
- If the To-do includes submitting a Markdown file (whether it’s a rendered Quarto document or a standalone Markdown file):
  - Commit & push the file to your fork to check formatting
  - If formatting looks off, commit & push changes until it looks right (this might mean a lot of commits early on when you’re still learning, and that’s okay!)
- When creating a pull request, check:
  - Whether GitHub can merge your changes
  - The “Files changed” tab: Anything unexpected?
    - If we’ve done an in-class assignment since your last pull request, then those files will be included in the PR. That’s fine as long as you’ve added your name as a suffix to the file
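As a rough sketch of the Quarto rendering step above, the snippet below writes a tiny Quarto source file whose YAML header asks for gfm output. The file name example_notes.qmd and the title are made up for illustration, and the commented-out quarto render line assumes the Quarto command-line tool is installed; the Render button in RStudio should do the same job.

```shell
set -e
# Work in a throwaway directory so nothing in your repo is touched.
scratch=$(mktemp -d)

# Write a minimal .qmd file; the part between the --- lines is the YAML header.
cat > "$scratch/example_notes.qmd" <<'EOF'
---
title: "Example notes"
format: gfm
---

Some notes would go here.
EOF

# Show the file to confirm the header is in place.
cat "$scratch/example_notes.qmd"

# To render it to example_notes.md (assumes the Quarto CLI is installed):
# quarto render "$scratch/example_notes.qmd"
```

The scratch directory can be deleted afterwards; for a real to-do, the same header goes at the top of your own .qmd file.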
To-do 1
Due Aug 28 by 12 noon
- At the end of the semester, previous students in this class wrote anonymous letters to future Data Science students detailing their advice for how to succeed in the course. These letters are pinned to our Slack #general channel. Please read these letters and identify:
  - One idea that surprised you and why it surprised you
  - Two concrete strategies that you will take in order to succeed in Data Science this semester
- This semester, you’ll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that’s giving you the most issues or that you’re most unsure about. This could be something we’ve already gone over, or an extension of what we’ve discussed.
  - Read the assigned readings from our R for data science textbook (the intro and chapter 2).
  - Then identify your muddiest point about the chapters.
  Good muddiest points are specific questions. Using Git as an example:
  - “Why does Git make us stage files before we commit them?”
  - “I keep getting the message no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
  - “What is .gitignore?”
  - “Do you [Dan] actually use Git in your projects?”
  Not-so-good muddiest points are noun phrases or vague questions:
  - “reverting files”
  - “How do I use Git?”
- The next step on our Git journey is GitHub
  - Create a GitHub account at https://github.com/, if you don’t already have one. Pick a good username because changing your GitHub username creates annoying problems.
  - Add your Pitt email address to your account.
  - Check your inbox and spam folder for a verification email.
  - Add an avatar to your GitHub account (usually a headshot, but can be anything you like).
  - In your submission, send me the link to your GitHub user page (i.e., https://github.com/YOUR-USERNAME). You don’t have to add any other personalization, just send me the link.
Submission
Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown. Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md (replace YOURNAME with your actual name). Share it to the #to-dos channel on our Slack workspace.
To-do 2
Due Sep 2 by 12 noon
Time for some hands-on practice! Do the following:
- Learn about ggplot2! Go through the Data visualization and Workflow: scripts and projects chapters in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)
  It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing: not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.
- Future to-dos will ask you to create your own study notes for R4DS chapters. For now, your task is to evaluate some past students’ ggplot2 notes, which you can find at https://github.com/Data-Sci-2025/Class-Exercise-Repo/tree/main/todo2/old-notes. Compare and contrast: If you were future-you, how helpful would you find these notes? Are there some notes you’d find more helpful than others? Pay attention not only to content but also style, formatting, and organization. There’s not a right or wrong answer here; different things work for different people! Write up your observations in notes-on-notes_YOURNAME.txt (or .md).
  These notes were for the previous edition of R4DS (https://r4ds.had.co.nz/data-visualisation.html), hence any differences in section numbers, etc.
- Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?
  Example answer
  When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don’t touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the commits. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless. Git prevents that from happening because once changes are committed, they’re there.
  Write up your response as git_notes_YOURNAME.txt (or .md)
- If you haven’t set a personal access token for GitHub, do so now:
  - In the R console, run usethis::create_github_token()
  - This will load a GitHub page for you to generate a PAT
  - In “Note”, write “My laptop” or something that describes your local machine
  - In “Expiration” set “No expiration” (even though GitHub doesn’t recommend it)
  - Leave everything else as-is
  - Click “Generate token” and copy your token to the clipboard
  - Back in RStudio, run gitcreds::gitcreds_set() without any arguments
  - Follow the prompts and paste your token when asked
  - If needed, change your Git config user name to match your GitHub user name: usethis::use_git_config(user.name = "YOUR GITHUB USERNAME")
Submission
Share your notes on the #to-dos channel on Slack
To-do 3
Due Sep 4 by 12 noon
- Now that you’ve reviewed past students’ study notes, it’s your turn! Read up on Quarto, a literate programming framework for data science, in the Quarto chapter of R for data science. Create some notes that’ll be helpful for future-you in a Quarto document called quarto_notes_YOURNAME.qmd.
- What’s your muddiest point for ggplot2? Create a file called ggplot2_muddiest_YOURNAME.txt that has your muddiest point: After going through the Data visualization chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?
- Attempt to pull Class-Exercise-Repo from upstream. This will add a new directory to your local repo: todo3/. If you get an error message, ask me about it in Slack #q-and-a.
- On our class GitHub organization, set your visibility to public
  - Go to https://github.com/orgs/Data-Sci-2025/people
  - To the right of your name, there’s a drop-down menu with “Private”. Select “Public” instead.
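The pull from upstream in the list above can be tried out safely first. This sketch builds two throwaway repositories in a temp directory: one standing in for the class repo and one standing in for your local repo. The directory names, the README file, the main branch name, and the instructor identity are all assumptions for the demo. In your real repo only the last two git commands matter, and the remote add step is needed only if upstream isn't set up yet (with the class repo's GitHub URL in place of the local path).

```shell
set -e
scratch=$(mktemp -d)

# Stand-in for the class repo: one commit that adds a todo3/ directory.
class_repo="$scratch/class-repo"
git init -q "$class_repo"
git -C "$class_repo" symbolic-ref HEAD refs/heads/main
mkdir "$class_repo/todo3"
echo "To-do 3 files go here" > "$class_repo/todo3/README.md"
git -C "$class_repo" add todo3
git -C "$class_repo" -c user.name="Instructor" -c user.email="instructor@example.com" \
    commit -q -m "Add todo3 directory"

# Stand-in for your local repo, which doesn't have todo3/ yet.
my_repo="$scratch/my-local"
git init -q "$my_repo"
git -C "$my_repo" symbolic-ref HEAD refs/heads/main

# Register the class repo as "upstream", then pull its main branch.
git -C "$my_repo" remote add upstream "$class_repo"
git -C "$my_repo" pull -q upstream main

# The new directory should now exist locally.
ls "$my_repo/todo3"
```

Running git remote -v inside your real repo shows whether upstream is already configured and where it points.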
Submission
From here on out, to-do submissions will take the form of GitHub pull requests. You should have 2 new files: Quarto notes and ggplot2 muddiest point. Add your files to todo3/, stage, commit, and push to origin. Start a pull request, and remember the “PR checklist”: check whether GitHub can merge your changes, and check the changed files. If it looks good, open the pull request.
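The stage, commit, and push steps for this submission might look roughly like the sketch below. To keep it safe to run, it uses a scratch repository and a local bare repository as a stand-in for your fork; the main branch name, the placeholder file contents, the commit message, and the name and email are assumptions for illustration. In your real repo you would skip the setup and run only the add, status, commit, and push commands.

```shell
set -e
scratch=$(mktemp -d)

# Stand-in for your fork on GitHub ("origin"): a bare repo in a temp directory.
git init -q --bare "$scratch/origin.git"

# Stand-in for your local copy of the class repo.
repo="$scratch/Class-Exercise-Repo"
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" remote add origin "$scratch/origin.git"

# The two files this to-do asks for (placeholder contents).
mkdir -p "$repo/todo3"
echo "My Quarto notes" > "$repo/todo3/quarto_notes_YOURNAME.qmd"
echo "My muddiest point" > "$repo/todo3/ggplot2_muddiest_YOURNAME.txt"

# Stage both files, check what is staged, commit, and push to origin.
git -C "$repo" add todo3/quarto_notes_YOURNAME.qmd todo3/ggplot2_muddiest_YOURNAME.txt
git -C "$repo" status --short
git -C "$repo" -c user.name="YOUR NAME" -c user.email="you@example.com" \
    commit -q -m "Add to-do 3 notes and muddiest point"
git -C "$repo" push -q -u origin main
```

Checking git status before committing is a cheap way to catch files you didn't mean to include, which is the same thing the “Files changed” tab shows you later on GitHub.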
To-do 4
Due Sep 9 by 12 noon
It’s time to flex our newfound skills with Markdown and Quarto!
- Learn about dplyr! Go through the Data transformation chapter in our class version of R for data science, and create your own study notes as todo4/dplyr_notes_YOURNAME.qmd. Then render to a GitHub-flavored Markdown file todo4/dplyr_notes_YOURNAME.md.
  A string like “todo4/dplyr_notes_YOURNAME.qmd” is a shorthand way of referring to a path: both the file (dplyr_notes_YOURNAME.qmd) and the directory that it sits in (todo4/). You’ll sometimes see paths with a file multiple directories deep (e.g., ~/Documents/Research/research-plan.md). You should not include todo4/ in the file name.
- Create a muddiest point for dplyr as todo4/dplyr_muddiest_YOURNAME.md.
  - Note: Just a regular Markdown document, not a Quarto .qmd.
- Commit and push your changes to your fork (git push in the RStudio Terminal, not the “Push” button), but don’t create a PR yet! Visit your fork on GitHub and inspect the Markdown files, just for formatting: Do they look like you expect (e.g., are there any stray formatting marks, does the R output look right, etc.)? Write a few notes about what seems to be working or not working, including a muddiest point, as todo4/markdown_notes_YOURNAME.md; commit and push to your fork.
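To see the path idea from the first item in action, here is a small sketch you could paste into the RStudio Terminal. It uses the standard dirname and basename tools to split a path into its directory part and its file name part; the variable names are just for illustration.

```shell
# Split the to-do 4 path into directory and file name.
path="todo4/dplyr_notes_YOURNAME.qmd"
dir=$(dirname "$path")     # the directory the file sits in
file=$(basename "$path")   # the file name on its own
echo "directory: $dir"     # prints: directory: todo4
echo "file name: $file"    # prints: file name: dplyr_notes_YOURNAME.qmd

# The same split works when the file is several directories deep.
deep="~/Documents/Research/research-plan.md"
deep_dir=$(dirname "$deep")
deep_file=$(basename "$deep")
echo "directory: $deep_dir"    # prints: directory: ~/Documents/Research
echo "file name: $deep_file"   # prints: file name: research-plan.md
```

These tools only split the text of the path, so the files don't need to exist for the example to work.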
Submission
You should have 4 new files in the todo4/ directory on your fork: dplyr notes as qmd, notes as md, dplyr muddiest point, Markdown notes. Create a pull request for me. As always, ask questions if you run into any difficulties!
To-do 5
Due Sep 11 by 12 noon
- Learn about tidy data and pivoting in the Data tidying chapter! Create tidy-data_notes_YOURNAME.qmd, rendering it to gfm.
- Create a muddiest point: tidy-data_muddiest_YOURNAME.md
Submission
The usual, in the todo5/ directory of the Class-Exercise-Repo.