To-do assignments
Submission guidelines
- If the To-do includes submitting a Quarto document:
  - Render the file to GitHub-flavored Markdown (format: gfm in the YAML header)
    - If there are errors, fix them and try rendering again
  - Make sure the rendered file doesn’t have super-long outputs
    - To see what I mean, create a qmd that just contains as.data.frame(ggplot2::diamonds)
  - Submit both the source .qmd file and the rendered .md file
- If the To-do includes submitting a Markdown file (whether it’s a rendered Quarto document or a standalone Markdown file):
  - Commit & push the file to your fork to check formatting
  - If formatting looks off, commit & push changes until it looks right (this might mean a lot of commits early on when you’re still learning, and that’s okay!)
- When creating a pull request, check:
  - Whether GitHub can merge your changes
  - The “Files changed” tab: Anything unexpected?
  - If we’ve done an in-class assignment since your last pull request, then those files will be included in the PR. That’s fine as long as you’ve added your name as a suffix to the file
To-do 1
Due Aug 28 by 12 noon
- At the end of the semester, previous students in this class wrote anonymous letters to future Data Science students detailing their advice for how to succeed in the course. These letters are pinned to our Slack #general channel here. Please read these letters and identify:
  - One idea that surprised you and why it surprised you
  - Two concrete strategies that you will take in order to succeed in Data Science this semester
- This semester, you’ll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that’s giving you the most issues or that you’re most unsure about. This could be something we’ve already gone over, or an extension of what we’ve discussed.
  - Read the assigned readings from our R for data science textbook (the intro and chapter 2).
  - Then identify your muddiest point about the chapters.
Good muddiest points are specific questions. Using Git as an example:
- “Why does Git make us stage files before we commit them?”
- “I keep getting the message no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
- “What is .gitignore?”
- “Do you [Dan] actually use Git in your projects?”
Not-so-good muddiest points are noun phrases or vague questions:
- “reverting files”
- “How do I use Git?”
- The next step on our Git journey is GitHub
  - Create a GitHub account at https://github.com/, if you don’t already have one. Pick a good username because changing your GitHub username creates annoying problems.
  - Add your Pitt email address to your account.
  - Check your inbox and spam folder for a verification email.
  - Add an avatar to your GitHub account (usually a headshot, but can be anything you like).
  - In your submission, send me the link to your GitHub user page (i.e., https://github.com/YOUR-USERNAME). You don’t have to add any other personalization, just send me the link.
Submission
Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown. Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md (replace YOURNAME with your actual name). Share it to the #to-dos channel on our Slack workspace.
To-do 2
Due Sep 2 by 12 noon
Time for some hands-on practice! Do the following:
- Learn about ggplot2! Go through the Data visualization and Workflow: scripts and projects chapters in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)
  It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing: not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.
- Future to-dos will ask you to create your own study notes for R4DS chapters. For now, your task is to evaluate some past students’ ggplot2 notes, which you can find at https://github.com/Data-Sci-2025/Class-Exercise-Repo/tree/main/todo2/old-notes. Compare and contrast: If you were future-you, how helpful would you find these notes? Are there some notes you’d find more helpful than others? Pay attention not only to content but also style, formatting, and organization. There’s not a right or wrong answer here; different things work for different people! Write up your observations in notes-on-notes_YOURNAME.txt (or .md).
  These notes were for the previous edition of R4DS (https://r4ds.had.co.nz/data-visualisation.html), hence any differences in section numbers, etc.
- Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?
  Example answer
  When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don’t touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the commits. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless. Git prevents that from happening because once changes are committed, they’re there.
  Write up your response as git_notes_YOURNAME.txt (or .md)
- If you haven’t set a personal access token for GitHub, do so now:
  - In the R console, run usethis::create_github_token()
  - This will load a GitHub page for you to generate a PAT
  - In “Note”, write “My laptop” or something that describes your local machine
  - In “Expiration” set “No expiration” (even though GitHub doesn’t recommend it)
  - Leave everything else as-is
  - Click “Generate token” and copy your token to the clipboard
  - Back in RStudio, run gitcreds::gitcreds_set() without any arguments
    - Follow the prompts and paste your token when asked
  - If needed, change your Git config user name to match your GitHub user name: usethis::use_git_config(user.name = "YOUR GITHUB USERNAME")
Submission
Share your notes on the #to-dos channel on Slack
To-do 3
Due Sep 4 by 12 noon
- Now that you’ve reviewed past students’ study notes, it’s your turn! Read up on Quarto, a literate programming framework for data science, in the Quarto chapter of R for data science. Create some notes that’ll be helpful for future-you in a Quarto document called quarto_notes_YOURNAME.qmd.
- What’s your muddiest point for ggplot2? Create a file called ggplot2_muddiest_YOURNAME.txt that has your muddiest point: After going through the Data visualization chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?
- Attempt to pull Class-Exercise-Repo from upstream. This will add a new directory to your local repo: todo3/. If you get an error message, ask me about it in Slack #q-and-a
- On our class GitHub organization, set your visibility to public
  - Go to https://github.com/orgs/Data-Sci-2025/people
  - To the right of your name, there’s a drop-down menu with “Private”. Select “Public” instead.
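For the pull-from-upstream step above, the command in a real clone is typically a single line such as git pull upstream main, assuming the remote for the class repo is named upstream and the branch is called main (both are assumptions; git remote -v shows what your remotes are actually called). The sketch below simulates the round trip with local throwaway repositories so it can be run safely anywhere. Every path, file, and commit message in it is made up for illustration, and for simplicity the simulated clone comes straight from the stand-in class repo rather than from a fork.

```shell
set -e
work="$(mktemp -d)"

# Placeholder identity so the example commits work on any machine.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A local bare repo stands in for the class repo on GitHub (hypothetical).
git init -q --bare "$work/class-repo.git"

# The "instructor" publishes an initial commit.
git init -q "$work/instructor"
cd "$work/instructor"
echo "Class exercises" > README.md
git add README.md
git commit -q -m "Initial commit"
branch="$(git symbolic-ref --short HEAD)"   # whatever the default branch is called
git push -q "$work/class-repo.git" "$branch"

# Your local repo: cloned earlier, with the class repo added as "upstream".
git clone -q "$work/class-repo.git" "$work/student"
cd "$work/student"
git remote add upstream "$work/class-repo.git"

# Later, the instructor adds a todo3/ directory...
cd "$work/instructor"
mkdir todo3
echo "To-do 3 files go here" > todo3/README.md
git add todo3
git commit -q -m "Add todo3"
git push -q "$work/class-repo.git" "$branch"

# ...and you pull it from upstream into your local repo.
cd "$work/student"
git pull -q upstream "$branch"
ls todo3
```

After the pull, the new todo3/ directory exists in the local repo, which is the outcome this step is looking for.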
Submission
From here on out, to-do submissions will take the form of GitHub pull requests. You should have 2 new files: Quarto notes and ggplot2 muddiest point. Add your files to todo3/, stage, commit, and push to origin. Start a pull request, and remember the “PR checklist”: check whether GitHub can merge your changes, and check the changed files. If it looks good, open the pull request.
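The stage-and-commit part of this workflow can be sketched in the terminal as follows. This is only an illustration under stated assumptions: it runs in a throwaway repository created with mktemp so nothing in your real fork is touched, the file contents and commit message are made up, and the name and email are placeholders. In your own repo you would skip the setup lines and finish with a push (shown only as a comment here, since the throwaway repo has no origin remote).

```shell
set -e

# Throwaway repo standing in for your local clone (assumption for illustration).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# The two new files for this to-do (contents are made up).
mkdir todo3
echo "Quarto notes go here" > todo3/quarto_notes_YOURNAME.qmd
echo "My muddiest point" > todo3/ggplot2_muddiest_YOURNAME.txt

git add todo3/                          # stage both files
git commit -q -m "Add To-do 3 files"    # commit (message is just an example)

# In your real repo, the next step would be something like:
#   git push origin main    # assumes your branch is called main

git status --short   # prints nothing once everything is committed
```

An empty git status is a quick sign that nothing was left unstaged before you push and open the pull request.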
To-do 4
Due Sep 9 by 12 noon
It’s time to flex our newfound skills with Markdown and Quarto!
- Learn about dplyr! Go through the Data transformation chapter in our class version of R for data science, and create your own study notes as todo4/dplyr_notes_YOURNAME.qmd. Then render to a GitHub-flavored Markdown file todo4/dplyr_notes_YOURNAME.md.
  A string like “todo4/dplyr_notes_YOURNAME.qmd” is a shorthand way of referring to a path: both the file (dplyr_notes_YOURNAME.qmd) and the directory that it sits in (todo4/). You’ll sometimes see paths with a file multiple directories deep (e.g., ~/Documents/Research/research-plan.md). You should not include todo4/ in the file name.
- Create a muddiest point for dplyr as todo4/dplyr_muddiest_YOURNAME.md.
  - Note: Just a regular Markdown document, not a Quarto .qmd.
- Commit and push your changes to your fork (git push in the RStudio Terminal, not the “Push” button), but don’t create a PR yet! Visit your fork on GitHub and inspect the Markdown files, just for formatting: Do they look like you expect (e.g., are there any stray formatting marks, does the R output look right, etc.)? Write a few notes about what seems to be working or not working, including a muddiest point, as todo4/markdown_notes_YOURNAME.md; commit and push to your fork.
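To make the path note above concrete, here is a minimal shell sketch (runnable in any terminal, including the RStudio Terminal) that splits a path into its directory and its file name using the standard dirname and basename utilities. The YOURNAME placeholder is kept as-is from the assignment; substitute your own name.

```shell
# Split a path into the directory it sits in and the file name itself.
path="todo4/dplyr_notes_YOURNAME.qmd"

dir="$(dirname "$path")"     # directory part of the path
file="$(basename "$path")"   # file-name part of the path

echo "directory: $dir"
echo "file name: $file"
```

The file name printed on the second line is what you name the file; the directory is just where the file lives.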
Submission
You should have 4 new files in the todo4/ directory on your fork: dplyr notes as qmd, notes as md, dplyr muddiest point, Markdown notes. Create a pull request for me. As always, ask questions if you run into any difficulties!
To-do 5
Due Sep 11 by 12 noon
- Learn about tidy data and pivoting in the Data tidying chapter! Create tidy-data_notes_YOURNAME.qmd, rendering it to gfm.
- Create a muddiest point: tidy-data_muddiest_YOURNAME.md
Submission
The usual, in the todo5/ directory of the Class-Exercise-Repo.
To-do 6
Due Sep 16 by 12 noon
- Learn about relational data in the Joins chapter! You know the drill by now: Create relational_notes_YOURNAME.qmd, rendered to gfm; plus a muddiest point as relational_muddiest_YOURNAME.md.
- Your final projects will be published as public repositories, just like previous versions of this class. (I’ll release more info about the project next week.) Choose two projects from the list on the todo6 README.md to review. Read their project_plan.md, progress_report.md, and final_report.md, then create a one-paragraph summary (for each) of how the project evolved. Write up your observations in the file proj-observations_YOURNAME.md.
- Play around with some more keyboard shortcuts in RStudio. Describe what you learned in a brief notes file: keyboard-shortcuts_YOURNAME.md.
Submission
The usual, in the todo6/ directory of the Class-Exercise-Repo.
To-do 7
Due Sep 18 by 12 noon
- Learn about readr in the Data import chapter. You know the drill by now: you should have three files for this chapter: readr_notes_YOURNAME.qmd, readr_notes_YOURNAME.md, readr_muddiest_YOURNAME.md.
- Read through your earlier muddiest points. Pick one that you feel like you understand better now (the key word being better; it’s okay if you’re not all the way there yet!). In old_muddiest_YOURNAME.md, discuss (1) what you now understand that you didn’t before, and (2) how you got there.
Submission
The usual, in the todo7/ directory
To-do 8
Due Sep 25 by 12 noon
- Learn about hierarchical data in the Hierarchical data chapter. You know the drill by now: you should have three files for this chapter: hierarchical-data_notes_YOURNAME.qmd, hierarchical-data_notes_YOURNAME.md, hierarchical-data_muddiest_YOURNAME.md.
- Do the same with the Iteration chapter.
- Revisit the “letters to future students” from To-do 1, now available in our Class-Exercise-Repo here. What’s one piece of project-related advice that you found helpful, and why? Put it in project-advice_YOURNAME.md.
Submission
The usual, in the todo8/ directory
To-do 9
Due Sep 30 by 12 noon
- Learn about strings in the Strings chapter. You know the drill by now: you should have three files for this chapter: strings_notes_YOURNAME.qmd, strings_notes_YOURNAME.md, strings_muddiest_YOURNAME.md.
Submission
The usual, in the todo9/ directory
To-do 10
Due Oct 2 by 12 noon
- Learn about regular expressions in the Regular expressions chapter. You know the drill by now: you should have three files for this chapter: regex_notes_YOURNAME.qmd, regex_notes_YOURNAME.md, regex_muddiest_YOURNAME.md.
Submission
The usual, in the todo10/ directory
To-do 11
Due Oct 9 by 12 noon
- Read Villarreal 2024 on fairness in sociolinguistic auto-coding. If you want some background on sociolinguistic auto-coding, check out Kendall et al. 2021. Come up with 1-2 discussion questions.
Submission
We’re going to try something a little different this time: collaborative editing!
- As usual, pull changes from upstream to your local repo.
- This time, instead of creating a new file, edit the existing todo11/discussion-questions.md file: add your question(s) under your name.
- Commit, push to your fork, and create a pull request.
If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.
To-do 12
Due Oct 14 by 12 noon
- Read Bender et al. 2021 on large language models and come up with 1–2 discussion questions.
  - Note: This paper eventually led to Google firing two of its authors, Timnit Gebru and Margaret Mitchell; Timnit Gebru went on to launch the Distributed Artificial Intelligence Research Institute (DAIR). These two magazine articles (which are paywalled, unfortunately) describe the full story: Hao 2021, Perrigo 2022.
Submission
Like with todo11, we’re going to do collaborative editing.
- As usual, pull changes from upstream to your local repo.
- This time, instead of creating a new file, edit the existing todo12/discussion-questions.md file: add your question(s) under your name.
- Commit, push to your fork, and create a pull request.
If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.
To-do 13
Due Oct 16 by 12 noon
Time for a midterm check-in!
- Write up a short reflection on the following questions (2–3 sentences apiece):
  - How do you think the midterm is going for you overall?
  - What have you had the most trouble with?
  - Where are you turning to for help (your notes, the textbook, class recordings, the internet, etc.), and with what?
  - (If applicable) Any other questions for me?
- Commit whatever you’ve got right now and push it to your fork
  - This is just so I can keep track of what’s been done before vs. after our discussion in class on Thursday
Submit your reflection on the Class Exercise Repo as todo13/midterm_reflection_YOURNAME.md