To-do assignments
Submission guidelines
- If the To-do includes submitting a Quarto document:
  - Render the file to GitHub-flavored Markdown (format: gfm in the YAML header)
    - If there are errors, fix them and try rendering again
  - Make sure the rendered file doesn't have super-long outputs
    - To see what I mean, create a qmd that just contains as.data.frame(ggplot2::diamonds)
  - Submit both the source .qmd file and the rendered .md file
- If the To-do includes submitting a Markdown file (whether it's a rendered Quarto document or a standalone Markdown file):
  - Commit & push the file to your fork to check formatting
    - If formatting looks off, commit & push changes until it looks right (this might mean a lot of commits early on when you're still learning, and that's okay!)
- When creating a pull request, check:
  - Whether GitHub can merge your changes
  - The "Files changed" tab: Anything unexpected?
    - If we've done an in-class assignment since your last pull request, then those files will be included in the PR. That's fine as long as you've added your name as a suffix to the file name.
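For the Quarto items above, here is a minimal sketch of what a submittable .qmd might look like. The title and chunk contents are placeholders (assumptions, not a required template); format: gfm is the YAML setting that makes Quarto render GitHub-flavored Markdown, and wrapping the data frame in head() is one way to keep the rendered output short.

````markdown
---
title: "dplyr notes"
format: gfm
---

```{r}
# Printing the whole data frame would dump every row into the .md:
# as.data.frame(ggplot2::diamonds)

# head() prints only the first 6 rows instead
head(as.data.frame(ggplot2::diamonds))
```
````

Rendering this file should produce a .md next to the .qmd; submit both.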
To-do 1
Due Aug 28 by 12 noon
- At the end of the semester, previous students in this class wrote anonymous letters to future Data Science students detailing their advice for how to succeed in the course. These letters are pinned to our Slack #general channel here. Please read these letters and identify:
  - One idea that surprised you and why it surprised you
  - Two concrete strategies that you will take in order to succeed in Data Science this semester
- This semester, you'll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that's giving you the most trouble or that you're most unsure about. This could be something we've already gone over, or an extension of what we've discussed.
  - Read the assigned readings from our R for Data Science textbook (the intro and chapter 2).
  - Then identify your muddiest point about the chapters.
Good muddiest points are specific questions. Using Git as an example:
- "Why does Git make us stage files before we commit them?"
- "I keep getting the message 'no changes added to commit (use "git add" and/or "git commit -a")'. What am I doing wrong?"
- "What is .gitignore?"
- "Do you [Dan] actually use Git in your projects?"
Not-so-good muddiest points are noun phrases or vague questions:
- "reverting files"
- "How do I use Git?"
- The next step on our Git journey is GitHub:
  - Create a GitHub account at https://github.com/, if you don't already have one. Pick a good username, because changing your GitHub username later creates annoying problems.
  - Add your Pitt email address to your account.
    - Check your inbox and spam folder for a verification email.
  - Add an avatar to your GitHub account (usually a headshot, but it can be anything you like).
  - In your submission, send me the link to your GitHub user page (i.e., https://github.com/YOUR-USERNAME). You don't have to add any other personalization; just send me the link.
Submission
Write up your answers in a text file (with the .txt extension) or, if you're comfortable with Markdown, a Markdown file (.md). Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md (replacing YOURNAME with your actual name). Share it to the #to-dos channel on our Slack workspace.
To-do 2
Due Sep 2 by 12 noon
Time for some hands-on practice! Do the following:
- Learn about ggplot2! Go through the Data visualization and Workflow: scripts and projects chapters in our class version of R for Data Science. (Pay attention to the yellow blocks, where I've injected notes for our class into the chapter!) It's up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing: not just typing out all the commands yourself and checking that you get the same output, but tinkering and exploring.
- Future to-dos will ask you to create your own study notes for R4DS chapters. For now, your task is to evaluate some past students' ggplot2 notes, which you can find at https://github.com/Data-Sci-2025/Class-Exercise-Repo/tree/main/todo2/old-notes. Compare and contrast: If you were future-you, how helpful would you find these notes? Are there some notes you'd find more helpful than others? Pay attention not only to content but also to style, formatting, and organization. There's no right or wrong answer here; different things work for different people! Write up your observations in notes-on-notes_YOURNAME.txt (or .md). These notes were written for the previous edition of R4DS (https://r4ds.had.co.nz/data-visualisation.html), which explains any differences in section numbers, etc.
- Think about what Git can do, and think back to the ways you've managed files in the past. Even if you didn't realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?
Example answer
When I'm writing a paper, I'll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I've made decent progress, I'll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don't touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren't easy to glance over; if I'm looking to undo a previous change, I have to re-open each old version to read through the comments. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there's no guarantee that I won't accidentally change an old version rather than the latest version, which would render the "commit message" useless. Git prevents that from happening because once changes are committed, they're there.
Write up your response as git_notes_YOURNAME.txt (or .md)
- If you haven't set a personal access token (PAT) for GitHub, do so now:
  - In the R console, run usethis::create_github_token()
    - This will load a GitHub page for you to generate a PAT
    - In "Note", write "My laptop" or something that describes your local machine
    - In "Expiration", set "No expiration" (even though GitHub doesn't recommend it)
    - Leave everything else as-is
    - Click "Generate token" and copy your token to the clipboard
  - Back in RStudio, run gitcreds::gitcreds_set() without any arguments
    - Follow the prompts and paste your token when asked
  - If needed, change your Git config user name to match your GitHub user name: usethis::use_git_config(user.name = "YOUR GITHUB USERNAME")
Submission
Share your notes on the #to-dos channel on Slack
To-do 3
Due Sep 4 by 12 noon
- Now that you've reviewed past students' study notes, it's your turn! Read up on Quarto, a literate programming framework for data science, in the Quarto chapter of R for Data Science. Create some notes that'll be helpful for future-you in a Quarto document called quarto_notes_YOURNAME.qmd.
- What's your muddiest point for ggplot2? Create a file called ggplot2_muddiest_YOURNAME.txt that has your muddiest point: After going through the Data visualization chapter, what's the concept or skill that's giving you the most trouble? What are you most unsure about?
- Attempt to pull Class-Exercise-Repo from upstream. This will add a new directory to your local repo: todo3/. If you get an error message, ask me about it in Slack #q-and-a.
- On our class GitHub organization, set your visibility to public:
  - Go to https://github.com/orgs/Data-Sci-2025/people
  - To the right of your name, there's a drop-down menu with "Private". Select "Public" instead.
Submission
From here on out, to-do submissions will take the form of GitHub pull requests. You should have 2 new files: Quarto notes and ggplot2 muddiest point. Add your files to todo3/, then stage, commit, and push them to origin. Start a pull request, and remember the "PR checklist": check whether GitHub can merge your changes, and check the changed files. If it looks good, open the pull request.
To-do 4
Due Sep 9 by 12 noon
It's time to flex our newfound skills with Markdown and Quarto!
- Learn about dplyr! Work through the Data transformation chapter in our class version of R for Data Science, and create your own study notes as todo4/dplyr_notes_YOURNAME.qmd. Then render it to a GitHub-flavored Markdown file, todo4/dplyr_notes_YOURNAME.md.
  A string like "todo4/dplyr_notes_YOURNAME.qmd" is a shorthand way of referring to a path: both the file (dplyr_notes_YOURNAME.qmd) and the directory that it sits in (todo4/). You'll sometimes see paths with a file multiple directories deep (e.g., ~/Documents/Research/research-plan.md). You should not include todo4/ in the file name.
- Create a muddiest point for dplyr as todo4/dplyr_muddiest_YOURNAME.md.
  - Note: Just a regular Markdown document, not a Quarto .qmd.
- Commit and push your changes to your fork (git push in the RStudio Terminal, not the "Push" button), but don't create a PR yet! Visit your fork on GitHub and inspect the Markdown files, just for formatting: Do they look like you expect (e.g., are there any stray formatting marks, does the R output look right, etc.)? Write a few notes about what seems to be working or not working, including a muddiest point, as todo4/markdown_notes_YOURNAME.md; commit and push to your fork.
Submission
You should have 4 new files in the todo4/ directory on your fork: dplyr notes as .qmd, dplyr notes as .md, dplyr muddiest point, and Markdown notes. Create a pull request for me. As always, ask questions if you run into any difficulties!
To-do 5
Due Sep 11 by 12 noon
- Learn about tidy data and pivoting in the Data tidying chapter! Create tidy-data_notes_YOURNAME.qmd, rendering it to gfm.
- Create a muddiest point: tidy-data_muddiest_YOURNAME.md
Submission
The usual, in the todo5/ directory of the Class-Exercise-Repo.
To-do 6
Due Sep 16 by 12 noon
- Learn about relational data in the Joins chapter! You know the drill by now: Create relational_notes_YOURNAME.qmd, rendered to gfm, plus a muddiest point as relational_muddiest_YOURNAME.md.
- Your final projects will be published as public repositories, just like in previous offerings of this class. (I'll release more info about the project next week.) Choose two projects from the list in the todo6 README.md to review. Read their project_plan.md, progress_report.md, and final_report.md, then write a one-paragraph summary for each of how the project evolved. Write up your observations in the file proj-observations_YOURNAME.md.
- Play around with some more keyboard shortcuts in RStudio. Describe what you learned in a brief notes file: keyboard-shortcuts_YOURNAME.md.
Submission
The usual, in the todo6/ directory of the Class-Exercise-Repo.
To-do 7
Due Sep 18 by 12 noon
- Learn about readr in the Data import chapter. You know the drill by now. You should have three files for this chapter: readr_notes_YOURNAME.qmd, readr_notes_YOURNAME.md, and readr_muddiest_YOURNAME.md.
- Read through your earlier muddiest points. Pick one that you feel like you understand better now (the key word being better; it's okay if you're not all the way there yet!). In old_muddiest_YOURNAME.md, discuss (1) what you now understand that you didn't before, and (2) how you got there.
Submission
The usual, in the todo7/ directory
To-do 8
Due Sep 25 by 12 noon
- Learn about hierarchical data in the Hierarchical data chapter. You know the drill by now. You should have three files for this chapter: hierarchical-data_notes_YOURNAME.qmd, hierarchical-data_notes_YOURNAME.md, and hierarchical-data_muddiest_YOURNAME.md.
- Do the same with the Iteration chapter.
- Revisit the "letters to future students" from To-do 1, now available in our Class-Exercise-Repo here. What's one piece of project-related advice that you found helpful, and why? Put it in project-advice_YOURNAME.md.
Submission
The usual, in the todo8/ directory
To-do 9
Due Sep 30 by 12 noon
- Learn about strings in the Strings chapter. You know the drill by now. You should have three files for this chapter: strings_notes_YOURNAME.qmd, strings_notes_YOURNAME.md, and strings_muddiest_YOURNAME.md.
Submission
The usual, in the todo9/ directory
To-do 10
Due Oct 2 by 12 noon
- Learn about regular expressions in the Regular expressions chapter. You know the drill by now. You should have three files for this chapter: regex_notes_YOURNAME.qmd, regex_notes_YOURNAME.md, and regex_muddiest_YOURNAME.md.
Submission
The usual, in the todo10/ directory
To-do 11
Due Oct 9 by 12 noon
- Read Villarreal 2024 on fairness in sociolinguistic auto-coding. If you want some background on sociolinguistic auto-coding, check out Kendall et al. 2021. Come up with 1-2 discussion questions.
Submission
We're going to try something a little different this time: collaborative editing!
- As usual, pull changes from upstream to your local repo.
- This time, instead of creating a new file, edit the existing todo11/discussion-questions.md file: add your question(s) under your name.
- Commit, push to your fork, and create a pull request.
If we've all done this correctly, I should be able to merge all your commits together without any conflicts.
To-do 12
Due Oct 14 by 12 noon
- Read Bender et al. 2021 on large language models and come up with 1–2 discussion questions.
  - Note: This paper eventually led to Google firing two of its authors, Timnit Gebru and Margaret Mitchell; Timnit Gebru went on to launch the Distributed Artificial Intelligence Research Institute (DAIR). These two magazine articles (which are, unfortunately, paywalled) describe the full story: Hao 2021, Perrigo 2022.
Submission
As with To-do 11, we're going to do collaborative editing.
- As usual, pull changes from upstream to your local repo.
- This time, instead of creating a new file, edit the existing todo12/discussion-questions.md file: add your question(s) under your name.
- Commit, push to your fork, and create a pull request.
If we've all done this correctly, I should be able to merge all your commits together without any conflicts.
To-do 13
Due Oct 16 by 12 noon
Time for a midterm check-in!
- Write up a short reflection on the following questions (2–3 sentences apiece):
  - How do you think the midterm is going for you overall?
  - What have you had the most trouble with?
  - Where are you turning to for help (your notes, the textbook, class recordings, the internet, etc.), and with what?
  - (If applicable) Any other questions for me?
- Commit whatever you've got right now and push it to your fork
  - This is just so I can keep track of what's been done before vs. after our discussion in class on Thursday
Submit your reflection to the Class-Exercise-Repo as todo13/midterm_reflection_YOURNAME.md.
To-do 14
Due Oct 23 by 12 noon
- Learn about the tidy text format in chapters 1 and 3 of Text Mining with R. Include the usual files (for both chapters combined, not per-chapter): tidytext_notes_YOURNAME.qmd, tidytext_notes_YOURNAME.md, and tidytext_muddiest_YOURNAME.md.
- One additional file: a brief note on how tidytext will be useful for your final project (or why it won't be, if it's not applicable). Write this up as tidytext_project_YOURNAME.md.
Submission
The usual, in the todo14/ directory
To-do 15
Due Oct 30 by 12 noon
- Let's have a nice meal at the diner: the CSS Diner! This is a fun website for learning and practicing CSS selectors, a necessary skill for web scraping. Set your timer for 1 hour and get as far as you can. (This isn't for the sake of pressure or competition, just time management. If your project involves web scraping, go further and let me know how far you got in the initial hour.) Write up a brief reflection, including a muddiest point, as css_reflection_YOURNAME.md.
Submission
The usual, in the todo15/ directory
To-do 16
Due Nov 4 by 12 noon
What has everyone been up to? Let's take a look: it's a "visit your classmates" to-do!
I've set up a directory with project guestbooks in our Class-Exercise-Repo.
- Visit your classmates' projects! The order of who visits whom is in project-guestbooks/README.md.
- Take a look around, and write in their guestbook (not your own!). Your entry should consist of (at least):
  - One thing you thought was done well
  - One suggestion or avenue for improvement
  - One thing you learned
- Clone their repo to your computer and see if you can reproduce their data pipeline as it currently stands. Ideally, the directions in their repo should make it clear how to run their pipeline. In some cases it might be impossible because of private data; I'll leave it to you and your "visitee" to decide whether they're comfortable sharing data. Include a section in your guestbook entry about:
  - How easy or hard it was to run their data pipeline (or whether you were unable to), and why
  - If you were able to run their pipeline, did you get the same end result? (If you weren't able to run it, don't worry about this part.)
Once you're ready to submit, push to your fork and create a pull request for me.
To-do 17
Due Nov 6 by 12 noon
Right about now is a good time to re-anchor ourselves in the wisdom of those who came before us.
- Revisit the letters to future Data Science students from the beginning of the semester and identify:
  - 1–2 pieces of advice you've done a good job following so far.
  - 1–2 pieces of advice you plan to focus on some more in this final push.
  - Anything else you'd like to comment on.
Write this up as letters_YOURNAME.md.
Submission
The usual, in the todo17/ directory.
To-do 18
Due Nov 13 by 12 noon
It's another "visit your classmates" day! This is like To-do 16, but with two new things:
- Visit your classmates' projects! The order of who visits whom is in project-guestbooks/README.md.
- Take a look around, and write in their guestbook (not your own!). Your entry should consist of (at least):
  - One thing you thought was done well
  - One general suggestion or avenue for improvement
  - (New) One specific recommendation for how you would modify their code (e.g., "I would use map() instead of lapply() in the analysis of word frequency")
  - One thing you learned
  - (New) What you would do next if this were your project
- Clone your classmate's repo and see if you can reproduce their data pipeline
To-do 19
Due Nov 18 by 12 noon
We'll continue with project workdays for the next two class meetings. Write up a brief memo, troubleshooting_YOURNAME.md, with 1–2 troubleshooting questions. Put it in the todo19/ folder of our Class-Exercise-Repo. (If you're in the habit of completing to-dos early, you might want to wait on this one until shortly before class so you can get help with your latest challenge!)
To-do 20
Due Dec 12 by 6pm
This to-do is completely optional. But I recommend doing it anyway; reflection and "paying it forward" both tend to be therapeutic.
It's been a long semester. You've learned tons, struggled over tough coding challenges, and attained some satisfying wins. Now is a great time to step back and reflect on how far you've come and how you got there.
Reflection
This part of the assignment is for your own reference only: complete it, but do not turn it in.
- First, think back over your engagement in our class this past semester. What strategies did you use to internalize the content? How did you engage with the textbook and create your chapter notes? How did you collaborate with other students on assignments, if at all? Did you ask questions in office hours and/or in class? What were your strategies for getting help?
- Next, think about your performance in our class this past semester. How did your strategies for engaging the content, both in and out of class, affect your performance on assignments? Did you experiment with multiple strategies, and how did they affect your performance? How well did you keep up with assignments, especially as the semester got busier?
- Finally, think about how you can apply what you've learned about yourself as a researcher in this class to your future research, regardless of whether that research involves data science or R coding.
Action
I will be teaching Data Science again in the future. Using your reflection from the first part of the assignment as guidance, write a letter to future Data Science students detailing strategies that helped you succeed in this class. These can be general strategies ("Be sure to always…") or specific strategies ("In [X unit], keep in mind that…"). Don't worry about whether these strategies would work as well for everyone as they did for you; if you know of different strategies that worked for others, feel free to include those. Especially useful are things you wish you'd known at the start of the semester.
Most, if not all, of these letters will be made available to future Data Science students, so please be as informative and thoughtful as possible! To protect your educational privacy, your letter will be anonymized before being shared with other students, and I would recommend you avoid referring to anything that could specifically identify you as the letter-writer (e.g., details of your final project). Please complete this assignment on your own.
There's no length requirement; if you want a guideline, shoot for 300–500 words. Keep in mind that this is not the same as a course evaluation (though please fill out OMETs if you haven't yet!). The point of this letter is not to evaluate the class or my teaching, but to help future students succeed.
Submission
Write up your letter as letter_to_future_students_YOURNAME.md and DM it to me on Slack.