To-do assignments

Submission guidelines

  • If the To-do includes submitting a Quarto document:
    • Render the file to GitHub-flavored Markdown (format: gfm in the YAML header)
      • If errors, fix and try rendering again
    • Make sure the rendered file doesn’t have super-long outputs
      • To see what I mean, create a qmd that just contains as.data.frame(ggplot2::diamonds)
    • Submit both the source .qmd file and the rendered .md file
  • If the To-do includes submitting a Markdown file (whether it’s a rendered Quarto document or a standalone Markdown file):
    • Commit & push the file to your fork to check formatting
      • If formatting looks off, commit & push changes until it looks right (this might mean a lot of commits early on when you’re still learning, and that’s okay!)
  • When creating a pull request, check:
    • Whether GitHub can merge your changes
    • The “Files changed” tab: Anything unexpected?
      • If we’ve done an in-class assignment since your last pull request, then those files will be included in the PR. That’s fine as long as you’ve added your name as a suffix to the file
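
As a rough sketch of the Quarto checklist above, the shell snippet below writes a tiny .qmd whose YAML header sets format: gfm, then renders it if the quarto command is installed. The file name notes_YOURNAME.qmd, the title, and the scratch directory are placeholders made up for illustration. The example file has no R code chunks, so it does not show the long-output problem.

```shell
# Work in a scratch directory so nothing in a real repo is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1

# Hypothetical file name; replace YOURNAME with your actual name
qmd_file="notes_YOURNAME.qmd"

# The YAML header requests GitHub-flavored Markdown output
cat > "$qmd_file" <<'EOF'
---
title: "Example notes"
format: gfm
---

Notes go here.
EOF

# Render only if the quarto command is available on this machine
if command -v quarto >/dev/null 2>&1; then
  quarto render "$qmd_file" || echo "Render failed: fix the error and try again"
else
  echo "quarto not found; skipping render"
fi

# Both files get submitted: the .qmd source and the rendered .md
ls
```

Rendering from RStudio's Render button should give the same result; the terminal command is just one way to do it.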

To-do 1

Due Aug 28 by 12 noon

  1. At the end of the semester, previous students in this class wrote anonymous letters to future Data Science students detailing their advice for how to succeed in the course. These letters are pinned to our Slack #general channel here. Please read these letters and identify:
    • One idea that surprised you and why it surprised you
    • Two concrete strategies that you will take in order to succeed in Data Science this semester
  2. This semester, you’ll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that’s giving you the most issues or that you’re most unsure about. This could be something we’ve already gone over, or an extension of what we’ve discussed.
    • Read the assigned readings from our R for data science textbook (the intro and chapter 2).
    • Then identify your muddiest point about the chapters.

    Good muddiest points are specific questions. Using Git as an example:

    • “Why does Git make us stage files before we commit them?”
    • “I keep getting the message no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
    • “What is .gitignore?”
    • “Do you [Dan] actually use Git in your projects?”

    Not-so-good muddiest points are noun phrases or vague questions:

    • “reverting files”
    • “How do I use Git?”
  3. The next step on our Git journey is GitHub
    • Create a GitHub account at https://github.com/, if you don’t already have one. Pick a good username because changing your GitHub username creates annoying problems.
    • Add your Pitt email address to your account.
    • Check your inbox and spam folder for a verification email.
    • Add an avatar to your GitHub account (usually a headshot, but can be anything you like).
    • In your submission, send me the link to your GitHub user page (i.e., https://github.com/YOUR-USERNAME). You don’t have to add any other personalization, just send me the link.

Submission

Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown. Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md (replace YOURNAME with your actual name). Share it to the #to-dos channel on our Slack workspace.

To-do 2

Due Sep 2 by 12 noon

Time for some hands-on practice! Do the following:

  1. Learn about ggplot2! Go through the Data visualization and Workflow: scripts and projects chapters in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)

    It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing—not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.

  2. Future to-dos will ask you to create your own study notes for R4DS chapters. For now, your task is to evaluate some past students’ ggplot2 notes, which you can find at https://github.com/Data-Sci-2025/Class-Exercise-Repo/tree/main/todo2/old-notes. Compare and contrast: If you were future-you, how helpful would you find these notes? Are there some notes you’d find more helpful than others? Pay attention not only to content but also style, formatting, and organization. There’s not a right or wrong answer here—different things work for different people! Write up your observations in notes-on-notes_YOURNAME.txt (or .md).

    These notes were for the previous edition of R4DS (https://r4ds.had.co.nz/data-visualisation.html), hence any differences in section numbers, etc.

  3. Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?

    Example answer
    When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don’t touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the comments. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless. Git prevents that from happening because once changes are committed, they’re there.

Write up your response as git_notes_YOURNAME.txt (or .md)

  4. If you haven’t set a personal access token for GitHub, do so now:
    1. In the R console, run usethis::create_github_token()
    2. This will load a GitHub page for you to generate a PAT
      • In “Note”, write “My laptop” or something that describes your local machine
      • In “Expiration” set “No expiration” (even though GitHub doesn’t recommend it)
      • Leave everything else as-is
      • Click “Generate token” and copy your token to the clipboard
    3. Back in RStudio, run gitcreds::gitcreds_set() without any arguments
    4. Follow the prompts and paste your token when asked
    5. If needed, change your Git config user name to match your GitHub user name: usethis::use_git_config(user.name = "YOUR GITHUB USERNAME")

Submission

Share your notes on the #to-dos channel on Slack

To-do 3

Due Sep 4 by 12 noon

  1. Now that you’ve reviewed past students’ study notes, it’s your turn! Read up on Quarto, a literate programming framework for data science, in the Quarto chapter of R for data science. Create some notes that’ll be helpful for future-you in a Quarto document called quarto_notes_YOURNAME.qmd.

  2. What’s your muddiest point for ggplot2? Create a file called ggplot2_muddiest_YOURNAME.txt that has your muddiest point: After going through the Data visualization chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?

  3. Attempt to pull Class-Exercise-Repo from upstream. This will add a new directory to your local repo: todo3/. If you get an error message, ask me about it in Slack #q-and-a

  4. On our class GitHub organization, set your visibility to public

    • Go to https://github.com/orgs/Data-Sci-2025/people
    • To the right of your name, there’s a drop-down menu with “Private”. Select “Public” instead.

Submission

From here on out, to-do submissions will take the form of GitHub pull requests. You should have 2 new files: Quarto notes and ggplot2 muddiest point. Add your files to todo3/, stage, commit, and push to origin. Start a pull request, and remember the “PR checklist”: check whether GitHub can merge your changes, and check the changed files. If it looks good, open the pull request.
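
The pull, stage, commit, and push cycle described above can be tried out safely with a minimal local simulation. In this sketch, two bare repositories in a scratch directory stand in for the class repo (upstream) and your fork (origin) on GitHub; the directory names, identities, and file contents are all made up for illustration. With the real repos, the remotes would be GitHub URLs and the last step would be opening the pull request on GitHub.

```shell
# Everything happens in a scratch directory; no network access is needed
work="$(mktemp -d)"
cd "$work" || exit 1

# Stand-ins for the class repo (upstream) and your fork (origin)
git init --quiet --bare upstream.git
git init --quiet --bare fork.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/main
git --git-dir=fork.git symbolic-ref HEAD refs/heads/main

# Instructor's side: seed upstream, and start the fork as a copy of it
git init --quiet instructor
cd instructor || exit 1
git symbolic-ref HEAD refs/heads/main
git config user.name "Instructor"
git config user.email "instructor@example.com"
echo "# Class-Exercise-Repo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet "$work/upstream.git" main
git push --quiet "$work/fork.git" main

# Student's side: clone the fork and register the class repo as upstream
cd "$work" || exit 1
git clone --quiet "$work/fork.git" local
cd local || exit 1
git config user.name "Student"
git config user.email "student@example.com"
git remote add upstream "$work/upstream.git"

# Instructor adds the todo3/ directory to upstream
cd "$work/instructor" || exit 1
mkdir todo3
echo "To-do 3 submissions go here" > todo3/README.md
git add todo3
git commit --quiet -m "Add todo3 directory"
git push --quiet "$work/upstream.git" main

# Student pulls from upstream, which brings todo3/ into the local repo
cd "$work/local" || exit 1
git pull --quiet --ff-only upstream main

# Add the two new files, then stage, commit, and push to origin (the fork)
echo "My Quarto notes" > todo3/quarto_notes_YOURNAME.qmd
echo "My muddiest point" > todo3/ggplot2_muddiest_YOURNAME.txt
git add todo3/quarto_notes_YOURNAME.qmd todo3/ggplot2_muddiest_YOURNAME.txt
git commit --quiet -m "Add To-do 3 files"
git push --quiet origin main

# Preview what a pull request would contain: files that differ from upstream
git fetch --quiet upstream
changed="$(git diff --name-only upstream/main origin/main)"
echo "$changed"
```

The final listing plays the same role as the “Files changed” tab: if it shows anything other than your two new files, something unexpected was committed.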

To-do 4

Due Sep 9 by 12 noon

It’s time to flex our newfound skills with Markdown and Quarto!

  1. Learn about dplyr! Go through the Data transformation chapter in our class version of R for data science, and create your own study notes as todo4/dplyr_notes_YOURNAME.qmd. Then render to a GitHub-flavored Markdown file todo4/dplyr_notes_YOURNAME.md.

    A string like “todo4/dplyr_notes_YOURNAME.qmd” is a shorthand way of referring to a path: both the file (dplyr_notes_YOURNAME.qmd) and the directory that it sits in (todo4/). You’ll sometimes see paths with a file multiple directories deep (e.g., ~/Documents/Research/research-plan.md). You should not include todo4/ in the file name.

  2. Create a muddiest point for dplyr as todo4/dplyr_muddiest_YOURNAME.md.
    • Note: Just a regular Markdown document, not a Quarto .qmd.
  3. Commit and push your changes to your fork (git push in the RStudio Terminal, not the “Push” button), but don’t create a PR yet! Visit your fork on GitHub and inspect the Markdown files, just for formatting: Do they look like you expect (e.g., are there any stray formatting marks, does the R output look right, etc.)? Write a few notes about what seems to be working or not working, including a muddiest point, as todo4/markdown_notes_YOURNAME.md; commit and push to your fork.

Submission

You should have 4 new files in the todo4/ directory on your fork: dplyr notes as qmd, notes as md, dplyr muddiest point, Markdown notes. Create a pull request for me. As always, ask questions if you run into any difficulties!
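
The path shorthand explained in step 1 can be checked from a terminal. This small sketch uses the standard dirname and basename utilities to split a path into its directory and file name; the variable names are made up for illustration.

```shell
# A path from this to-do; replace YOURNAME with your actual name
path="todo4/dplyr_notes_YOURNAME.qmd"

# dirname keeps the directory part, basename keeps the file name
dir_part="$(dirname "$path")"
file_part="$(basename "$path")"

echo "directory: $dir_part"    # directory: todo4
echo "file name: $file_part"   # file name: dplyr_notes_YOURNAME.qmd

# The same split works when the file sits several directories deep
deep_path="$HOME/Documents/Research/research-plan.md"
deep_file="$(basename "$deep_path")"
echo "file name: $deep_file"   # file name: research-plan.md
```

So when a to-do names todo4/dplyr_notes_YOURNAME.qmd, the file you create is called dplyr_notes_YOURNAME.qmd, and it lives inside the todo4/ directory.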

To-do 5

Due Sep 11 by 12 noon

  1. Learn about tidy data and pivoting in the Data tidying chapter! Create tidy-data_notes_YOURNAME.qmd, rendering it to gfm.

  2. Create a muddiest point: tidy-data_muddiest_YOURNAME.md

Submission

The usual, in the todo5/ directory of the Class-Exercise-Repo.

To-do 6

Due Sep 16 by 12 noon

  1. Learn about relational data in the Joins chapter! You know the drill by now: Create relational_notes_YOURNAME.qmd, rendered to gfm; plus a muddiest point as relational_muddiest_YOURNAME.md.

  2. Your final projects will be published as public repositories, just like previous versions of this class. (I’ll release more info about the project next week.) Choose two projects from the list on the todo6 README.md to review. Read their project_plan.md, progress_report.md, and final_report.md, then create a one-paragraph summary (for each) of how the project evolved. Write up your observations in the file proj-observations_YOURNAME.md.

  3. Play around with some more keyboard shortcuts in RStudio. Describe what you learned in a brief notes file: keyboard-shortcuts_YOURNAME.md.

Submission

The usual, in the todo6/ directory of the Class-Exercise-Repo.

To-do 7

Due Sep 18 by 12 noon

  1. Learn about readr in the Data import chapter. You know the drill by now—you should have three files for this chapter: readr_notes_YOURNAME.qmd, readr_notes_YOURNAME.md, readr_muddiest_YOURNAME.md.

  2. Read through your earlier muddiest points. Pick one that you feel like you understand better now (the key word being better—it’s okay if you’re not all the way there yet!). In old_muddiest_YOURNAME.md, discuss (1) what you now understand that you didn’t before, and (2) how you got there.

Submission

The usual, in the todo7/ directory

To-do 8

Due Sep 25 by 12 noon

  1. Learn about hierarchical data in the Hierarchical data chapter. You know the drill by now—you should have three files for this chapter: hierarchical-data_notes_YOURNAME.qmd, hierarchical-data_notes_YOURNAME.md, hierarchical-data_muddiest_YOURNAME.md.

  2. Do the same with the Iteration chapter.

  3. Revisit the “letters to future students” from To-do 1, now available in our Class-Exercise-Repo here. What’s one piece of project-related advice that you found helpful, and why? Put it in project-advice_YOURNAME.md.

Submission

The usual, in the todo8/ directory

To-do 9

Due Sep 30 by 12 noon

  1. Learn about strings in the Strings chapter. You know the drill by now—you should have three files for this chapter: strings_notes_YOURNAME.qmd, strings_notes_YOURNAME.md, strings_muddiest_YOURNAME.md.

Submission

The usual, in the todo9/ directory

To-do 10

Due Oct 2 by 12 noon

  1. Learn about regular expressions in the Regular expressions chapter. You know the drill by now—you should have three files for this chapter: regex_notes_YOURNAME.qmd, regex_notes_YOURNAME.md, regex_muddiest_YOURNAME.md.

Submission

The usual, in the todo10/ directory

To-do 11

Due Oct 9 by 12 noon

  1. Read Villarreal 2024 on fairness in sociolinguistic auto-coding. If you want some background on sociolinguistic auto-coding, check out Kendall et al. 2021. Come up with 1–2 discussion questions.

Submission

We’re going to try something a little different this time: collaborative editing!

  • As usual, pull changes from upstream to your local repo.
  • This time, instead of creating a new file, edit the existing todo11/discussion-questions.md file: add your question(s) under your name.
  • Commit, push to your fork, and create a pull request.

If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.
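
To see why this can work without conflicts, here is a minimal local simulation, assuming the shared file has a separate section for each person. The layout of the file, the “Student A” and “Student B” headings, the placeholder text, and the answer helper are all hypothetical; the real todo11/discussion-questions.md may be organized differently.

```shell
# Scratch repository; nothing here touches a real repo
work="$(mktemp -d)"
cd "$work" || exit 1
git init --quiet repo
cd repo || exit 1
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Shared file with one section per (hypothetical) student
mkdir todo11
cat > todo11/discussion-questions.md <<'EOF'
# Discussion questions

## Student A

(add your question here)

## Student B

(add your question here)
EOF
git add todo11
git commit --quiet -m "Add discussion question template"

# Hypothetical helper: replace the placeholder under one heading
answer() {  # usage: answer "Student A" "question text"
  awk -v heading="## $1" -v question="$2" '
    $0 == heading          { in_section = 1; print; next }
    /^## /                 { in_section = 0 }
    in_section && /^\(add/ { print question; next }
    { print }
  ' todo11/discussion-questions.md > tmp.md &&
    mv tmp.md todo11/discussion-questions.md
}

# Each student edits only their own section, on their own branch
git checkout --quiet -b student-a
answer "Student A" "Question from Student A?"
git commit --quiet -am "Add Student A question"

git checkout --quiet main
git checkout --quiet -b student-b
answer "Student B" "Question from Student B?"
git commit --quiet -am "Add Student B question"

# The instructor merges both contributions into main
git checkout --quiet main
git merge --quiet student-a
git merge --quiet --no-edit student-b
cat todo11/discussion-questions.md
```

Both merges go through cleanly here because the two edits touch different, non-adjacent lines. If two people edited the same line, Git would report a conflict instead.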

To-do 12

Due Oct 14 by 12 noon

  1. Read Bender et al. 2021 on large language models and come up with 1–2 discussion questions.
    • Note: This paper eventually led to Google firing two of its authors, Timnit Gebru and Margaret Mitchell; Timnit Gebru went on to launch the Distributed Artificial Intelligence Research Institute (DAIR). These two magazine articles (which are paywalled, unfortunately), describe the full story: Hao 2021, Perrigo 2022.

Submission

Like with todo11, we’re going to do collaborative editing.

  • As usual, pull changes from upstream to your local repo.
  • This time, instead of creating a new file, edit the existing todo12/discussion-questions.md file: add your question(s) under your name.
  • Commit, push to your fork, and create a pull request.

If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.

To-do 13

Due Oct 16 by 12 noon

Time for a midterm check-in!

  1. Write up a short reflection on the following questions (2–3 sentences apiece):
    • How do you think the midterm is going for you overall?
    • What have you had the most trouble with?
    • Where are you turning to for help (your notes, the textbook, class recordings, the internet, etc.), and with what?
    • (If applicable) Any other questions for me?
  2. Commit whatever you’ve got right now and push it to your fork
    • This is just so I can keep track of what’s been done before vs. after our discussion in class on Thursday

Submit your reflection on the Class Exercise Repo as todo13/midterm_reflection_YOURNAME.md

To-do 14

Due Oct 23 by 12 noon

  1. Learn about the tidy text format in chapters 1 and 3 of Text Mining with R. Include the usual files (for both chapters combined, not per-chapter): tidytext_notes_YOURNAME.qmd, tidytext_notes_YOURNAME.md, tidytext_muddiest_YOURNAME.md.
  2. One additional file: a brief note on how tidytext will be useful for your final project (or not, if it’s not applicable). Write this up as tidytext_project_YOURNAME.md.

Submission

The usual, in the todo14/ directory

To-do 15

Due Oct 30 by 12 noon

  1. Let’s have a nice meal at the diner—the CSS Diner! This is a fun website for learning/practicing CSS selectors, a necessary skill for web-scraping. Set your timer for 1 hour and get as far as you can. (This isn’t for the sake of pressure or competition, just time management. If your project involves web-scraping, go further and let me know how far you got in the initial hour.) Write up a brief reflection, including a muddiest point, as css_reflection_YOURNAME.md.

Submission

The usual, in the todo15/ directory

To-do 16

Due Nov 4 by 12 noon

What has everyone been up to? Let’s take a look – it’s a “visit your classmates” to-do!

I’ve set up a directory with project guestbooks in our Class-Exercise-Repo.

  1. Visit your classmates’ projects! The order of who visits who is in project-guestbooks/README.md.
  2. Take a look around, and write in their guestbook (not your own!). Your entry should consist of (at least):
    • One thing you thought was done well
    • One suggestion or avenue for improvement
    • One thing you learned
  3. Clone their repo to your computer and see if you can reproduce their data pipeline as it currently stands. Ideally, it should be clear enough from the directions in their repo how to run their pipeline. In some cases it might be impossible because of private data; I’ll leave it to you and your “visitee” to decide whether they’re comfortable sharing data. Include a section in your guestbook entry about:
    • How easy or hard it was to run their data pipeline (or if you were unable to), and why
    • If you were able to run their pipeline, did you get the same end result? If not, don’t worry about this part.

Once you’re ready to submit, push to your fork and create a pull request for me.

To-do 17

Due Nov 6 by 12 noon

Right about now is a good time to re-anchor ourselves in the wisdom of those who came before us.

  1. Revisit the letters to future Data Science students from the beginning of the semester and identify:
    • 1–2 pieces of advice you’ve done a good job following so far.
    • 1–2 pieces of advice you plan to focus on some more in this final push.
    • Anything else you’d like to comment on.

    Write this up as letters_YOURNAME.md.

Submission

The usual, in the todo17/ directory.

To-do 18

Due Nov 13 by 12 noon

It’s another “visit your classmates” day! This is like to-do 16, but with 1 new thing:

  1. Visit your classmates’ projects! The order of who visits who is in project-guestbooks/README.md.
  2. Take a look around, and write in their guestbook (not your own!). Your entry should consist of (at least):
    • One thing you thought was done well
    • One general suggestion or avenue for improvement
    • (New) One specific recommendation for how you would modify their code (e.g., “I would use map() instead of lapply() in the analysis of word frequency”)
    • One thing you learned
    • (New) What you would do next if this was your project
  3. Clone your classmate’s repo and see if you can reproduce their data pipeline

To-do 19

Due Nov 18 by 12 noon

We’ll continue with project workdays for the next two class meetings. Write up a brief memo, troubleshooting_YOURNAME.md, with 1–2 troubleshooting questions. Put it in the todo19/ folder of our class exercise repo. (If you’re in the habit of completing to-dos early, you might want to wait on this one until shortly before class so you can get help with your latest challenge!)

To-do 20

Due Dec 12 by 6pm

This to-do is completely optional. But I recommend doing it anyway—reflection and “paying it forward” both tend to be therapeutic.

It’s been a long semester. You’ve learned tons, struggled over tough coding challenges, and attained some satisfying wins. Now is a great time to step back and reflect on how far you’ve come and how you’ve gotten there.

Reflection

This part of the assignment is for your own reference only—complete it, but do not turn it in.

  • First, think back over your engagement in our class this past semester. What strategies did you use to internalize the content? How did you engage with the textbook and create your chapter notes? How did you collaborate with other students on assignments, if at all? Did you ask questions in office hours and/or in class? What were your strategies for getting help?
  • Next, think about your performance in our class this past semester. How did your strategies for engaging the content, both in and out of class, affect your performance on assignments? Did you experiment with multiple strategies, and how did they affect your performance? How well did you keep up with assignments, especially as the semester got busier?
  • Finally, think about how you can apply what you’ve learned in this class about yourself as a researcher to your future as a researcher—regardless of whether that research involves data science or R coding.

Action

I will be teaching Data Science again in the future. Using your reflection from the first part of the assignment as guidance, write a letter to future Data Science students detailing strategies that you found helped you to be successful in this class. These can be general strategies (“Be sure to always…”) or specific strategies (“In [X unit], keep in mind that…”). Don’t worry about whether or not these strategies work as well for everyone as they do for you; if you have information about different strategies that worked for others, feel free to include that. Especially useful are things you wish you’d known at the start of the semester.

Most, if not all, of these letters will be made available to future Data Science students, so please be as informative and thoughtful as possible! To protect your educational privacy, your letter will be anonymized before being shared with other students, and I would recommend you avoid referring to anything that could specifically identify you as the letter-writer (e.g., details of your final project). Please complete this assignment on your own.

There’s no length requirement; if you want a guideline, shoot for 300–500 words. Keep in mind that this is not the same as a course evaluation—though please fill out OMETs if you haven’t yet! The point of this letter is not to evaluate the class or my teaching, but to help future students succeed.

Submission

Write up your letter as letter_to_future_students_YOURNAME.md and DM it to me on Slack.