To-do assignments

Submission guidelines

  • If the To-do includes submitting a Quarto document:
    • Render the file to GitHub-flavored Markdown (format: gfm in the YAML header)
      • If you get errors, fix them and try rendering again
    • Make sure the rendered file doesn’t have super-long outputs
      • To see what I mean, create a qmd that just contains as.data.frame(ggplot2::diamonds)
    • Submit both the source .qmd file and the rendered .md file
  • If the To-do includes submitting a Markdown file (whether it’s a rendered Quarto document or a standalone Markdown file):
    • Commit & push the file to your fork to check formatting
      • If formatting looks off, commit & push changes until it looks right (this might mean a lot of commits early on when you’re still learning, and that’s okay!)
  • When creating a pull request, check:
    • Whether GitHub can merge your changes
    • The “Files changed” tab: Anything unexpected?
      • If we’ve done an in-class assignment since your last pull request, then those files will be included in the PR. That’s fine as long as you’ve added your name as a suffix to the file
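
For the rendering step in the guidelines above, here is a minimal sketch of what a gfm-bound Quarto file and a Terminal render might look like. The file name example_YOURNAME.qmd and the title are placeholders made up for illustration, and the sketch assumes the Quarto command-line tool is installed; if it isn't, the Render button in RStudio does the same job.

```shell
# Placeholder file name for illustration; use your real notes file instead.
qmd_file="example_YOURNAME.qmd"

# Write a tiny Quarto document whose YAML header asks for GitHub-flavored Markdown.
cat > "$qmd_file" <<'EOF'
---
title: "Example notes"
format: gfm
---

A short note with no long output.
EOF

# Render only if the Quarto command-line tool is available on this machine.
if command -v quarto >/dev/null 2>&1; then
  quarto render "$qmd_file" || echo "Rendering failed: fix the error and render again"
else
  echo "quarto not found: use the Render button in RStudio instead"
fi
```

If the render succeeds, a .md file with the same base name should appear next to the .qmd source, and both files are what you would submit.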

To-do 1

Due Aug 28 by 12 noon

  1. At the end of the semester, previous students in this class wrote anonymous letters to future Data Science students detailing their advice for how to succeed in the course. These letters are pinned to our Slack #general channel here. Please read these letters and identify:
    • One idea that surprised you and why it surprised you
    • Two concrete strategies that you will take in order to succeed in Data Science this semester
  2. This semester, you’ll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that’s giving you the most issues or that you’re most unsure about. This could be something we’ve already gone over, or an extension of what we’ve discussed.
    • Read the assigned readings from our R for data science textbook (the intro and chapter 2).
    • Then identify your muddiest point about the chapters.

    Good muddiest points are specific questions. Using Git as an example:

    • “Why does Git make us stage files before we commit them?”
    • “I keep getting the message no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
    • “What is .gitignore?”
    • “Do you [Dan] actually use Git in your projects?”

    Not-so-good muddiest points are noun phrases or vague questions:

    • “reverting files”
    • “How do I use Git?”
  3. The next step on our Git journey is GitHub:
    • Create a GitHub account at https://github.com/, if you don’t already have one. Pick a good username because changing your GitHub username creates annoying problems.
    • Add your Pitt email address to your account.
    • Check your inbox and spam folder for a verification email.
    • Add an avatar to your GitHub account (usually a headshot, but can be anything you like).
    • In your submission, send me the link to your GitHub user page (i.e., https://github.com/YOUR-USERNAME). You don’t have to add any other personalization, just send me the link.

Submission

Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown. Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md (replace YOURNAME with your actual name). Share it to the #to-dos channel on our Slack workspace.

To-do 2

Due Sep 2 by 12 noon

Time for some hands-on practice! Do the following:

  1. Learn about ggplot2! Go through the Data visualization and Workflow: scripts and projects chapters in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)

    It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing—not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.

  2. Future to-dos will ask you to create your own study notes for R4DS chapters. For now, your task is to evaluate some past students’ ggplot2 notes, which you can find at https://github.com/Data-Sci-2025/Class-Exercise-Repo/tree/main/todo2/old-notes. Compare and contrast: If you were future-you, how helpful would you find these notes? Are there some notes you’d find more helpful than others? Pay attention not only to content but also style, formatting, and organization. There’s not a right or wrong answer here—different things work for different people! Write up your observations in notes-on-notes_YOURNAME.txt (or .md).

    These notes were for the previous edition of R4DS (https://r4ds.had.co.nz/data-visualisation.html), hence any differences in section numbers, etc.

  3. Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?

    Example answer
    When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don’t touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the comments. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless. Git prevents that from happening because once changes are committed, they’re there.

    Write up your response as git_notes_YOURNAME.txt (or .md).

  4. If you haven’t set a personal access token for GitHub, do so now:
    1. In the R console, run usethis::create_github_token()
    2. This will load a GitHub page for you to generate a PAT
      • In “Note”, write “My laptop” or something that describes your local machine
      • In “Expiration” set “No expiration” (even though GitHub doesn’t recommend it)
      • Leave everything else as-is
      • Click “Generate token” and copy your token to the clipboard
    3. Back in RStudio, run gitcreds::gitcreds_set() without any arguments
    4. Follow the prompts and paste your token when asked
    5. If needed, change your Git config user name to match your GitHub user name: usethis::use_git_config(user.name = "YOUR GITHUB USERNAME")

Submission

Share your notes on the #to-dos channel on Slack

To-do 3

Due Sep 4 by 12 noon

  1. Now that you’ve reviewed past students’ study notes, it’s your turn! Read up on Quarto, a literate programming framework for data science, in the Quarto chapter of R for data science. Create some notes that’ll be helpful for future-you in a Quarto document called quarto_notes_YOURNAME.qmd.

  2. What’s your muddiest point for ggplot2? Create a file called ggplot2_muddiest_YOURNAME.txt that has your muddiest point: After going through the Data visualization chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?

  3. Attempt to pull Class-Exercise-Repo from upstream. This will add a new directory to your local repo: todo3/. If you get an error message, ask me about it in Slack #q-and-a

  4. On our class GitHub organization, set your visibility to public

    • Go to https://github.com/orgs/Data-Sci-2025/people
    • To the right of your name, there’s a drop-down menu with “Private”. Select “Public” instead.
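
For step 3 above, pulling from upstream in the RStudio Terminal might look like the sketch below. It assumes you are inside your local clone, that the class repo's default branch is main, and that the remote is named upstream; the helper name pull_class_repo is made up for illustration.

```shell
# Hypothetical helper: bring the latest class materials into your local repo.
# Run it from inside your local clone of your fork.
pull_class_repo() {
  # Add the class repo as "upstream" only if that remote is not set up yet.
  if ! git remote get-url upstream >/dev/null 2>&1; then
    git remote add upstream https://github.com/Data-Sci-2025/Class-Exercise-Repo.git
  fi
  # Fetch and merge new commits (such as the todo3/ directory) from the class repo.
  git pull upstream main
}

# Usage (from the repo directory):
#   pull_class_repo
```

The function only wraps the two Git commands so nothing runs until you call it; typing the commands directly in the Terminal works just as well.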

Submission

From here on out, to-do submissions will take the form of GitHub pull requests. You should have 2 new files: Quarto notes and ggplot2 muddiest point. Add your files to todo3/, stage, commit, and push to origin. Start a pull request, and remember the “PR checklist”: check whether GitHub can merge your changes, and check the changed files. If it looks good, open the pull request.
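
The stage, commit, and push steps above can be sketched as Terminal commands. This is only an illustration: the helper name submit_todo3 and the commit message are made up, and it assumes your fork is the remote named origin with a default branch called main.

```shell
# Hypothetical helper: stage, commit, and push the two To-do 3 files.
# Replace YOURNAME with your actual name before running.
submit_todo3() {
  # Stage only the two new files for this to-do.
  git add todo3/quarto_notes_YOURNAME.qmd todo3/ggplot2_muddiest_YOURNAME.txt
  # Show a short summary so you can confirm that only the expected files are staged.
  git status --short
  # Record the staged changes with a descriptive message.
  git commit -m "Add To-do 3 notes and muddiest point"
  # Send the commit to your fork on GitHub.
  git push origin main
}

# Usage (from the repo directory):
#   submit_todo3
```

The pull request itself is opened on the GitHub website after the push, which is where the "PR checklist" applies.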

To-do 4

Due Sep 9 by 12 noon

It’s time to flex our newfound skills with Markdown and Quarto!

  1. Learn about dplyr! Go through the Data transformation chapter in our class version of R for data science, and create your own study notes as todo4/dplyr_notes_YOURNAME.qmd. Then render to a GitHub-flavored Markdown file todo4/dplyr_notes_YOURNAME.md.

    A string like “todo4/dplyr_notes_YOURNAME.qmd” is a shorthand way of referring to a path: both the file (dplyr_notes_YOURNAME.qmd) and the directory that it sits in (todo4/). You’ll sometimes see paths with a file multiple directories deep (e.g., ~/Documents/Research/research-plan.md). You should not include todo4/ in the file name.

  2. Create a muddiest point for dplyr as todo4/dplyr_muddiest_YOURNAME.md.
    • Note: Just a regular Markdown document, not a Quarto .qmd.
  3. Commit and push your changes to your fork (git push in the RStudio Terminal, not the “Push” button), but don’t create a PR yet! Visit your fork on GitHub and inspect the Markdown files, just for formatting: Do they look like you expect (e.g., are there any stray formatting marks, does the R output look right, etc.)? Write a few notes about what seems to be working or not working, including a muddiest point, as todo4/markdown_notes_YOURNAME.md; commit and push to your fork.
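
The path shorthand described in item 1 above can be seen in the Terminal. As a small sketch, the standard dirname and basename utilities split a path into the directory and the file name; the variable names here are chosen only for illustration.

```shell
# Split a path into the directory it points into and the file name itself.
path="todo4/dplyr_notes_YOURNAME.qmd"

dir_part=$(dirname "$path")     # the directory the file sits in
file_part=$(basename "$path")   # the file name, without any directory

echo "directory: $dir_part"     # prints "directory: todo4"
echo "file name: $file_part"    # prints "file name: dplyr_notes_YOURNAME.qmd"
```

Only the second part is what you type when naming the file; the first part tells you which folder to save it in.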

Submission

You should have 4 new files in the todo4/ directory on your fork: dplyr notes as .qmd, dplyr notes as .md, dplyr muddiest point, and Markdown notes. Create a pull request for me. As always, ask questions if you run into any difficulties!

To-do 5

Due Sep 11 by 12 noon

  1. Learn about tidy data and pivoting in the Data tidying chapter! Create tidy-data_notes_YOURNAME.qmd, rendering it to gfm.

  2. Create a muddiest point: tidy-data_muddiest_YOURNAME.md

Submission

The usual, in the todo5/ directory of the Class-Exercise-Repo.

To-do 6

Due Sep 16 by 12 noon

  1. Learn about relational data in the Joins chapter! You know the drill by now: Create relational_notes_YOURNAME.qmd, rendered to gfm; plus a muddiest point as relational_muddiest_YOURNAME.md.

  2. Your final projects will be published as public repositories, just like previous versions of this class. (I’ll release more info about the project next week.) Choose two projects from the list on the todo6 README.md to review. Read their project_plan.md, progress_report.md, and final_report.md, then create a one-paragraph summary (for each) of how the project evolved. Write up your observations in the file proj-observations_YOURNAME.md.

  3. Play around with some more keyboard shortcuts in RStudio. Describe what you learned in a brief notes file: keyboard-shortcuts_YOURNAME.md.

Submission

The usual, in the todo6/ directory of the Class-Exercise-Repo.

To-do 7

Due Sep 18 by 12 noon

  1. Learn about readr in the Data import chapter. You know the drill by now—you should have three files for this chapter: readr_notes_YOURNAME.qmd, readr_notes_YOURNAME.md, readr_muddiest_YOURNAME.md.

  2. Read through your earlier muddiest points. Pick one that you feel like you understand better now (the key word being better—it’s okay if you’re not all the way there yet!). In old_muddiest_YOURNAME.md, discuss (1) what you now understand that you didn’t before, and (2) how you got there.

Submission

The usual, in the todo7/ directory

To-do 8

Due Sep 25 by 12 noon

  1. Learn about hierarchical data in the Hierarchical data chapter. You know the drill by now—you should have three files for this chapter: hierarchical-data_notes_YOURNAME.qmd, hierarchical-data_notes_YOURNAME.md, hierarchical-data_muddiest_YOURNAME.md.

  2. Do the same with the Iteration chapter.

  3. Revisit the “letters to future students” from To-do 1, now available in our Class-Exercise-Repo here. What’s one piece of project-related advice that you found helpful, and why? Put it in project-advice_YOURNAME.md.

Submission

The usual, in the todo8/ directory

To-do 9

Due Sep 30 by 12 noon

  1. Learn about strings in the Strings chapter. You know the drill by now—you should have three files for this chapter: strings_notes_YOURNAME.qmd, strings_notes_YOURNAME.md, strings_muddiest_YOURNAME.md.

Submission

The usual, in the todo9/ directory

To-do 10

Due Oct 2 by 12 noon

  1. Learn about regular expressions in the Regular expressions chapter. You know the drill by now—you should have three files for this chapter: regex_notes_YOURNAME.qmd, regex_notes_YOURNAME.md, regex_muddiest_YOURNAME.md.

Submission

The usual, in the todo10/ directory

To-do 11

Due Oct 9 by 12 noon

  1. Read Villarreal 2024 on fairness in sociolinguistic auto-coding. If you want some background on sociolinguistic auto-coding, check out Kendall et al. 2021. Come up with 1–2 discussion questions.

Submission

We’re going to try something a little different this time: collaborative editing!

  • As usual, pull changes from upstream to your local repo.
  • This time, instead of creating a new file, edit the existing todo11/discussion-questions.md file: add your question(s) under your name.
  • Commit, push to your fork, and create a pull request.

If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.

To-do 12

Due Oct 14 by 12 noon

  1. Read Bender et al. 2021 on large language models and come up with 1–2 discussion questions.
    • Note: This paper eventually led to Google firing two of its authors, Timnit Gebru and Margaret Mitchell; Timnit Gebru went on to launch the Distributed Artificial Intelligence Research Institute (DAIR). These two magazine articles (which are paywalled, unfortunately), describe the full story: Hao 2021, Perrigo 2022.

Submission

As with To-do 11, we’re going to do collaborative editing.

  • As usual, pull changes from upstream to your local repo.
  • This time, instead of creating a new file, edit the existing todo12/discussion-questions.md file: add your question(s) under your name.
  • Commit, push to your fork, and create a pull request.

If we’ve all done this correctly, I should be able to merge all your commits together without any conflicts.

To-do 13

Due Oct 16 by 12 noon

Time for a midterm check-in!

  1. Write up a short reflection on the following questions (2–3 sentences apiece):
    • How do you think the midterm is going for you overall?
    • What have you had the most trouble with?
    • Where are you turning for help (your notes, the textbook, class recordings, the internet, etc.), and what are you using each source for?
    • (If applicable) Any other questions for me?
  2. Commit whatever you’ve got right now and push it to your fork
    • This is just so I can keep track of what’s been done before vs. after our discussion in class on Thursday

Submit your reflection on the Class Exercise Repo as todo13/midterm_reflection_YOURNAME.md