Daily To-do Assignments


To-do 1

Due by 12pm Thursday, September 1

In our first class, we used a pretty silly, non-real-life example to explore Git. (This is actually good programming practice—using a sandbox to try out individual skills or test small bits of code, separate from the real file(s) you’re working on!) Soon, however, you’ll be using Git for your own data science projects. Do the following:

  1. Think about what Git can do, and think back to the ways you’ve managed files in the past. Even if you didn’t realize it, you had a set of file-management practices. Describe your existing file-management practices. What are some benefits and drawbacks? How could Git help you improve your existing file-management practices?

    Example answer
    When I’m writing a paper, I’ll add a versionX suffix like version0.5, version0.6, etc. Once I feel like I’ve made decent progress, I’ll increment the version number and add a little comment to the top of the file describing the changes since the previous version. Then I don’t touch the old file, and I only work on the latest version. Each new version is basically like a Git commit, and the little comment is like a commit message. In my current system, the little comments aren’t easy to glance over; if I’m looking to undo a previous change, I have to re-open each old version to read through the comments. Git can do that more easily by letting me see commits and diffs at a glance. Plus, in my current system there’s no guarantee that I won’t accidentally change an old version rather than the latest version, which would render the “commit message” useless. Git prevents that from happening because once changes are committed, they’re there.

  2. This semester, you’ll frequently be asked to identify a muddiest point: some concept, skill, or nagging question that’s giving you the most issues or that you’re most unsure about. This could be something we’ve already gone over, or an extension of what we’ve discussed. After today’s lesson, what’s your muddiest point about Git?
    • Good muddiest points are specific questions:
      • “Why does Git make us stage files before we commit them?”
      • “I keep getting the message no changes added to commit (use "git add" and/or "git commit -a"), what am I doing wrong?”
      • “What is .gitignore?”
      • “Do you [Dan] actually use Git in your projects?”
    • Not-so-good muddiest points are noun phrases or vague questions:
      • “reverting files”
      • “How do I use Git?”
  3. The next step on our Git journey is GitHub:
    • Create a GitHub account at https://github.com/, if you don’t already have one. Pick a good username.
    • Add your Pitt email address to your account.
    • Check your inbox and spam folder for a verification email.
    • What’s your GitHub username?

Submission

Write up your answers in a text file (should have the .txt extension) or a Markdown file (.md) if you’re comfortable with Markdown. Name the file todo1_YOURNAME.txt or todo1_YOURNAME.md. Share it to the #to-dos channel on our Slack workspace.

To-do 2

Due by 12pm Tuesday, September 6

Time for some hands-on practice! Do the following:

  1. Install/update tidyverse by running install.packages("tidyverse") in the R console. If you’ve done it correctly, then running packageVersion("tidyverse") should return ‘1.3.2’.

  2. Learn about ggplot2! Go through the data visualization chapter in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!)

It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing—not just typing out all the commands yourself and ensuring you get the same output, but tinkering and exploring.

  3. As you go through the learning materials, create your own study notes, as a text file ggplot2_notes_YOURNAME.txt or ggplot2_notes_YOURNAME.Rmd (if you do an Rmd, no need to knit). Include examples, explanations, etc. You are essentially creating your own reference material.

Submission: Share your notes on the #to-dos channel on Slack

To-do 3

Due by 12pm Thursday, September 8

  1. Pull the upstream changes into your local Class-Exercise-Repo: In the Git tab, click the gear icon then Shell. In the shell, type git pull upstream main. After class, I modified the upstream so that it should pull into your local Class-Exercise-Repo; you’ll know if the todo3/ folder shows up in your local repo. If not and you’re still getting the error Your local changes to the following files would be overwritten by merge, let me know on Slack.

  2. Learn about dplyr! Like you did for To-do 2, go through the data transformation chapter in our class version of R for data science, and create your own study notes as dplyr_notes_YOURNAME.txt (or use formats .md or .Rmd if you prefer).

  3. What’s your muddiest point for dplyr? Create a file called dplyr_muddiest_YOURNAME.txt (or .md) that has your muddiest point: After going through the dplyr chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?

Submission: You should have 2 new files: notes and muddiest point. If you successfully pulled the upstream changes into your local repo, you’ll have a todo3/ folder; stage, commit, push to origin, and create a pull request. If git pull upstream main was unsuccessful, share the files on the Slack #to-dos channel instead and we’ll troubleshoot on Thursday.
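
The pull-and-submit cycle described in this to-do can be rehearsed in a throwaway sandbox before you try it on the real repo. The following is only a sketch under stated assumptions: two local bare repositories stand in for the instructor's upstream and your GitHub fork, the student identity and file contents are made up, and the pull request step is left out because it happens on GitHub's website.

```shell
#!/bin/sh
# Sandbox rehearsal of To-do 3; all paths, names, and file contents are hypothetical.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"

# Commit identity for this sandbox only.
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

# Local bare repos standing in for the instructor's repo and your fork.
for repo in upstream.git origin.git; do
  git init --quiet --bare "$repo"
  git -C "$repo" symbolic-ref HEAD refs/heads/main
done

# Instructor side: create the repo and publish it to both stand-ins.
git init --quiet instructor
git -C instructor symbolic-ref HEAD refs/heads/main
echo "# Class-Exercise-Repo" > instructor/README.md
git -C instructor add README.md
git -C instructor commit --quiet -m "Initial commit"
git -C instructor push --quiet "$sandbox/upstream.git" main
git -C instructor push --quiet "$sandbox/origin.git" main

# Student side: clone the fork and register the upstream remote.
git clone --quiet "$sandbox/origin.git" Class-Exercise-Repo
git -C Class-Exercise-Repo remote add upstream "$sandbox/upstream.git"

# Instructor side: add the todo3/ folder upstream after class.
mkdir instructor/todo3
echo "To-do 3 submissions" > instructor/todo3/README.md
git -C instructor add todo3
git -C instructor commit --quiet -m "Add todo3 folder"
git -C instructor push --quiet "$sandbox/upstream.git" main

# Student side, step 1: pull the upstream changes.
cd Class-Exercise-Repo
git pull --quiet upstream main
ls todo3   # the folder now exists locally

# Submission: add the two new files, then stage, commit, and push to origin.
echo "placeholder notes" > todo3/dplyr_notes_YOURNAME.txt
echo "placeholder muddiest point" > todo3/dplyr_muddiest_YOURNAME.txt
git add todo3
git status --short   # staged files are listed with an A
git commit --quiet -m "Add dplyr notes and muddiest point"
git push --quiet origin main
```

In the real repo, only the last block (from git pull onward) applies; the remotes already exist, and the pull request is opened from your fork's page on GitHub.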

To-do 4

Due by 12pm Tuesday, September 13

It’s time to flex our newfound skills with Markdown and R Markdown!

  1. Attempt to pull Class-Exercise-Repo from upstream. If you get an error message, ask me about it in Slack #q-and-a.

  2. Clean up your notes from To-dos 2 & 3. Convert .txt files without R code into .md files, and .txt files with R code into .Rmd files. For both, use text formatting and headings to make your documents clearer. For .Rmd files, use a YAML header (like this one), R code chunks, and session info.

  3. Try knitting your R Markdown files to GitHub-flavored markdown files. If it takes you more than ~30 minutes to debug, leave it as-is without knitting, and we’ll troubleshoot on Tuesday.

  4. You can now safely delete the .txt versions—they’re still in your repo history if you ever need to access them again.
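
Step 4 relies on the fact that Git keeps committed content even after a file is deleted. A small sandbox sketch, assuming a throwaway repo and a made-up notes file, shows how a deleted .txt file could be read or restored from history:

```shell
#!/bin/sh
# Sandbox demo; the repo, identity, and file contents are hypothetical.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

git init --quiet demo
cd demo

# Commit the plain-text notes.
echo "my original notes" > ggplot2_notes_YOURNAME.txt
git add ggplot2_notes_YOURNAME.txt
git commit --quiet -m "Add plain-text notes"

# Delete the file and record the deletion as its own commit.
git rm --quiet ggplot2_notes_YOURNAME.txt
git commit --quiet -m "Remove .txt notes after converting to .Rmd"

# The working directory no longer contains the file ...
ls

# ... but the commits that touched it are still listed,
git log --oneline -- ggplot2_notes_YOURNAME.txt

# and its old content can be printed from the commit before the deletion.
git show HEAD~1:ggplot2_notes_YOURNAME.txt

# If you ever need the file back, check it out from that commit.
git checkout HEAD~1 -- ggplot2_notes_YOURNAME.txt
```

HEAD~1 works here only because the deletion is the most recent commit; in a longer history, pick the commit hash from the git log output instead.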

Submission: Put your files in the todo2/ and todo3/ directories of the Class-Exercise-Repo. Commit your changes, push to your GitHub fork, and create a pull request for me. As always, ask questions if you run into any difficulties!

To-do 5

Due by 12pm Thursday, September 15

  1. If you haven’t done so already, resolve any lingering merge conflicts and push those changes to your remote fork so I can merge your previous to-dos into the upstream remote.

  2. Learn about tidyr in R4DS chapter 12! You know the drill by now—create tidyr_notes_YOURNAME.Rmd and tidyr_muddiest_YOURNAME.md, knitting the former to a github_document.

Submission: Put your files in the todo5/ directory of the Class-Exercise-Repo. Commit your changes, push to your GitHub fork, check that the md files are formatted like you expect, and create a pull request for me.

To-do 6

Due by 12pm Tuesday, September 20

  1. If you haven’t done so already, resolve any lingering merge conflicts and push those changes to your remote fork so I can merge your previous to-dos into the upstream remote.
  2. Learn about relational data in R4DS chapter 13. You know the drill by now—you should have three files, relational_notes_YOURNAME.Rmd, relational_notes_YOURNAME.md, relational_muddiest_YOURNAME.md.
  3. Put your files in the todo6/ directory of the Class-Exercise-Repo. After you commit, push, and create a pull request, look at the “Files changed” tab (or add /files to the end of the URL on the page for your PR). Does it look like you expect? That is, are the changes only the ones that you expect? If so, great! If not, create new commit(s) and push to your fork; remember that any commits you push while a PR is still open will be added to the PR.
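
The “Files changed” check in step 3 can be approximated locally before you open the PR. This is a sketch under assumptions: a local bare repo stands in for the upstream, the file contents are placeholders, and scratch.txt is a hypothetical stray file. The three-dot diff is used because GitHub's pull request view compares your branch against its merge base with the target branch.

```shell
#!/bin/sh
# Sandbox demo; repo layout, identity, and file contents are hypothetical.
set -eu
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME="Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Student" GIT_COMMITTER_EMAIL="student@example.com"

# A local bare repo standing in for the upstream.
git init --quiet --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/main

# Your local repo, starting from the same commit as the upstream.
git init --quiet Class-Exercise-Repo
cd Class-Exercise-Repo
git symbolic-ref HEAD refs/heads/main
echo "# Class-Exercise-Repo" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git remote add upstream "$sandbox/upstream.git"
git push --quiet upstream main

# Your to-do work: two intended files plus one accidental file.
mkdir todo6
echo "placeholder" > todo6/relational_notes_YOURNAME.Rmd
echo "placeholder" > todo6/relational_muddiest_YOURNAME.md
echo "scratch" > scratch.txt
git add .
git commit --quiet -m "Add to-do 6 files"

# Preview the files a PR against upstream main would list.
git fetch --quiet upstream
git diff --name-status upstream/main...HEAD

# scratch.txt was not meant to be there: fix it with a new commit.
git rm --quiet scratch.txt
git commit --quiet -m "Remove stray scratch file"

# The preview now lists only the two todo6/ files.
git diff --name-status upstream/main...HEAD
```

Pushing that second commit to your fork while the PR is open would update the PR in the same way.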

To-do 7

Due by 12pm Thursday, September 22

  1. Same as usual, with the vectors chapter.

  2. In the data import chapter, please also read sections 11.1 & 11.2, skip 11.3, skim 11.4, and read 11.5. No need for notes/MPs on this chapter, just on the vectors chapter.

You know the drill from here!

To-do 8

Due by 12pm Tuesday, September 27

We’ll use a Wickham family tag-team to learn about iteration.

  1. If you’ve encountered so-called for-loops in other programming languages, then you can skip to #2! If not, it’s good to be aware of what they are, even though you probably won’t use them much in R. Read the R4DS iteration chapter, but just sections 21.1 & 21.2.

  2. Install the repurrrsive package, which has datasets that are useful for explaining & exploring iteration. Then watch this video, which is the first half of Charlotte Wickham’s tutorial on purrr. (Her brother, Hadley Wickham, co-wrote R4DS!) You can download the slides from the GitHub repo for the tutorial if you like. There’s a separate video that’s the second half of the tutorial, but that’s optional.

Submission: Write up your notes, a muddiest point, and at least one exercise that you design. Then the usual: commit, push to your fork, open a PR for me.

To-do 9

Due by 12pm Thursday, September 29

  1. Since we read the data import chapter already, just give it a quick re-skim. Add your muddiest points to readr_muddiest_YOURNAME.md.

  2. Check out some projects from previous versions of this class (Spring 2022, Spring 2021). Again, these projects used Python, which we don’t cover in this class; read them not for the code but for their scope. Choose two projects and read their project_plan.md and progress_report.md. In a paragraph or so, link to the projects and summarize how students’ projects evolved; add it to the file proj-observations_YOURNAME.md.

Submission: Class-Exercise-Repo/todo9/

To-do 10

Due by 12pm Thursday, October 6

Explore the stringr package by going through the R4DS strings chapter. Give yourself time to go through this chapter, as it’s on the longer side. Feel free to consult the regular expressions learning resources!

You know the drill from here: notes & muddiest point(s) in todo10/.

To-do 11

Due by 12pm Thursday, October 20

Skim today’s readings on language models and pose a muddiest point or follow-up question for each (skipping the article(s) you’re presenting, of course). Add your MP or follow-up to Class-Exercise-Repo/data-ethics/ as (e.g.) bender_muddiest_Dan.md.

To-do 12

Due by 12pm Tuesday, November 1

Skim today’s readings on accountability to communities and pose a muddiest point or follow-up question for each (skipping the article(s) you’re presenting, of course). Add your MP or follow-up to Class-Exercise-Repo/data-ethics/ as (e.g.) bender_muddiest_Dan.md.

To-do 13

Due by 12pm Tuesday, November 8

Data sharing Q&A! Once you’ve watched Thursday’s lesson video by Dr. Lauren Collister, think of at least 1–2 questions on the topics Lauren covers, and add them to this Google doc.

Submission: The Google doc is your submission, so don’t forget to add your name. Lauren will address all questions in class, but she might not be able to give a full answer to any question asked after 12pm on Monday.

To-do 14

Due by 12pm Thursday, November 10

What has everyone been up to? Let’s take a look – it’s a “visit your classmates” day!

I’ve set up a directory with project guestbooks in our Class-Exercise-Repo.

  1. Visit your classmates’ projects! The order of who visits who is in the project-guestbooks directory’s README.
  2. Take a look around, and write in their guestbook (not your own!). Your entry should consist of (at least):
    • One thing you thought was done well
    • One suggestion or avenue for improvement
    • One thing you learned

Once you’re ready to submit, push to your fork and create a pull request for me.

To-do 15

Last to-do of the semester!!

Due by 12pm Thursday, November 17

Same as to-do 14, but with different projects!

Once you’re ready to submit, push to your fork and create a pull request for me.