Daily To-do Assignments

To-do 1

Due by 12pm Thursday, September 2

The Internet is full of published linguistic data sets. Let’s data-surf! Instructions:

  1. Go out and find two linguistic data sets you like. One should be a corpus; the other should be something friendlier for R (typically, data in table form). They must be free and downloadable in full. Make sure they are linguistic data sets, meaning data sets designed for linguistic inquiry.

  2. You might want to start with various bookmark sites listed in the Datasets section of our Learning Resources page. But don’t be constrained by them.

  3. Download the data sets and poke around. Open up a file or two to take a peek.

  4. In a text file (should have the .txt extension), make note of:
    • The name of the data resource
    • The author(s)
    • The URL of the download page
    • Its makeup: size, type of language, format, etc.
    • License: whether it comes with one, and if so, what kind
    • Anything else noteworthy about the data. A sentence or two will do.
  5. If you are comfortable with markdown, make an .md file instead of a text file.

Submission: Upload your text file to the To-do 1 submission link on Canvas.

To-do 2

Due by 12pm Tuesday, September 7

Time for some hands-on practice! Do the following:

  1. Install/update the tidyverse by opening up RStudio and running install.packages("tidyverse") in the console. If you’ve done it correctly, then running packageVersion("tidyverse") should return ‘1.3.1’.

  2. Learn about ggplot2! Go through the data visualization chapter in our class version of R for data science. (Pay attention to the yellow blocks, where I’ve injected notes for our class into the chapter!) And then go through at least one other data visualization resource on our learning resources page.

It’s up to you how thoroughly you want to interact with these materials. You could just read them, or you could just copy and paste the code. But for coding, the most effective way to learn is by doing: not just typing out all the commands yourself and checking that you get the same output, but also tinkering and exploring.

  3. As you go through the learning materials, create your own study notes, as an R Markdown file called ggplot2_notes_YOURNAME.Rmd. Include examples, explanations, etc. You are essentially creating your own reference material.

Submission: Share your notes on the #todo2 channel on Slack.
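To give you a taste of the kind of tinkering the chapter encourages, here is a minimal ggplot2 sketch using the built-in mpg data set (assuming the tidyverse from step 1 is installed). Try swapping out the aesthetics or the geom and watch what changes:

```r
library(ggplot2)

# Scatterplot of engine displacement vs. highway mileage,
# with one color per car class
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(x = "Engine displacement (L)",
       y = "Highway miles per gallon")
p
```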

To-do 3

Due by 12pm Thursday, September 9

  1. Share your GitHub username in a file called github_YOURNAME.txt (e.g., github_Dan.txt)

  2. Learn about dplyr! Like you did for To-do 2, go through the data transformation chapter in our class version of R for data science, and create your own study notes as dplyr_notes_YOURNAME.Rmd.

  3. What’s your muddiest point for dplyr? Create a file called dplyr_muddiest_YOURNAME.txt that has your muddiest point: After going through the dplyr chapter, what’s the concept or skill that’s giving you the most issues? What are you most unsure about?

Submission: Share your files on the #todo3 channel on Slack. You should have 3 files: github_YOURNAME.txt, dplyr_notes_YOURNAME.Rmd, and dplyr_muddiest_YOURNAME.txt.
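If you want a preview before diving into the chapter: a typical dplyr workflow chains a few verbs together into a pipeline. This sketch uses the built-in starwars data set that ships with dplyr:

```r
library(dplyr)

# Filter rows, derive a new column, then summarize by group
starwars %>%
  filter(!is.na(height)) %>%
  mutate(height_m = height / 100) %>%
  group_by(species) %>%
  summarize(mean_height_m = mean(height_m), n = n())
```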

To-do 4

Due by 12pm Tuesday, September 14

It’s time to flex our newfound skills with GitHub and R Markdown!

  1. Clean up your R Markdown files from To-dos 2 & 3. Use R code chunks, text formatting, headings, and session info.

  2. Knit your R Markdown files to GitHub-flavored markdown files.

Submission: Put your files in the todo2/ and todo3/ directories of the Class-Exercise-Repo. You should have 2 files in each: an Rmd file and an md file. Commit your changes, push to your GitHub fork, and create a pull request for me.
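In case it helps: “knitting to GitHub-flavored markdown” just means the YAML header at the top of your Rmd file declares github_document as its output format, something like the sketch below (the title and author values are placeholders):

```yaml
---
title: "ggplot2 notes"
author: "YOURNAME"
output: github_document
---
```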

To-do 5

Due by 12pm Thursday, September 16

  1. Go through two chapters of our class version of R4DS: the short tibbles chapter and the tidy data chapter.

  2. Create two Rmd files of your notes: tibble_notes_YOURNAME.Rmd and tidyr_notes_YOURNAME.Rmd. Include your muddiest point for each chapter.

  3. Knit as github_document files.

Submission: Put your files in the todo5/ directory of the Class-Exercise-Repo. You should have at least 4 files (one Rmd & one md for each chapter), plus any image directory(ies) that you create, if applicable. Commit your changes, push to your GitHub fork, check that the md files look like you expect, and create a pull request for me.
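As a taste of what the tidy data chapter is about, here is a small sketch (the scores tibble is made up for illustration): the same data starts in “wide” form with one column per year, and pivot_longer() reshapes it so that each row holds exactly one observation:

```r
library(tidyr)

# "Untidy" wide data: one column per year
scores <- tibble::tibble(
  name  = c("anna", "ben"),
  y2020 = c(85, 90),
  y2021 = c(88, 87)
)

# Reshape so each row is one (name, year, score) observation
long <- pivot_longer(scores,
                     cols = c(y2020, y2021),
                     names_to = "year", values_to = "score")
long
```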

To-do 6

Due by 12pm Tuesday, September 21

  1. Go through either the data import chapter or the relational data chapter in our class version of R4DS. It’s up to you to decide which will be more useful for your research needs:
    • Data import covers how to get raw tabular data into R (raw = it’s in a file rather than a nice R object; tabular = it’s in rows & columns, like an R dataframe). In particular, it covers cases where the raw file has some potentially gnarly features (weird delimiters, multiple header rows, lots of sparsity, etc.)
    • Relational data covers what to do if your data is spread across multiple tables—how do you put those disparate pieces of data together, what do you do if there are missing values, etc.?
    • You might be wondering “what if I want to deal with non-tabular raw data?”. We’ll touch on that in week 5!

You know the drill from here: Create the files as Rmd, knit as github_document files, put your files in Class-Exercise-Repo/todo6/, add/commit/push, and create a pull request for me.
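For instance, the heart of the relational data chapter is the join family. Here is a minimal left_join() sketch with two made-up tables sharing a key column:

```r
library(dplyr)

# Two small tables sharing the key column `word`
words <- tibble::tibble(word = c("cat", "dog", "emu"),
                        freq = c(120, 95, 3))
ratings <- tibble::tibble(word = c("cat", "dog"),
                          cuteness = c(4.8, 4.9))

# left_join() keeps every row of `words`;
# "emu" has no match in `ratings`, so its cuteness is NA
left_join(words, ratings, by = "word")
```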

To-do 7

Due by 12pm Thursday, September 23

  1. Explore the stringr package by going through the R4DS strings chapter. Give yourself time to go through this chapter, as it’s on the longer side. And don’t forget to list your muddiest point(s)!

You know the drill from here.
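As a preview, two stringr workhorses you’ll meet in the chapter, str_detect() and str_extract(), both take a character vector and a regular expression:

```r
library(stringr)

words <- c("string", "strength", "ring", "strap")

# Which elements match the pattern? ("^" anchors to the start)
str_detect(words, "^str")    # TRUE TRUE FALSE TRUE

# Pull out the first match ("." matches any single character);
# "strap" has no match, so it yields NA
str_extract(words, "r.ng")   # "ring" "reng" "ring" NA
```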

To-do 8

Due by 12pm Tuesday, September 28

Time for more regex practice, this time with a longer text.

  1. Like we did in class today, pull the upstream remote into your local fork (check the video if you forget how!). If you get a merge conflict, post a screenshot of your bash/terminal window in the Slack #q-and-a channel. Do not DM me on Slack—we benefit from one another’s questions.
  2. You should have a new folder & file todo8/regex-practice_YOURNAME.Rmd. Change the file name immediately so your own name is in it.
  3. Follow the instructions in the file, and knit.
  4. Add your Rmd and md files (do not git add .!), commit (with an informative message), push to your GitHub fork, and create a pull request for me.

Don’t forget the regex learning resources are at your disposal!
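If you want a quick warm-up before tackling the practice file, here is a small regex sketch in R (the example string is made up):

```r
library(stringr)

text <- "Praat 6.1, R 4.1, Python 3.9"

# [A-Za-z]+ = one or more letters; \\d+\\.\\d+ = a version number
str_extract_all(text, "[A-Za-z]+ \\d+\\.\\d+")[[1]]
# "Praat 6.1" "R 4.1" "Python 3.9"
```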

To-do 9

Due by 12pm Thursday, October 7 (postponed from September 30 due to illness)

Let’s pool our questions together for Dr. Lauren Collister, who will be our guest speaker on Thursday. Review the topic of open access and data publishing, focusing in particular on the first two resources (“Data Sharing for Linguists” and the “Copyright and Intellectual Property Toolkit”).

Think of a question or two on the topic, and add yours along with your name to this Google document.

Submission: The Google document is your submission, so don’t forget to add your name. Dr. Collister will take a look at your questions before class, and any questions that are asked by 1pm on Wednesday are guaranteed to be answered!

To-do 10

Due by 12pm Thursday, October 14

The key tool for phonetic analysis of speech data is the free program Praat. In addition to a click-through GUI, Praat has its own scripting language to automate tasks. Let’s learn all about Praat scripting! Do the following:

  1. Download the latest version of Praat (6.1.54) here. If you already have Praat on your computer, make sure to update it!
  2. There are four Praat scripting tutorials on the learning resources page. Take a few minutes exploring each tutorial, then pick the one that is easiest for you to understand.
  3. With your chosen tutorial, learn as much about Praat scripting as you would like. You don’t have to go through the entire tutorial, but you should at least learn about saving & running scripts, variables, commands, object selection, if & for, and file input/output.
    • Some of the tutorials come with files that you can play with, which you might find useful in learning Praat scripting. You can also play with files from various speech corpora
  4. Write up a Praat scripting reference card for yourself as a .md document, named praat_reference_YOURNAME.md. (Note: Since this file won’t contain any R code, you can create this .md file directly without first writing a .Rmd file!) Make sure you include at least one muddiest point for Praat scripting that we can discuss in class on Thursday!

Submission: Put your .md file in Class-Exercise-Repo/todo10/ (check spelling/punctuation of the folder name!), add/commit/push, and create a pull request for me.

To-do 11

Due by 12pm Tuesday, October 19

Let’s learn more about how to use LaBB-CAT! I’ve posted some worksheets in a new repository in our class GitHub organization; you don’t have to clone the repo, unless you want to keep the files around for your own reference. Do the following:

  1. Follow along with the first three worksheets (at least—feel free to do more if you’re curious!).
  2. Once you’re done, write two Markdown files:
    1. labbcat_research_YOURNAME.md: How could using LaBB-CAT to organize linguistic data (whether it’s speech data, written data, etc.) benefit the type of research you’ve done and/or are interested in doing? Or, what sorts of additional functionality would LaBB-CAT need to have in order to benefit the type of research you’ve done and/or are interested in doing? Feel free to be creative here!
    2. labbcat_search_YOURNAME.md: Write a short ‘search problem’ that a classmate could complete after doing the first three worksheets (see examples at the bottom of worksheet 3). Again, be creative here! If the solution requires any layers that are referenced in worksheets 4, 5, or 6, please mention that.

Submission: Put your .md files in Class-Exercise-Repo/todo11/ (check spelling/punctuation of the folder name!), add/commit/push, and create a pull request for me.

To-do 12

Due by 12pm Thursday, October 21

Which are longer: Words that start with stops, or words that start with fricatives? Time to put our LaBB-CAT and R skills together to find out!

  1. In the demo LaBB-CAT corpus, find all words that start with a stop or a fricative (ignore affricates). I got 16,227 hits.
  2. Export a csv file with whichever layers you think are appropriate, and import it as a dataframe into R. Note that most column names will begin with Target.
  3. Add a new column firstPhon with the first phoneme in the word, and look at the distribution of firstPhon. Try to explain how w, I, or c got into the data.
  4. Explore whichever column(s) you chose to represent word length: what is the distribution? Does it suggest that the data need to be cleaned up?
  5. Finally, use the data to answer the original question, “Which are longer: Words that start with stops, or words that start with fricatives?”
  6. Create an Rmd file that describes your analysis (including how you extracted the data from LaBB-CAT, why you chose the columns you did, etc.), and knit it as a github_document. You may want to play around in an R Notebook before creating your final Rmd (though this analysis is pretty short).

Submission: Put three files (the downloaded csv, your Rmd, and the knitted md file) in a subfolder YOURNAME/ within Class-Exercise-Repo/todo12/. (The subfolder is needed because csv file names might overlap between students!) Add/commit/push, and create a pull request for me.
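Steps 3 through 5 might be sketched like this. Note that the tibble below is a made-up stand-in for the LaBB-CAT export, and the column names (Target.segments, Target.duration) are hypothetical; substitute whatever columns your actual csv contains:

```r
library(dplyr)
library(stringr)

# Toy stand-in for the exported csv (hypothetical columns!)
hits <- tibble::tibble(
  Target.segments = c("k{t", "dQg", "fIS", "TIN"),  # DISC-style transcriptions
  Target.duration = c(0.31, 0.40, 0.35, 0.38)
)

hits %>%
  mutate(firstPhon = str_sub(Target.segments, 1, 1),
         manner = if_else(firstPhon %in% c("p", "b", "t", "d", "k", "g"),
                          "stop", "fricative")) %>%
  group_by(manner) %>%
  summarize(mean_duration = mean(Target.duration), n = n())
```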

To-do 13

Due by 12pm Tuesday, November 2

What has everyone been up to? Let’s take a look – it’s a “visit your classmates” day!

Submission: Since Class-Lounge is a fully collaborative repo, there is no formal submission process; simply add your comments to the guestbook.