This document contains errata (and non-error important notes) for the textbook Statistics for Linguists: An Introduction Using R (Bodo Winter, 2019, Routledge). Please feel free to suggest other errata by creating a GitHub issue.


| Page | Text | Comment |
| --- | --- | --- |
| xv | The following R packages need to be installed to be able to execute all code in all chapters | Some packages may be unnecessary depending on what you plan to do: swirl is only used for a single exercise in Chapter 1; pscl is only used in Chapter 13; lme4 isn’t used until Chapter 14; afex and MuMIn aren’t used until Chapter 15; brms isn’t actually used for code, just referenced in Appendix B. |
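
Because several of these packages are only needed for particular chapters, one option is to install only what is missing. The following is a minimal sketch, not the book's own setup code: the `optional_pkgs` vector simply lists the packages named above, and the install call is left commented out so that nothing is downloaded by accident.

```r
# Packages that the note above flags as needed only for specific chapters
optional_pkgs <- c("swirl", "pscl", "lme4", "afex", "MuMIn", "brms")

# Which of them are not yet installed in the current R library?
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(optional_pkgs, installed)
print(missing_pkgs)

# Uncomment to install only the missing ones (requires an internet connection)
# install.packages(missing_pkgs)
```

Installing a package such as lme4 can then wait until you actually reach the chapter that uses it.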

Chapter 1: Introduction to R

| Page | Text | Comment |
| --- | --- | --- |
| 14 | (Code output for mydf) | The alignment of this output apparently got messed up in the book publication process, but your output should be nicely lined up between the column headings (like participant) and the values (like louis). |
| 15 | Notice one curiosity: the participant column is indicated to be a factor vector, even though you only supplied a character vector! The data.frame() function secretly converted your character vector into factor vector | As of R version 4.0, functions that create dataframes (e.g., data.frame(), read.csv()) default to leaving character vectors as-is, rather than converting them to factor vectors. When you run str(mydf), the second line will instead read $ participant: chr "louis" "paula" "vincenzo". This change in R’s default behavior also affects other outputs in this chapter. |
| 16 | mydf[mydf$participant == 'vincenzo',] $score | Extra space before $. This doesn’t actually affect R’s output (try it yourself both ways!), but typically we write $ without a leading space. |
| 17 | nettle <- read.csv('nettle_1999_climate.csv') | As Winter mentions, this code only works if your working directory is the folder where you’ve downloaded nettle_1999_climate.csv. This is not always a trivial or easy thing for students to navigate! Instead, I typically direct students to load datasets directly from the OSF repository. In the files tab on that site (or on, click materials > data, then right-click the dataset you want and copy the URL; then you can run read.csv() (or read_csv() in Chapter 2) with the URL plus /download/, with quotation marks around the full URL. For example, to load the Nettle (1999) dataset, you can run nettle <- read.csv(''). Of course, this only works if you’re connected to the internet! |
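
The change in default behavior, and the harmless extra space before $, can both be checked directly. This is a small sketch rather than the book's exact code: the score values are made up for illustration, and the comments assume R 4.0 or later.

```r
# Participant names as in the book's example; the scores are hypothetical
participant <- c("louis", "paula", "vincenzo")
score <- c(67, 85, 32)

# Default in R >= 4.0: character vectors stay character vectors
mydf <- data.frame(participant, score)
class(mydf$participant)

# The pre-4.0 behavior can still be requested explicitly
mydf_fct <- data.frame(participant, score, stringsAsFactors = TRUE)
class(mydf_fct$participant)

# Whitespace before $ does not change the result
with_space <- mydf[mydf$participant == "vincenzo", ] $score
without_space <- mydf[mydf$participant == "vincenzo", ]$score
identical(with_space, without_space)
```

If you want your output to match the book's factor-based output, passing stringsAsFactors = TRUE is one way to get there.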

Chapter 2: The Tidyverse and Reproducible R Workflows

| Page | Text | Comment |
| --- | --- | --- |
| 28 | (Code output for nettle, showing <fct> under Country) | For users of R version 4.0 or later, this will show as <chr>, short for character vector. This is because R now defaults to leaving character vectors as-is, rather than converting them to factor vectors. |
| 29 | But wait, didn’t I just tell you that tibbles default to character vectors? Why is the Country column coded as a factor? The culprit here is the base R function read.csv(), which automatically interprets any text column as factor. So, before the data frame was converted into a tibble, the character-to-factor conversion has already happened. | Again, this discussion is moot, since both read.csv() and read_csv() now interpret text columns as characters, not factors. |
| 29 | Output of nettle <- read_csv('nettle_1999_climate.csv') | The output looks different if you’re using a recent version of readr. The only substantive difference is that readr now parses Langs as a double, not an integer. |
| 38–39 | geom_histogram(fill = 'peachpuff3') | The book is in black and white, so the color doesn’t show up in the book’s version. It should if you run the code yourself. |
| 44–45 | In addition, there are code chunks, which always begin with three ''' (backward ticks, or the grave accent symbol). # R code goes in here | The wrong character apparently got substituted in the publishing process. The symbol is ```, which is on the key to the left of 1. R Markdown won’t know what to do with '''. |
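
The column types a reader function assigns can be inspected without downloading anything. The sketch below uses a two-row inline CSV with made-up values (not the actual Nettle data). The base R part assumes R 4.0 or later; the readr part only runs if readr is installed, and its use of I() for literal data and of show_col_types assumes readr 2.0 or later.

```r
# Tiny inline CSV; the values are invented for illustration
csv_text <- "Country,Langs\nAlbania,7\nBrazil,12"

# Base R (>= 4.0): text columns stay character, whole numbers become integer
base_df <- read.csv(text = csv_text)
print(sapply(base_df, class))

# readr guesses whole-number columns as double unless told otherwise
if (requireNamespace("readr", quietly = TRUE)) {
  tidy_df <- readr::read_csv(I(csv_text), show_col_types = FALSE)
  print(sapply(tidy_df, class))

  # Request an integer column explicitly via col_types
  tidy_int <- readr::read_csv(
    I(csv_text),
    col_types = readr::cols(Langs = readr::col_integer())
  )
  print(sapply(tidy_int, class))
}
```

For the analyses in the book, the integer versus double distinction should not matter, so the explicit col_types call is only needed if you want the printed column type to match the book.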

Chapter 3: Descriptive Statistics

| Page | Text | Comment |
| --- | --- | --- |
| 53 | The corresponding histogram is shown in Figure 3.1a (for an explanation of histograms, see Chapter 1.12). | Figure 3.1a is a barplot, not a histogram. |
| 56 | (Footnote 2) | This book uses N and n interchangeably (including in this footnote). In other texts, N refers to the size of a population and n to the size of a sample of that population. |
| 64 | war <- read_csv(' warriner_2013_emotional_valence.csv') | There is an extra space after the first quotation mark. This space needs to be removed or else the code will yield an error. |
| 65 | (Code output at the bottom of the page) | Content warning: This code output contains the most negative words in the dataframe, and there are some potentially triggering words in here. |
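
The reason the stray space on page 64 causes an error is that a leading space makes the string a different file name. This can be seen with a throwaway file; the sketch below is illustrative only, and the file name `demo_valence.csv` and its contents are hypothetical, not the Warriner data.

```r
# Write a small throwaway CSV to a temporary directory (hypothetical file name)
tmp_dir <- tempdir()
good_path <- file.path(tmp_dir, "demo_valence.csv")
write.csv(data.frame(Word = c("a", "b"), Val = c(1.5, 7.2)),
          good_path, row.names = FALSE)

# A leading space makes this a different file name, which does not exist
bad_path <- file.path(tmp_dir, " demo_valence.csv")

file.exists(good_path)
file.exists(bad_path)

# trimws() strips the stray whitespace from the bare file name
fixed_path <- file.path(tmp_dir, trimws(" demo_valence.csv"))
identical(fixed_path, good_path)
```

Checking file.exists() before calling read_csv() is a quick way to tell a typo in the file name apart from a working-directory problem.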

Chapter 4: Introduction to the Linear Model

| Page | Text | Comment |
| --- | --- | --- |
| 74 | E4.7: \(y = b_{0} + b_{1} * x + e\) | The error term is sometimes written as epsilon (\(\epsilon\)). When it is written as \(e\), as it is here, it shouldn’t be confused with the base of the natural logarithm (also written \(e\)), which is discussed in section 5.4. |
| 75 | Figure 4.6 shows the SSE as a function of different slope values. | Should be Figure 4.5b, not Figure 4.6. |
| 77 | Conversely, 32% of the variation in response durations is due to chance | Should be 28% (100% − 72%), not 32%. |
| 78 | plot(x., y, pch = 19) | Extra dot. Should be plot(x, y, pch = 19) |
| 81–82 | # A tibble: 50 x 2 ... # ... with 40 more rows | The preceding code will yield a tibble with 61 rows, not 50. Since tibbles print the first 10 rows by default, your output should be # A tibble: 61 x 2 ... # ... with 51 more rows |
| 83 | (Code output starting with r.squared) | The alignment of this output apparently got messed up in the book publication process, but your output should be nicely lined up between the labels (like r.squared) and the quantities (like 0.9283634). |
| 83 | The resultant plot will look similar to Figure 4.9 | There is no Figure 4.9! Your plot will look similar to the following (but not identical, because we didn’t set a random seed): |