# APLS Documentation
Welcome to the documentation homepage for the Archive of Pittsburgh Language and Speech (aka APLS, pronounced like apples)! APLS is a linguistic data resource, powered by the open-source linguistic corpus software LaBB-CAT, that contains:
- recordings of interviews conducted with speakers native to Pittsburgh and its surrounding neighborhoods,
- annotated transcripts with information at the phrase, word, and individual speech sound levels, allowing these recordings to be used as structured linguistic data, and
- metadata on interviewees and transcripts that facilitates large-scale (socio)linguistic analysis.
APLS is (and will always be) free to use. The corpus currently contains 274 sound files totaling over 45 hours of audio from 40 interviewees.
## Demo: Measuring F1 and F2 for /aw/ in closed syllables
Some speakers of Pittsburgh English pronounce the /aw/ vowel (the vowel sound in words like out and downtown) more like “ah” (stereotyped as “aht” and “dahntahn”). This pronunciation is noticeable to Pittsburghers as a marker of Pittsburgh identity and social meanings like working-class status (e.g., Johnstone et al. 2006).
Let’s say we want to investigate how different speakers pronounce /aw/ in different situations. A typical data task would be to identify all tokens (individual instances in speech) matching a specific linguistic context (for example, /aw/ followed by a consonant in the same syllable) and extract a set of acoustic measurements for each token (for example, F1 and F2 at three timepoints). Normally, performing this kind of batch acoustic measurement on a dataset this big would take hours of manual effort, even with state-of-the-art speech technologies for automatic speech recognition and segmental alignment.
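To make the measurement scheme concrete, the timepoints are just fractions of each token’s aligned duration. A minimal sketch of that arithmetic (the function name and the example start/end offsets are hypothetical, not part of APLS):

```python
def measurement_times(start, end, fractions=(0.2, 0.5, 0.8)):
    """Return absolute times (in seconds) at the given fractions
    of a token's duration, given its aligned start and end offsets."""
    duration = end - start
    return [start + f * duration for f in fractions]

# A hypothetical /aw/ token aligned from 12.40 s to 12.65 s:
print(measurement_times(12.40, 12.65))
```

At each of these three times, a formant tracker (such as Praat’s) would then report F1 and F2 for the token.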
With APLS, it takes as little as 2 minutes to measure all 5202 tokens of /aw/ in closed syllables in the corpus.
Show me how!
1. Search for tokens
   - Using regular expressions to search across multiple annotation layers, we find 5202 time-aligned /aw/ tokens
2. Export search results to a CSV file
   - We get a search-results file with one token per row and columns for different annotation layers
3. Extract acoustic measurements for search results using APLS’s built-in Praat module
   - Our search-results file gets updated with the acoustic measurements we specify (in this case, F1 and F2 at the vowel’s 20%, 50%, and 80% timepoints)
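Once the measurement-augmented CSV is exported, even standard-library Python is enough to start summarizing it. The sketch below uses an invented three-row excerpt with hypothetical column names (`Speaker`, `F1_50`, `F2_50`); APLS’s actual export headers will differ, so adjust the names to match your file:

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt of an exported search-results file.
# Column names here are illustrative, not APLS's real export headers.
csv_text = """\
Speaker,Token,F1_50,F2_50
CB01,downtown,780,1250
CB01,out,810,1190
LV02,house,650,1400
"""

# Group each token's midpoint (50%) F1/F2 values by speaker.
midpoints = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    midpoints[row["Speaker"]].append((float(row["F1_50"]), float(row["F2_50"])))

# Report each speaker's mean midpoint formants.
for speaker, pairs in midpoints.items():
    f1 = mean(p[0] for p in pairs)
    f2 = mean(p[1] for p in pairs)
    print(f"{speaker}: mean midpoint F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

In a real analysis you would read the exported file from disk (e.g., `open("results.csv")`) instead of the inline string, and likely compare /aw/ midpoints across speaker groups or following consonants.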