# APLS Documentation
Welcome to the documentation homepage for the Archive of Pittsburgh Language and Speech (aka APLS, pronounced like apples)! APLS is a linguistic data resource, powered by the open-source linguistic corpus software LaBB-CAT, that contains:
- recordings of interviews conducted with speakers native to Pittsburgh and its surrounding neighborhoods,
- annotated transcripts with information at the phrase, word, and individual speech sound levels, allowing these recordings to be used as structured linguistic data, and
- metadata on interviewees and transcripts that facilitates large-scale (socio)linguistic analysis.
APLS is (and will always be) free to use. The corpus currently contains 274 sound files totaling over 45 hours of audio from 40 interviewees.
## Demo: Measuring F1 and F2 for /aw/ in closed syllables
Some speakers of Pittsburgh English pronounce the /aw/ vowel (the vowel sound in words like out and downtown) more like “ah” (stereotyped as “aht” and “dahntahn”). This pronunciation is noticeable to Pittsburghers as a marker of Pittsburgh identity and social meanings like working-class status (e.g., Johnstone et al. 2006).
Let’s say we want to investigate how different speakers pronounce /aw/ in different situations. A typical data task would be to identify all tokens (individual instances in speech) matching a specific linguistic context (for example, /aw/ followed by a consonant in the same syllable) and extract a set of acoustic measurements for each token (for example, F1 and F2 at three timepoints). Normally, performing this kind of batch acoustic measurement on a dataset this big would take hours of manual effort, even with state-of-the-art speech technologies for automatic speech recognition and segmental alignment.
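To make the measurement scheme concrete, the timepoints are just fractions of each token’s aligned duration. A minimal sketch of that arithmetic (the function name and the example start/end offsets are hypothetical, not part of APLS):

```python
def measurement_times(start, end, fractions=(0.2, 0.5, 0.8)):
    """Return absolute times (in seconds) at the given fractions
    of a token's duration, given its aligned start and end offsets."""
    duration = end - start
    return [start + f * duration for f in fractions]

# A hypothetical /aw/ token aligned from 12.40 s to 12.65 s:
print(measurement_times(12.40, 12.65))
```

At each of these three times, a formant tracker (such as Praat’s) would then report F1 and F2 for the token.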
With APLS, it takes as little as 2 minutes to measure all 5202 tokens of /aw/ in closed syllables in the corpus.
Show me how!
1. Search for tokens
   - Using regular expressions to search across multiple annotation layers, we find 5202 time-aligned /aw/ tokens
2. Export search results to a CSV file
   - We get a search-results file with one token per row and columns for different annotation layers
3. Extract acoustic measurements for search results using APLS’s built-in Praat module
   - Our search-results file gets updated with the acoustic measurements we specify (in this case, F1 and F2 at the vowel’s 20%, 50%, and 80% timepoints)
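Once the measurement-augmented CSV is exported, even standard-library Python is enough to start summarizing it. The sketch below uses an invented three-row excerpt with hypothetical column names (`Speaker`, `F1_50`, `F2_50`); APLS’s actual export headers will differ, so adjust the names to match your file:

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical excerpt of an exported search-results file.
# Column names here are illustrative, not APLS's real export headers.
csv_text = """\
Speaker,Token,F1_50,F2_50
CB01,downtown,780,1250
CB01,out,810,1190
LV02,house,650,1400
"""

# Group each token's midpoint (50%) F1/F2 values by speaker.
midpoints = defaultdict(list)
for row in csv.DictReader(io.StringIO(csv_text)):
    midpoints[row["Speaker"]].append((float(row["F1_50"]), float(row["F2_50"])))

# Report each speaker's mean midpoint formants.
for speaker, pairs in midpoints.items():
    f1 = mean(p[0] for p in pairs)
    f2 = mean(p[1] for p in pairs)
    print(f"{speaker}: mean midpoint F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```

In a real analysis you would read the exported file from disk (e.g., `open("results.csv")`) instead of the inline string, and likely compare /aw/ midpoints across speaker groups or following consonants.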