APLS Documentation
Welcome to the documentation homepage for the Archive of Pittsburgh Language and Speech (aka APLS, pronounced like apples)! APLS is a linguistic data resource, powered by the open-source linguistic corpus software LaBB-CAT, that contains:
- recordings of interviews conducted with speakers native to Pittsburgh and the surrounding area,
- annotated transcripts with information at the phrase, word, and individual speech sound levels, allowing these recordings to be used as structured linguistic data, and
- metadata on interviewees and transcripts that facilitates large-scale (socio)linguistic analysis.
APLS is (and will always be) free to use. APLS currently contains 218 sound files totaling 34 hours of audio from 34 interviewees. APLS is currently under construction; when complete, it will contain 270 sound files totaling 45 hours of audio from 40 interviewees.
Ready to get started with APLS? Click here.
Demo: Measuring F1 and F2 for /aw/ in closed syllables
Some speakers of Pittsburgh English pronounce the /aw/ vowel (the vowel sound in words like out and downtown) more like “ah” (stereotyped as “aht” and “dahntahn”). This pronunciation is noticeable to Pittsburghers as a marker of Pittsburgh identity and social meanings like working-class status (e.g., Johnstone et al. 2006).
Let’s say we wanted to investigate how different speakers pronounce /aw/ in different situations. A pretty typical data task would be to identify all tokens (individual instances in speech) matching a specific linguistic context (for example, when /aw/ is followed by a consonant in the same syllable) and extract a set of acoustic measurements (for example, F1 and F2 at 3 timepoints). Normally, performing this sort of batch acoustic measurement on a dataset this big would take hours of manual effort, even if you use state-of-the-art speech technologies for automatic speech recognition and segmental alignment.
With APLS, it takes as little as 2 minutes to measure all 4543 tokens of /aw/ in closed syllables in the corpus.
Show me how!
- Search for tokens
  - Using regular expressions to search across multiple annotation layers, we find 4543 time-aligned /aw/ tokens
- Export search results to a CSV file
  - We get a search-results file with one token per row and columns for different annotation layers
- Extract acoustic measurements for search results using APLS's built-in Praat module
  - Our search-results file gets updated with the acoustic measurements we specify (in this case, F1 and F2 at the vowel's 20%, 50%, and 80% timepoints); the sketches below illustrate this kind of measurement and what analysis of the resulting file can look like
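To give a sense of what the last step computes, here is a minimal standalone sketch of the same kind of formant measurement for a single token, written with the third-party parselmouth library rather than APLS's own Praat module. The file path and token boundaries are hypothetical placeholders; in APLS the measurements are made automatically for every match in the search results.

```python
# Illustrative only: APLS's Praat module performs this for every match.
# The recording path and token boundaries below are hypothetical.
import parselmouth  # pip install praat-parselmouth

WAV_PATH = "interview.wav"              # hypothetical sound file
TOKEN_START, TOKEN_END = 12.34, 12.58   # hypothetical /aw/ token boundaries (seconds)

sound = parselmouth.Sound(WAV_PATH)
formants = sound.to_formant_burg(maximum_formant=5500)  # Praat's standard formant tracker

# F1 and F2 at the vowel's 20%, 50%, and 80% timepoints
for proportion in (0.2, 0.5, 0.8):
    t = TOKEN_START + proportion * (TOKEN_END - TOKEN_START)
    f1 = formants.get_value_at_time(1, t)
    f2 = formants.get_value_at_time(2, t)
    print(f"{int(proportion * 100)}%: F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```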
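Once the measurements are added, the search-results file is an ordinary CSV, so downstream analysis can use any tabular tool. Here is a minimal sketch with pandas; the filename and column names are hypothetical stand-ins for whatever headers your own export contains.

```python
# Sketch of downstream analysis; "speaker", "f1_50", and "f2_50" are
# hypothetical stand-ins for the headers in your own APLS export.
import pandas as pd

results = pd.read_csv("aw_closed_syllables.csv")  # hypothetical export filename

# Mean F1 and F2 at the vowel midpoint, grouped by speaker
summary = results.groupby("speaker")[["f1_50", "f2_50"]].mean().round(0)
print(summary)
```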