Getting started
APLS contains sound files alongside annotations that allow us to treat these audio files as structured linguistic data. Once you have an APLS login, you can access APLS through your browser at https://apls.pitt.edu/labbcat.[1]
Where APLS audio data comes from
APLS contains audio recordings of one-on-one sociolinguistic interviews from fieldwork conducted in 2003–2005 with native Pittsburghers in four Pittsburgh neighborhoods: the Hill District (abbreviated HD in APLS), Lawrenceville (LV), Forest Hills (FH), and Cranberry Township (CB).[2] These interviews typically consisted of several sections (though not all interviews had all of these):
- A long conversation
  - Sometimes including chatter about “Pittsburghese” and/or African American English
- 1–2 reading tasks
- A minimal pairs task
APLS includes just a subset of the audio files from this fieldwork. All interviewees in APLS are natives of the Pittsburgh area, and all interviewees consented to make their data publicly available. In addition, APLS currently contains only files that had just one interviewee.
Basic organization
APLS data is organized using the data structures provided by the open-source linguistic corpus software LaBB-CAT. The most important organizational units in LaBB-CAT corpora are annotations, transcripts, participants, and layers.
- Annotations are individual bits of data aligned to specific timestamps in audio files.
- Transcripts hold data for a single audio file and all of its annotations, plus metadata like when the audio file was recorded.
- Participants are speakers, plus metadata like demographic info.
- Layers are series of time-aligned annotations in transcripts corresponding to a single type of linguistic data (e.g., pronunciations, part-of-speech tags).
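To make these four units concrete, here is a minimal Python sketch of how they relate to one another. It is not LaBB-CAT’s actual data model: the class names, field names, and the annotations_between helper are hypothetical, chosen only to show that a transcript groups its annotations by layer and that every annotation carries its own start and end time.

```python
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; not LaBB-CAT's real schema.


@dataclass
class Participant:
    """A speaker, plus metadata such as demographic info."""
    code: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Annotation:
    """One piece of data aligned to a stretch of an audio file."""
    label: str
    start: float  # seconds from the start of the audio file
    end: float    # seconds from the start of the audio file


@dataclass
class Transcript:
    """One audio file, its metadata, its participants, and its annotations."""
    name: str
    metadata: dict = field(default_factory=dict)
    participants: list = field(default_factory=list)
    layers: dict = field(default_factory=dict)  # layer name -> list of Annotations

    def add(self, layer: str, annotation: Annotation) -> None:
        """Put an annotation on the named layer, creating the layer if needed."""
        self.layers.setdefault(layer, []).append(annotation)

    def annotations_between(self, layer: str, start: float, end: float) -> list:
        """Annotations on `layer` that lie entirely within [start, end] seconds."""
        return [a for a in self.layers.get(layer, [])
                if a.start >= start and a.end <= end]
```

Keeping layers as a mapping from layer name to annotations mirrors the idea that a layer is one type of linguistic data running through a transcript; looking up a time window on one layer is then a simple filter.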
Participants and transcripts
The participants in APLS are the interviewees, the interviewers, and occasionally a bystander whose speech is captured in the recording. Interviewees in APLS are identified by an anonymized speaker code that includes their neighborhood abbreviation (e.g., CB01, HD17).
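Because the neighborhood abbreviation is part of an interviewee’s speaker code, you can recover the neighborhood from the code alone. The sketch below assumes that an interviewee code is always one of the four abbreviations followed by digits, as in the examples above; the function name neighborhood_of is hypothetical, and codes for interviewers or bystanders are simply rejected.

```python
import re

# Neighborhood abbreviations used in APLS speaker codes.
NEIGHBORHOODS = {
    "HD": "Hill District",
    "LV": "Lawrenceville",
    "FH": "Forest Hills",
    "CB": "Cranberry Township",
}

# Assumption: interviewee codes are an abbreviation followed by digits (e.g., CB01).
SPEAKER_CODE = re.compile(r"(HD|LV|FH|CB)(\d+)")


def neighborhood_of(speaker_code: str) -> str:
    """Return the neighborhood encoded in an interviewee's speaker code (hypothetical helper)."""
    match = SPEAKER_CODE.fullmatch(speaker_code)
    if match is None:
        raise ValueError(f"not an interviewee speaker code: {speaker_code!r}")
    return NEIGHBORHOODS[match.group(1)]


print(neighborhood_of("HD17"))  # prints "Hill District"
```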
Interviews are divided into several transcripts (corresponding to the original recording files), named after the interviewee and interview section. For example, the file FH10pairs.eaf contains the minimal pairs task from the interviewee FH10.[3] Some interview sections are split into multiple transcripts (e.g., interview1, reading2).
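The naming pattern can be unpacked mechanically into interviewee, section, and (for split sections) part number. This is a sketch under an assumption: that every transcript name follows the shape speaker code + lowercase section name + optional part number + .eaf, as in FH10pairs.eaf and HD07interview3.eaf. The TranscriptName type and parse_transcript_name function are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TranscriptName(NamedTuple):
    interviewee: str     # e.g., "FH10"
    section: str         # e.g., "pairs", "interview", "reading"
    part: Optional[int]  # e.g., 3 for "interview3"; None if the section isn't split


# Assumed shape: <speaker code><section name><optional part number>.eaf
TRANSCRIPT_NAME = re.compile(r"([A-Z]{2}\d+)([a-z]+)(\d*)\.eaf")


def parse_transcript_name(name: str) -> TranscriptName:
    """Split a transcript name into its parts (hypothetical helper)."""
    match = TRANSCRIPT_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected transcript name: {name!r}")
    code, section, part = match.groups()
    return TranscriptName(code, section, int(part) if part else None)


print(parse_transcript_name("HD07interview3.eaf"))
# prints "TranscriptName(interviewee='HD07', section='interview', part=3)"
```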
Annotations and layers
To illustrate annotations and layers in APLS, below is a screen-grab of a single line of speech from the transcript HD07interview3.eaf:
Let’s break down what we can see:
- On the left-hand side is HD07, the participant who uttered this speech.
- To the right of this speaker code are three layers. From bottom to top, these are word, part_of_speech, and speech_rate.
  - word layer (bottom):
    - This layer contains the words that HD07 spoke, spelled in standard English orthography.
    - Each word has a single annotation on the word layer.
  - part_of_speech layer (middle):
    - This layer encodes each word’s part of speech using the tag set developed for the Penn Treebank project (e.g., UH for interjections, CC for coordinating conjunctions).
    - Most words have a single part_of_speech annotation. The word don’t has two annotations (VBP RB), since it consists of both a present-tense verb (do) and an adverb (not).
  - speech_rate layer (top):
    - This layer contains a measurement (in syllables per second) of how quickly HD07 uttered this line.
    - Because APLS measures speech rate over an entire line of the transcript, there is just one speech_rate annotation for this line (as indicated by the curved bracket).
- The cursor is hovering over the NN annotation, bringing up a tooltip with several pieces of information:
  - The selected annotation is on the part_of_speech layer.
  - This annotation is part of a line (aka an utterance) that begins at 7.92 seconds into the transcript and lasts around 3.29 seconds.
  - There’s a menu that can be brought up by clicking on the annotation.
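The three layers relate to each line differently: one word annotation per word, one or more part_of_speech annotations per word, and one speech_rate annotation per line. The Python sketch below models that, assuming speech rate is simply the line’s total syllable count divided by the line’s duration; APLS’s actual calculation may differ (for example, in how syllables are counted). The Word class, the speech_rate function, and the example line with its timing and syllable counts are all made up for illustration and are not the HD07 line in the screen-grab.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    orthography: str   # the word's annotation on the word layer
    tags: List[str]    # its annotation(s) on the part_of_speech layer
    syllables: int     # hypothetical syllable count for the word


def speech_rate(words: List[Word], line_start: float, line_end: float) -> float:
    """Syllables per second over a whole line, giving one value per line (assumed formula)."""
    duration = line_end - line_start
    if duration <= 0:
        raise ValueError("a line must have a positive duration")
    return sum(word.syllables for word in words) / duration


# A made-up line with made-up timing; NOT the HD07 line in the screen-grab.
line = [
    Word("well", ["UH"], 1),
    Word("and", ["CC"], 1),
    Word("I", ["PRP"], 1),
    Word("don't", ["VBP", "RB"], 1),  # one word, two part_of_speech annotations
    Word("know", ["VB"], 1),
]

rate = speech_rate(line, line_start=0.0, line_end=1.6)
print(f"{rate:.3f} syllables per second")  # prints "3.125 syllables per second"
```

Five words yield five word annotations but six part_of_speech annotations, while the whole line yields a single speech_rate value.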
From audio data to APLS
To get an audio file into APLS, a research assistant first transcribes it according to a specific set of conventions that facilitate analysis in LaBB-CAT. (This takes a ton of time and effort!) The transcription file is then uploaded with its audio file to APLS, where it is converted into an APLS transcript. APLS generates numerous layers for the transcript, based on dictionaries for looking up representations of words (e.g., morphological parses), machine learning tools (e.g., the Hidden Markov Model Toolkit, or HTK, for determining the time alignments of individual speech sounds), and/or other layers. Finally, participant and transcript metadata is uploaded to APLS.
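Because some layers are generated from other layers, a derived layer can only be computed once the layers it depends on exist. One general way to express that constraint is a dependency graph processed in topological order, sketched below with Python’s standard graphlib module. Apart from word, part_of_speech, and speech_rate, the layer names and every dependency listed are hypothetical illustrations, not APLS’s actual configuration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each layer maps to the layers it is derived from.
# "word" comes straight from the uploaded transcription.
LAYER_DEPENDENCIES = {
    "word": set(),
    "part_of_speech": {"word"},       # e.g., a tagger run over the words
    "morphology": {"word"},           # e.g., dictionary lookup of morphological parses
    "pronunciation": {"word"},        # e.g., dictionary lookup of pronunciations
    "segment": {"pronunciation"},     # e.g., HTK alignment of individual speech sounds
    "speech_rate": {"pronunciation"}, # e.g., syllable counts per line
}


def generation_order(dependencies: dict) -> list:
    """Order layers so each one comes after every layer it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(dependencies).static_order())


print(generation_order(LAYER_DEPENDENCIES))
```

Several valid orders exist (for instance, morphology and pronunciation are independent of each other), so the exact printed order is not meaningful; what matters is that word always precedes the layers built from it.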
Navigating the documentation site
We’ll get into navigating APLS itself once you have a user account. In the meantime, here are some tips for navigating this documentation site.
Special formatting
This site uses special formatting to denote specific types of information:
- Key terms
  - Example: annotation
- Layers
  - Example: orthography
- External links (i.e., a link that’s not to a documentation page or an APLS page)
  - Example: LaBB-CAT
- Input/output text (i.e., something you actually type into APLS, or information that APLS displays, like a speaker code)
  - Example: CB01
- Things you click on in APLS (e.g., a menu option or a link)
  - Example: The transcripts page
Navigation across and within pages
You can browse through pages in the left-hand navigation pane. Most pages have a collapsible table of contents toward the top of the page. All headings in the text of a page have a unique permalink; you can copy this permalink by hovering over the heading and clicking the link icon. You can also search the site (press the / key to move your cursor to the search bar without clicking), suggest edits to pages on GitHub, and toggle between light mode and dark mode.
Callout boxes
Throughout these pages are collapsible “callout” boxes to help you understand how to use APLS and how it works. Green “Try it!” boxes give step-by-step instructions for doing a task in APLS. Blue “Under the hood” boxes cover technical details, design decisions, and/or the history of APLS’s development. For example:
Even if you don’t have an APLS login yet, you can still load the page in your browser.
Go to https://apls.pitt.edu/labbcat. You should see a login box pop up:
APLS’s original URL was https://labb-cat.linguistics.pitt.edu/labbcat. But the https://apls.pitt.edu/labbcat URL was chosen as an alias because it’s shorter, easier to remember, and less prone to typos.
Sign up
Ready to get started with APLS? Sign up for a user account. We’ll send you a username and temporary password within 1 US business day.
Initial login
Once you have a username and temporary password, you can log in to https://apls.pitt.edu/labbcat:
There are two additional things you’ll only need to do the first time you log in:
- You’ll see a license (below). Scroll to the bottom and click I Agree.
- Then you’ll see a prompt to reset your temporary password. Enter your new password and click Change Pass Phrase.
Forgot your password?
If you forget your password, fill out the password-reset form. We’ll reset your password within 1 US business day.
[1] Advanced users can also access APLS via the nzilbb.labbcat package for R, or the nzilbb-labbcat library for Python. These packages have most of the functionality of the browser-based graphical user interface (https://apls.pitt.edu/labbcat), with some added benefits such as reproducibility (e.g., a particular set of search criteria can be encoded in R/Python code rather than described for copy/paste). Even if you plan to mostly use these interfaces, however, it’s a good idea to learn the browser-based GUI first, as it will help you build an intuitive sense for how APLS data is organized.
[2] In keeping with Pittsburgh parlance, we use neighborhood to encompass geographic areas inside or outside Pittsburgh city limits. Technically, only the Hill District and Lawrenceville are within city limits. Forest Hills is a borough within Allegheny County, and Cranberry Township is a township just outside Allegheny County. In the original fieldwork, these sites were chosen to reflect a distinction (between inner-city, inner-ring suburb, and outer-ring suburb) that shows up in some classic sociolinguistic literature (e.g., Bailey et al. 1993, Eckert 2000).
[3] The .eaf part of the transcript name reflects the original transcript file, which was created in the transcription program ELAN.