Transcription

Before audio data can be uploaded to APLS, it must first be transcribed by human transcribers according to APLS’s transcription convention. Transcriptions consist of annotations time-aligned to turns ranging in duration from 0 to 10 seconds, typically in the .eaf (ELAN) or .TextGrid (Praat) file format. These turns exist on several tiers: one for each speaker (typically just the interviewee and interviewer), plus special tiers representing non-speaker noises, transcriber comments, and stretches of speech that should be redacted.
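As a rough illustration of this structure (not APLS’s actual data model), a transcription could be sketched as a mapping from tiers to time-aligned turns; the tier names and contents below are invented:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    """One time-aligned annotation on a tier (times in seconds)."""
    start: float
    end: float
    text: str

# Hypothetical transcription: one tier per speaker, plus special tiers.
transcription = {
    "interviewee": [Turn(0.0, 4.2, "i grew up in squirrel hill"),
                    Turn(6.1, 9.8, "we moved when i was ten")],
    "interviewer": [Turn(4.2, 6.1, "whereabouts in the city")],
    "noise":       [Turn(5.0, 5.4, "door slams")],
    "comment":     [Turn(0.0, 4.2, "overlapping traffic noise")],
    "redact":      [Turn(7.3, 8.0, "surname")],
}

# Every turn stays within the convention's 0-10 second range.
assert all(0 <= t.end - t.start <= 10
           for turns in transcription.values() for t in turns)
```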

To facilitate large-scale processing of speech data through the LaBB-CAT corpus analysis tool that powers APLS, annotations on speaker tiers are mostly orthographic. For example, whereas some transcription conventions require explicit annotation of details such as speech rate or pauses, LaBB-CAT can be programmed to annotate these details automatically. To the extent possible, APLS transcribers attempt to separate the act of transcription from that of coding (socio)linguistic variation. For instance, the English -ing ending is transcribed as ing, regardless of whether it is pronounced [ɪŋ] or [ɪn]. Avoiding “coding while transcribing” not only makes transcription faster; more importantly, it makes it easier to specify the search context for linguistic patterns once the transcription is uploaded to APLS (e.g., when searching for -ing tokens, the end-user only has to look for /ɪŋ/ rather than both [ɪŋ] and [ɪn]).
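Because turns are time-aligned, a detail like pause length can be recovered after the fact rather than written into the transcript. A minimal sketch of that idea (not LaBB-CAT’s actual implementation):

```python
# Time-aligned turns on one speaker tier: (start, end) in seconds.
turns = [(0.0, 4.2), (6.1, 9.8), (9.9, 12.0)]

# Pauses fall out of the alignment itself: the gap between one
# turn's end and the next turn's start.
pauses = [round(nxt_start - prev_end, 2)
          for (_, prev_end), (nxt_start, _) in zip(turns, turns[1:])]

print(pauses)  # [1.9, 0.1] -- no explicit pause annotation needed
```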

Where transcribers do need to specify phonemic representations (e.g., for novel words or hesitations), they use the DISC phonemic alphabet. APLS’s DISC specification is based on the original DISC spec from CELEX (see the English lexicon user’s guide, pp. 31–32). Both specs use ASCII characters exclusively while maintaining a one-to-one mapping between symbols and English phonemes. APLS uses DISC to represent phonological layers internally, and end-users use it to search those layers.
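As a sketch of how such a search works, the following uses a handful of genuine DISC symbols (e.g., I = /ɪ/, N = /ŋ/) over an invented toy lexicon; the transcriptions are illustrative, not taken from CELEX:

```python
import re

# A few genuine DISC-to-IPA correspondences (the full CELEX alphabet
# maps every English phoneme to exactly one ASCII character).
DISC_TO_IPA = {"I": "ɪ", "N": "ŋ", "V": "ʌ", "3": "ɜː",
               "s": "s", "t": "t", "k": "k", "n": "n", "r": "r", "w": "w"}

# Toy lexicon: orthographic form -> DISC transcription.
lexicon = {"working": "w3kIN", "sitting": "sItIN", "running": "rVnIN"}

# Since -ing is always transcribed orthographically as "ing", one
# phonemic query for word-final /ɪŋ/ -- "IN$" in DISC -- finds every
# token, however each was actually pronounced.
print([w for w, disc in lexicon.items() if re.search(r"IN$", disc)])
# ['working', 'sitting', 'running']

# The one-to-one mapping makes DISC trivially convertible to IPA:
print("".join(DISC_TO_IPA[c] for c in lexicon["sitting"]))  # sɪtɪŋ
```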

Most transcriptions were created by undergraduate research assistants at Pitt who had undergone several weeks of training, including feedback and corrections. Some transcriptions were initially created by undergraduate students at Pitt and Swarthmore College as part of class assignments, then extensively hand-checked and corrected by trained research assistants. In some cases, transcribers used AI tools to assist with parts of the task (CLOx or Batchalign for speech annotation, pyannote for segmentation), hand-checking and correcting all AI predictions. As of 2024, the primary workflow combined Batchalign annotations filled into pyannote segmentations via a purpose-built Shiny app. In rare cases, transcribers adapted pre-existing transcriptions to the APLS convention. Most transcriptions were initially created in ELAN, with some created in Praat; all were checked via a purpose-built Shiny app to ensure the files were well-formatted.
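For readers unfamiliar with the segmentation step, the following is a minimal sketch of running pyannote speaker diarization, assuming pyannote.audio 3.x; the model name, token, and file name are placeholders, and this is not APLS’s actual pipeline:

```python
from pyannote.audio import Pipeline

# Assumes pyannote.audio 3.x; model name and token are placeholders.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN_HERE",
)

# Segment an interview into speaker turns; each turn's start/end
# would then be filled with (hand-checked) annotations.
diarization = pipeline("interview.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.2f}-{turn.end:.2f}")
```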