Transcription convention

Note: This page is a how-to for transcribers. If you want a higher-level overview for end-users, read the transcription page.

The purpose of transcription for APLS is to facilitate large-scale processing of speech data through the LaBB-CAT corpus analysis tool. This means that we need to report speech as faithfully as possible and be consistent about little details like file names. It also means that we don’t transcribe or notate things that LaBB-CAT can do automatically.

Once you’ve set up the transcription file, transcription consists of two tasks: segmenting the file into turns, and annotating the turns. In other words, you first figure out who is speaking when, then you figure out what they said.

In this document, fixed-width font is used for things you actually type into the transcription program (Praat or Elan), and italics are used for menu commands (i.e., buttons you click).


On this page

General tips
File setup
Segmentation
Annotation
Converting between Elan and Praat
1. Praat to Elan
2. Elan to Praat

General tips

It’s strongly recommended to segment the sound file into turns first, then go back and fill in the transcription
- Some transcribers prefer to segment the entire file before annotating, and others prefer to segment and annotate in chunks
Transcription usually takes longer at the start of the interview, then it speeds up once you get used to how a speaker talks
The majority of the sound file will be relatively easy to transcribe. However, some parts of each file will take disproportionately long to transcribe due to unfinished words, overlaps, and/or ambiguous speech
- One recommendation is to create a temporary Recheck tier where you make note of speech you’re having trouble hearing correctly, so you can return to these portions of the transcript with fresh ears after you’ve finished the first pass. Make sure to delete the Recheck tier once you’re done checking
Use the same audio setup each time you transcribe. Some details that are easy to catch with headphones on are inaudible from laptop speakers.
Save your work often!
- Praat and Elan don’t auto-save your work, unlike some programs you might be used to (e.g., Google Docs). Praat usually doesn’t crash, but better safe than sorry.
- Elan is known to crash occasionally, so you may want to set an automatic backup interval (File > Automatic Backup)
Once you’ve finished transcribing, check over your work!

File setup

The transcription file should have the same name as the sound file and end in .TextGrid (Praat) or .eaf (Elan) (e.g., CB20interview3.TextGrid or CB20interview3.eaf)
Create one tier for each speaker, plus additional tiers: Noise, Comment, and Redaction in both programs, plus Transcriber in Praat only (four additional tiers in Praat, three in Elan)
- The tier name for the main speaker(s) should be that speaker’s APLS code (e.g., LV01). Main speaker(s) are in the sound file name.
- The tier name for the interviewer(s) should be the interviewer’s name:
  - Interviewer HD for HD interviews
  - Barbara Johnstone for most CB/FH/LV interviews
  - Jennifer Andrus for CB02 or CB18
- In most cases, any additional speakers should be named Bystander + main speaker’s APLS code + a number (e.g., Bystander CB01 1, Bystander CB01 2)
  - The only exception is if the additional speaker is also in APLS (in which case, name their tier with their speaker code). This is very unlikely, so unless you happen to know the additional speaker is in APLS, just assume it’s a Bystander
- Pay attention to capitalization, plurals, and leading zeros (e.g., Redaction not redactions, FH01 not FH1)
- In Praat, the Transcriber tier should have a single annotation: the names of all transcribers (including anyone who checked the transcription)
In Elan, set each tier’s Participant attribute to be the same as the tier name. To add tiers and set tier attributes, go to Tier > Add New Tier.
In Elan, set the file's Author attribute to the names of all transcribers (including anyone who checked the transcription) by going to Edit > Set Author
You may also want to create a (temporary) Recheck tier (see above) while you’re transcribing
- Make sure to delete this tier when you’re ready to submit the file
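
The naming rules above can be sketched as a small helper, which may be handy for double-checking a file before you submit it. This is only an illustration, not part of the APLS tooling: the function names, the .wav sound file extension, the tier order, and the use of the first main speaker's code for Bystander tiers are all assumptions.

```python
from pathlib import Path

# Transcript extension for each program, as described in this section.
EXTENSIONS = {"praat": ".TextGrid", "elan": ".eaf"}
# Tiers required in both programs.
SHARED_TIERS = ["Noise", "Comment", "Redaction"]


def transcript_name(sound_file, program):
    """Return the transcript file name that matches a sound file name."""
    return Path(sound_file).with_suffix(EXTENSIONS[program]).name


def expected_tiers(main_speakers, interviewers, n_bystanders, program):
    """List the tier names a finished transcript should contain.

    The order of the returned list is illustrative only.
    """
    tiers = list(main_speakers) + list(interviewers)
    # Assumption: Bystander tiers are named after the first main speaker.
    tiers += [
        f"Bystander {main_speakers[0]} {i}" for i in range(1, n_bystanders + 1)
    ]
    tiers += SHARED_TIERS
    if program == "praat":
        # Elan stores transcriber names in the Author attribute instead.
        tiers.append("Transcriber")
    return tiers


print(transcript_name("CB20interview3.wav", "elan"))
print(expected_tiers(["CB01"], ["Barbara Johnstone"], 2, "praat"))
```

Comparing this list against the tiers actually in your file is a quick way to catch capitalization, plural, and leading-zero slips.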

Segmentation

Before annotating speech, you should segment the file into turns by creating intervals on the appropriate speaker tier(s). (Again, some transcribers prefer to segment the entire file before annotating, and others prefer to segment and annotate in chunks.) You can leave the Noise, Comment, and Redaction tiers empty until you’re actually ready to annotate.

General segmentation tips (Praat first, then Elan):

Most files are in stereo, with the interviewer on the left channel and the interviewee on the right channel (although the interviewee's audio often 'bleeds' to the left channel)
- You can mute one channel at a time (Ctrl+click/Cmd+click on the speaker icon to the right of the waveform)
- If you want to hear a single channel in both ears, do the following:
  - Go to the Praat Objects window, select the sound file, click Convert > Extract one channel..., and enter 1 for left (interviewer) or 2 for right (interviewee)
  - Select the new sound file and the TextGrid, and click View & Edit
  - The new window will keep in sync time-wise with the original one, and any edits you make to the TextGrid will show up on both windows
- Praat doesn't distinguish between empty and non-empty intervals. So when segmenting, add a filler character (e.g., >) into each interval you intend to fill later
Copying boundaries between tiers is straightforward in Praat:
- If you want to copy an existing boundary to a new tier, click the boundary, then either press Ctrl+Fn/Cmd+Fn (where n is the tier number you want to copy to) or click the blue circle on the tier
- If you want to copy an existing interval (pair of boundaries) to a new tier, click the interval, then press Ctrl+n/Cmd+n (where n is the tier number you want to copy to)
- You can also use these tricks to add a new boundary or interval to multiple tiers: position your cursor by clicking or click-and-dragging on the waveform, then press Ctrl/Cmd+n (interval) or Ctrl/Cmd+Fn (boundary)
In the course of filling in the transcription, you will sometimes find that you want to adjust the turn segmentation
- Click and drag boundaries to adjust them
- If you want to adjust boundaries on multiple tiers, drag a boundary to the right spot on one tier, copy it to the other tier, delete the old boundary on the other tier, and (if needed) cut & paste the text

In Elan, segment using Annotation Mode or Segmentation Mode, and transcribe using Transcription Mode
Most files are in stereo, with the interviewer on the left channel and the interviewee on the right channel (although the interviewee's audio often 'bleeds' to the left channel)
- If you're using earphones, you can remove one to isolate the other channel. Unfortunately, there's no native way to isolate one channel in Elan
In the course of filling in the transcription, you will sometimes find that you want to adjust the turn segmentation
- This is easy to do in Segmentation Mode. Double-click the tier you want to adjust. Then drag turn boundaries to adjust them, or right-click on the turn to split/merge the turn
Elan doesn't make it easy to precisely sync boundaries across tiers, so just get it close enough
- Before transcriptions are uploaded to APLS, they are run through a program that 'snaps together' turn boundaries across tiers

Turns of speech

Segment speakers’ speech into turns based on breath groups, not sentences—spontaneous speech seldom consists of sentences as we know them from written language!
- Breath groups are stretches of speech in between longer breaths; don’t break up turns at every breath
Turns should be no longer than ~10 seconds, even if there’s just one speaker for a long stretch of time
In word list sections, don’t give each word its own turn but group words into sets of ~5 within a turn
Never put a breakpoint in the middle of a word.
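
If you have your turns as a list of start and end times in seconds, the ~10-second guideline above is easy to check. The sketch below is a hypothetical helper (the name long_turns and the (start, end) tuple format are assumptions); since the limit is approximate, treat the results as turns to look at again, not as errors.

```python
def long_turns(turns, max_seconds=10.0):
    """Return the (start, end) turns that run longer than max_seconds."""
    return [(start, end) for start, end in turns if end - start > max_seconds]


# The second turn lasts 12.5 seconds, so it is flagged for splitting.
print(long_turns([(0.0, 8.5), (8.5, 21.0)]))
```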

Overlaps

We have to handle overlapping speech carefully because of how LaBB-CAT treats overlapping turns: it excludes them from phonetic forced-alignment. That is, any audio transcribed as overlapping can’t be searched for individual segments.

When speakers overlap speech, make the overlapped portion a separate turn on each of the speakers’ tiers
- For example, Speaker A speaks continuously from 4:00 to 4:08 and Speaker B speaks from 4:04 to 4:05 (talking over Speaker A). You should create 3 turns for Speaker A (4:00–4:04, 4:04–4:05, 4:05–4:08) and 1 turn for Speaker B (4:04–4:05)
- Don’t break up words, even if the speakers only overlap for one syllable
- Again, Elan doesn't make it easy to precisely sync boundaries across tiers, so just get it close enough
This applies only when speakers’ speech overlaps
- Boundaries on the Noise or Comment tiers don’t have to align with speaker tiers
- The Redaction tier is handled a little differently, as described just below
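
The Speaker A / Speaker B example above can be worked through in code. This is a minimal sketch, assuming turns are (start, end) pairs in seconds (4:00 is 240 seconds); the function name split_at_overlap is made up for illustration. It only computes where the boundaries fall in time, so you still need to nudge them by hand so that no word is broken up.

```python
def split_at_overlap(turn, other):
    """Split `turn` at the edges of its overlap with `other`.

    Both arguments are (start, end) pairs in seconds. If the two turns
    don't overlap, `turn` is returned unchanged.
    """
    start, end = turn
    other_start, other_end = other
    overlap_start = max(start, other_start)
    overlap_end = min(end, other_end)
    if overlap_start >= overlap_end:
        return [turn]
    pieces = [(start, overlap_start), (overlap_start, overlap_end), (overlap_end, end)]
    # Drop zero-length pieces (e.g., when the overlap starts with the turn).
    return [(a, b) for a, b in pieces if a < b]


speaker_a = (240, 248)  # 4:00 to 4:08
speaker_b = (244, 245)  # 4:04 to 4:05
print(split_at_overlap(speaker_a, speaker_b))  # three turns for Speaker A
print(split_at_overlap(speaker_b, speaker_a))  # one turn for Speaker B
```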

Redaction

To help protect our speakers’ privacy, we use redaction to keep sensitive personal information out of APLS. When we mark a stretch of speech as redacted, LaBB-CAT deletes the audio and removes the redacted text

To redact a stretch of speech, create separate turns on two tiers: the tier of the speaker who utters the sensitive information, plus the Redaction tier
- On the speaker’s tier, enter REDACT
- On the Redaction tier, enter a brief description of why you’re redacting the speech (e.g., speaker name)
- Treat these redactions like overlaps: they should be as short as possible to avoid deleting more information than necessary, and they should be separate from other turns. As usual for overlaps, don’t split up words
- If a redaction coincides with overlapped speech (e.g., Speaker A says their name while Speaker B says something else), enter REDACT on Speaker A’s tier, and transcribe Speaker B normally
You should redact any information that could uniquely identify the speaker, such as their name (first, last, or maiden name), family members’ names (including distant relatives), their street address (current or childhood), etc.
- This information comes up quite rarely in sociolinguistic interviews. More likely is non-unique identifying information, which does not need to be redacted: the speaker’s high school, the street they grew up on (but not the street and number), etc.
- If you’re unsure about whether something needs to be redacted, contact Dan
In addition, redact any instances of the N-word.
- Enter REDACT on the speaker tier as usual, and N-word on the Redaction tier

Annotation

Once you’ve segmented your transcript into turns, it’s time to annotate. For the most part, APLS transcriptions are orthographic. That is, they consist mostly of the words that speakers say, written with conventional American English spelling. Within that framework, though, there are some important things to consider.

Reminder: In this document, fixed-width font is used for things you actually type into the transcription program (Elan or Praat).

Spelling

Careless typos cause difficulties for our parsers—please be careful!
Use conventional American English spelling
If you are unsure of how to spell something, look it up in a dictionary or Google it
- When the speaker refers to a neighborhood or city, the name should always be capitalized even if the individual words are common nouns (e.g., Larimer, the Hill)
With the exception of disfluencies or obvious mispronunciations, words should be spelled out in their “dictionary form”. That means:
- Do not represent typical phonological/sociolinguistic processes like “g dropping” or consonant cluster deletion in the spelling. For example, [dʒʌmpɪŋ] and [dʒʌmpɪn] should both be transcribed jumping not jumpin'; [ænd] and [æn] should both be and not an'. Do not substitute apostrophes for letters (e.g., old not ol').
- Do not use “eye dialect” spellings for common words (downtown not dahntahn; Steelers not Stillers).
  - You can use a comment to mention speakers’ performative use of “dialect pronunciations”, but this is strictly optional.
- Using non-dictionary forms makes it harder to find words in search results and prevents the corpus software from using computational methods (e.g., part-of-speech lookup)
Don’t tidy up the speech. Leave in the repetitions, fillers, speech errors, bad words, mean sentiments, etc.
- The exception is the N-word, which should always be redacted (see above)
Don’t use capital letters for the start of new sentences. Only use capital letters for proper nouns and the pronoun I
Write all numbers out in full, with spaces instead of hyphens (e.g., one hundred and twenty three not 123 or one hundred and twenty-three)
When abbreviations or acronyms are used:
- If each letter is said separately, use capital letters with spaces in between each letter (e.g., P G H)
- If the letter is pluralized, add s plus a “pronounce” code to the last letter (e.g., D V Ds[diz])
- If the word is pronounced as a word, use capitals with no spaces (e.g., FEMA)
Don’t use any diacritics that are not part of the English alphabet (e.g., fiancee not fiancée)
A single word should always be spelled as an entire word, even if there is a pause between syllables. Never put a breakpoint in the middle of a word.
The following list represents all and only colloquial spellings that may be used in transcription:
- gonna
- sorta
- 'cause (from because)
- kinda
- gotta
- I'mma
- wanna
- tryna
- lotta
- 'til (from until)
- 'nother (as in a whole 'nother)
Standard contractions are fine (e.g., might've). The clitics 'd, 'll, 've, and 's can be attached to any noun
For other interjections, select a representation from the list (IPA symbols for clarity):
- yup
- yeah
- okay (even if [m̩keɪ])
- mmm
- eh /eɪ/
- uh huh
- nah
- mmm hmm
- hmm
- um
- uh
- aw /ɔ/
- oh /oʊ/
- oo /u/
- ahh /ɑ/
- gee
- jeez
- whoops
- ow
- ha
- huh
- yuck
- damn
- hey
- oof
- blah
- woohoo
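
To illustrate the number format described earlier in this section (written out in full, with spaces instead of hyphens), here is a small sketch that spells out 0 to 999. The function name and the range limit are assumptions made for the example. It only shows what the written form looks like: always transcribe what the speaker actually said (e.g., one twenty three if that's what you hear), not what this function would produce.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def spell_number(n):
    """Spell out an integer from 0 to 999 with spaces and no hyphens."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} and {spell_number(rest)}"


print(spell_number(123))  # one hundred and twenty three
```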

Punctuation

The only allowable punctuation marks are apostrophes and hyphens (within words) and periods, hyphens, and question marks (between words)
Correctly use apostrophes as they would be used in standard writing (e.g., can't, it's, John's, the Johnsons' house)
Use within-word hyphens only when the hyphenated representation of a word is common in writing (e.g., hyphenate self-esteem but not short-necked)
Other than apostrophes, there should always be a space between letters and punctuation (e.g., I know it . not I know it.)
Don’t use commas or periods to indicate clauses and sentences. Instead, use . for a shorter prosodic break or - for a longer break (a ‘beat’ or more)
- You can use the hyphen to avoid breaking up halting speech into a bunch of short turns
Do use question marks, especially if the grammatical structure does not indicate a question, but the intonation does.
- Again, there should always be a space between question marks and any other characters.
Don’t use quotation marks around reported speech (e.g., then he said well I suppose you could try it)
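
The spacing rules above lend themselves to a rough automated check. The sketch below is a simplification, not the parser APLS actually uses: it strips anything in square brackets, curly brackets, or parentheses (noises, comments, pronounce codes, lexical codes) and then flags any remaining token that has punctuation stuck to it. The function name and the exact set of flagged characters are assumptions.

```python
import re

# Bracketed material follows its own rules, so it is removed before checking.
ANNOTATION = re.compile(r"\[[^\]]*\]|\{[^}]*\}|\([^)]*\)")
# Characters that should never be attached to a word.
FORBIDDEN = re.compile(r"[.,?!;:\"]")
# Punctuation marks allowed between words, as standalone tokens.
STANDALONE = {".", "-", "?"}


def punctuation_problems(turn):
    """Return the tokens in a turn that break the punctuation rules."""
    text = ANNOTATION.sub("", turn)
    problems = []
    for token in text.split():
        if token in STANDALONE:
            continue
        if FORBIDDEN.search(token) or token.startswith("-") or token.endswith("-"):
            problems.append(token)
    return problems


print(punctuation_problems("I know it ."))  # no problems
print(punctuation_problems("I know it."))   # flags the token with the attached period
```

Within-word apostrophes and hyphens (can't, self-esteem, 'cause) pass through untouched, since only attached periods, commas, question marks, and similar marks are flagged.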

Comments and noises

Non-speech noises should be notated with a brief description
- Noises produced by individuals should be transcribed between square brackets on that individual speaker’s tier (e.g., [laughs], [sniffs])
- General noises should appear on the Noise tier without square brackets (e.g., loud truck goes by, interviewer plays drums)
- You don’t need to transcribe every single breath or sniff, just the ones that are prominent enough that a computer could confuse them for a speech sound
Comments can be about individuals’ speech or general comments
- Comments about individuals’ speech or behavior should be placed between curly brackets on the speaker’s tier (e.g., {pretends to talk like a man})
- General comments can be placed on the Comment tier without curly brackets (e.g., long period not transcribed due to microphone failure or the two speakers whisper together)
- In the “Pittsburghese” or “AAE” section of the interview, if the speaker performs “dialect forms”, curly brackets can be used to remark upon their performance (e.g., you often hear people saying downtown {performs monophthong aw}).
  - Remember: don’t use “eye dialect” spellings, even if the speaker is clearly performing an accent other than their own.
If a speaker laughs while speaking a single word, use the notation {mid-word laugh} on their tier. If a speaker laughs while speaking multiple words, use mid-word laugh on the Comment tier
If you can’t decipher what someone says, insert [unclear] on their tier
- Try to minimize the use of [unclear]
- It can help to re-listen to earlier sections of the interview once you’ve heard more of the speaker

Pronounce codes

There are 3 situations where you need to attach a “pronounce code” to the end of a word to help LaBB-CAT determine the phonemes that are in the word: (1) words that aren’t in the dictionary, (2) idiosyncratic pronunciations, (3) hesitations/incomplete words
- Attach the pronounce code with no space after the word, in square brackets (e.g., yinzerific["jIn-z@r-'I-fIk])
- The pronounce code uses the DISC phonemic alphabet to give a phonemic representation of what was said
If a speaker uses a word that isn’t in a standard dictionary or APLS’s custom dictionary (a word made up on the spot, a specific road name, etc.), just spell it the best you can, and provide a pronounce code in DISC (e.g., yinzerific["jIn-z@r-'I-fIk])
- If the word is used more than once in a transcript, the DISC code needs to be supplied every time
- We can add words to APLS’s custom dictionary, but only if it’s a word that is likely to be used by more than just one speaker
If a speaker pronounces a word in an idiosyncratic way (e.g., bookshelves as buckshelves), provide a DISC code
- Do not use a DISC code if their pronunciation is the result of a typical phonological/sociolinguistic process like pre-/l/ vowel laxing (e.g., fail as [fεl]) or consonant cluster reduction (e.g., waste as [weɪs]). If you’re not sure, err on the side of not using a DISC code
- Remember: don’t use “eye dialect” spellings, even if the speaker is clearly performing an accent other than their own
  - You can use a comment to remark upon speakers’ performative use of “dialect pronunciations”, but this is strictly optional
- If the idiosyncratic pronunciation is used more than once in a transcript, the DISC code needs to be supplied every time
Unfinished words use a tilde to mark incompleteness, plus a pronounce code and optional lexical code
- For example, if the speaker starts to say hesitate but stops after two syllables, either of the following two notations would be acceptable: hesi~['hE-z@], hesi~['hE-z@](hesitate)
- Optionally, you can add a lexical code, which uses English spelling to represent the intended word (if known). The lexical code goes in parentheses ((hesitate)) with no space after the pronounce code
- You can omit pronounce codes for unfinished words that consist of just a consonant, since these are built into APLS’s dictionary
  - For example, if the speaker just says [f], you can write just f~ without a pronounce code
- Words do not need to be marked as unfinished if their pronunciation is the result of a typical phonological/sociolinguistic process like consonant cluster deletion
  - For example, if the speaker says and as [æn] rather than [ænd], transcribe it as just and, not an~['{n]
  - This is related to the principle that words should be spelled out in their “dictionary form”

Converting between Elan and Praat

It’s sometimes necessary to convert Praat (.TextGrid) files to Elan (.eaf), or vice versa. Both can be accomplished in Elan.

Praat to Elan

Import the .TextGrid file into Elan
1. In Elan, go to File > Import > Praat TextGrid File
2. Click Browse… and find the file
3. Select “Skip empty intervals / annotations”
4. Click Next, then Finish
5. If you get an error message “Operation interrupted: No tiers detected in TextGrid file”, the issue may be the file encoding. Redo the preceding steps but try a different encoding in the Browse… popup window.
Link the audio with the transcript
1. Go to Edit > Linked Files
2. Click Add… and browse for the file
3. Click Apply
Set attributes
1. Set the Author attribute at Edit > Set Author
2. Set each tier’s Participant attribute to be the same as the tier name
  1. Click Tier > Change Tier Attributes
  2. Select a tier, add the Participant attribute, and click Change
  3. Repeat for all tiers, then click Close
3. Since we no longer need the Transcriber tier, delete it
  1. Click Tier > Delete Tier
  2. Select the Transcriber tier, and click Delete
Save the .eaf file

Elan to Praat

Open the .eaf file in Elan
If you’ve already completed your transcription in Elan, copy the contents of the Author attribute to a blank document so you can use it later.
- If you’re just converting a file from the Beautiful Monster, don’t worry about this step.
Export the file to .TextGrid
1. Go to File > Export As > Praat TextGrid
2. Leave all defaults as-is and click OK
3. Browse to where you want to save the file, check that the file name is correct, and click Save
Open the .TextGrid file in Praat and add a Transcriber tier
1. Select the file in the Objects window and click Modify > Insert interval tier…
2. Fill in “Position:” with 10 and “Name:” with Transcriber
3. In the Objects window, click View & Edit
4. Click on the Transcriber tier and paste the contents of the Author attribute you copied earlier
Save the .TextGrid file