Transcription convention
The purpose of transcription for APLS is to facilitate large-scale processing of speech data through the LaBB-CAT corpus analysis tool. This means that we need to report speech as faithfully as possible and be consistent about little details like file names. It also means that we don't transcribe or notate things that LaBB-CAT can do automatically.
Once you've set up the transcription file, transcription consists of two tasks: segmenting the file into turns, and annotating the turns. In other words, you first figure out who is speaking when, then you figure out what they said.
In this document, `fixed-width font` is used for things you actually type into the transcription program (Praat or Elan), and italics are used for menu commands (i.e., buttons you click).
General tips
- It's strongly recommended to segment the sound file into turns first, then go back and fill in the transcription
  - Some transcribers prefer to segment the entire file before annotating, and others prefer to segment and annotate in chunks
- Transcription usually takes longer at the start of the interview, then it speeds up once you get used to how a speaker talks
- The majority of the sound file will be relatively easy to transcribe. However, some parts of each file will take disproportionately long to transcribe due to unfinished words, overlaps, and/or ambiguous speech
  - One recommendation is to create a temporary `Recheck` tier where you make note of speech you're having trouble hearing correctly, so you can return to these portions of the transcript with fresh ears after you've finished the first pass. Make sure to delete the `Recheck` tier once you're done checking
- Use the same audio setup each time you transcribe. Some details that are easy to catch with headphones on are inaudible from laptop speakers.
- Save your work often!
- Neither Praat nor Elan auto-saves your work, unlike some programs you might be used to (e.g., Google Docs).
  - Praat usually doesn't crash, but better safe than sorry.
  - Elan is known to crash occasionally, so you may want to set an automatic backup interval (File > Automatic Backup)
File setup
- The transcription file should have the same name as the sound file and end in `.TextGrid` (Praat) or `.eaf` (Elan) (e.g., `CB20interview3.TextGrid` or `CB20interview3.eaf`)
- Create one tier for each speaker, plus additional tiers: `Noise`, `Comment`, and `Redaction` in both programs, plus `Transcriber` in Praat only (four additional tiers in Praat, three in Elan)
  - The tier name for the main speaker(s) should be that speaker's APLS code (e.g., `LV01`). The main speaker code(s) appear in the sound file name.
  - The tier name for the interviewer(s) should be the interviewer's name:
    - `Interviewer HD` for HD interviews
    - `Barbara Johnstone` for most CB/FH/LV interviews
    - `Jennifer Andrus` for CB02 or CB18
  - In most cases, any additional speakers should be named `Bystander` + main speaker's APLS code + a number (e.g., `Bystander CB01 1`, `Bystander CB01 2`)
    - The only exception is if the additional speaker is also in APLS (in which case, name their tier with their speaker code). This is very unlikely, so unless you happen to know the additional speaker is in APLS, just assume it's a `Bystander`
  - Pay attention to capitalization, plurals, and leading zeros (e.g., `Redaction` not `redactions`, `FH01` not `FH1`)
- In Praat, the `Transcriber` tier should have a single annotation: the names of all transcribers (including anyone who checked the transcription)
- You can also add a temporary `Recheck` tier (see above) while you're transcribing
  - Make sure to delete this tier when you're ready to submit the file
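The tier-naming rules above can be checked mechanically before a file is submitted. The following is a minimal sketch, not an official APLS tool: the function name `check_tier_names` is hypothetical, and the speaker-code pattern (two capital letters plus two digits) is an assumption inferred from examples such as `LV01` and `FH01`. It expects a plain list of tier names, however you obtain them from the file.

```python
import re

# Assumed shape of an APLS speaker code (inferred from LV01, CB20, FH01).
SPEAKER_CODE = re.compile(r"^[A-Z]{2}\d{2}$")
# Bystander + main speaker's code + a number, e.g. "Bystander CB01 1".
BYSTANDER = re.compile(r"^Bystander [A-Z]{2}\d{2} \d+$")
INTERVIEWERS = {"Interviewer HD", "Barbara Johnstone", "Jennifer Andrus"}
REQUIRED = {"Noise", "Comment", "Redaction"}


def check_tier_names(tier_names, praat=True):
    """Return a list of human-readable problems with a file's tier names."""
    # The Transcriber tier is only expected in Praat files.
    required = REQUIRED | ({"Transcriber"} if praat else set())
    problems = [f"missing tier: {name}" for name in sorted(required - set(tier_names))]
    for name in tier_names:
        if name in required or name in INTERVIEWERS:
            continue
        if SPEAKER_CODE.match(name) or BYSTANDER.match(name):
            continue
        if name == "Recheck":
            problems.append("delete the Recheck tier before submitting")
        else:
            problems.append(f"unexpected tier name: {name}")
    return problems


print(check_tier_names(["FH1", "redactions", "Noise", "Comment", "Transcriber"]))
```

Exact string comparison is deliberate here, since the convention treats capitalization, plurals, and leading zeros as significant.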
Segmentation
Before annotating speech, you should segment the file into turns by creating intervals on the appropriate speaker tier(s). (Again, some transcribers prefer to segment the entire file before annotating, and others prefer to segment and annotate in chunks.) You can leave the `Noise`, `Comment`, and `Redaction` tiers empty until you're actually ready to annotate.
General segmentation tips (Praat):
- Most files are in stereo, with the interviewer on the left channel and the interviewee on the right channel (although the interviewee's audio often 'bleeds' to the left channel)
  - You can mute one channel at a time (Ctrl+click/Cmd+click on the speaker icon to the right of the waveform)
  - If you want to hear a single channel in both ears, do the following:
    - Go to the Praat Objects window, select the sound file, click Convert > Extract one channel..., and enter `1` for left (interviewer) or `2` for right (interviewee)
    - Select the new sound file and the TextGrid, and click View & Edit
    - The new window will keep in sync time-wise with the original one, and any edits you make to the TextGrid will show up on both windows
- Praat doesn't distinguish between empty and non-empty intervals, so when segmenting, add a filler character (e.g., `>`) into each interval you intend to fill later
- Copying boundaries between tiers is straightforward in Praat:
  - If you want to copy an existing boundary to a new tier, click the boundary, then either press Ctrl+Fn/Cmd+Fn (where n is the tier number you want to copy to) or click the blue circle on the tier
  - If you want to copy an existing interval (pair of boundaries) to a new tier, click the interval, then press Ctrl+n/Cmd+n (where n is the tier number you want to copy to)
  - You can also use these tricks to add a new boundary or interval to multiple tiers: position your cursor by clicking or click-and-dragging on the waveform, then press Ctrl/Cmd+n (interval) or Ctrl/Cmd+Fn (boundary)
- In the course of filling in the transcription, you will sometimes find that you want to adjust the turn segmentation
  - Click and drag boundaries to adjust them
  - If you want to adjust boundaries on multiple tiers, drag a boundary to the right spot on one tier, copy it to the other tier, delete the old boundary on the other tier, and (if needed) cut & paste the text
General segmentation tips (Elan):
- In Elan, segment using Annotation Mode or Segmentation Mode, and transcribe using Transcription Mode
- Most files are in stereo, with the interviewer on the left channel and the interviewee on the right channel (although the interviewee's audio often 'bleeds' to the left channel)
  - If you're using earphones, you can remove one to isolate the other channel. Unfortunately, there's no native way to isolate one channel in Elan
- In the course of filling in the transcription, you will sometimes find that you want to adjust the turn segmentation
  - This is easy to do in Segmentation Mode. Double-click the tier you want to adjust. Then drag turn boundaries to adjust them, or right-click on the turn to split/merge the turn
  - Elan doesn't make it easy to precisely sync boundaries across tiers, so just get it close enough
    - Before transcriptions are uploaded to APLS, they are run through a program that 'snaps together' turn boundaries across tiers
Turns of speech
- Segment speakers' speech into turns based on breath groups, not sentences. Spontaneous speech seldom consists of sentences as we know them from written language!
  - Breath groups are stretches of speech in between longer breaths; don't break up turns at every breath
- Turns should be no longer than ~10 seconds, even if there's just one speaker for a long stretch of time
- In word list sections, don't give each word its own turn but group words into sets of ~5 within a turn
- Never put a breakpoint in the middle of a word.
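The ~10-second guideline is easy to check once turns are available as start and end times. This is a small sketch under assumptions: the function name `long_turns` and the `(start, end, text)` tuple layout are hypothetical, and the sample turns are made up for illustration.

```python
def long_turns(turns, max_seconds=10.0):
    """Return the (start, end, text) turns whose duration exceeds max_seconds."""
    return [turn for turn in turns if turn[1] - turn[0] > max_seconds]


# Made-up example: times in seconds, placeholder text.
turns = [(0.0, 4.2, "first turn"), (4.2, 17.9, "second turn")]
for start, end, text in long_turns(turns):
    print(f"{start:.1f}-{end:.1f}s is {end - start:.1f}s long; consider splitting at a breath")
```

Since the limit is approximate, `max_seconds` is a parameter rather than a hard rule, and any split still has to fall between words.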
Overlaps
We have to handle overlapping speech carefully because of how LaBB-CAT treats overlapping turns: it excludes them from phonetic forced-alignment. That is, any audio transcribed as overlapping can't be searched for individual segments.
- When speakers overlap speech, make the overlapped portion a separate turn on each of the speakers' tiers
  - For example, Speaker A speaks continuously from 4:00 to 4:08 and Speaker B speaks from 4:04 to 4:05 (talking over Speaker A). You should create 3 turns for Speaker A (4:00–4:04, 4:04–4:05, 4:05–4:08) and 1 turn for Speaker B (4:04–4:05)
  - Don't break up words, even if the speakers only overlap for one syllable
  - Again, Elan doesn't make it easy to precisely sync boundaries across tiers, so just get it close enough
- Boundaries on the `Noise` or `Comment` tiers don't have to align with speaker tiers
- The `Redaction` tier is handled a little differently, as described just below
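The Speaker A / Speaker B example above can be expressed as a small calculation: a turn is cut wherever another speaker's turn starts or ends inside it. This is an illustrative sketch only. The function name `split_at_overlaps` is hypothetical, times are in seconds, and the result still needs hand adjustment so that no boundary falls in the middle of a word.

```python
def split_at_overlaps(turn, others):
    """Split one (start, end) turn wherever another speaker's turn starts or ends inside it."""
    start, end = turn
    cuts = {start, end}
    for other_start, other_end in others:
        for t in (other_start, other_end):
            if start < t < end:
                cuts.add(t)
    points = sorted(cuts)
    # Pair up consecutive cut points into (start, end) turns.
    return list(zip(points, points[1:]))


# Speaker A talks from 4:00 to 4:08 (240-248 s); Speaker B from 4:04 to 4:05 (244-245 s).
print(split_at_overlaps((240, 248), [(244, 245)]))
# [(240, 244), (244, 245), (245, 248)]
print(split_at_overlaps((244, 245), [(240, 248)]))
# [(244, 245)]
```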
Redaction
To help protect our speakers' privacy, we use redaction to keep sensitive personal information out of APLS. When we mark a stretch of speech as redacted, LaBB-CAT deletes the audio and removes the redacted text.
- To redact a stretch of speech, create separate turns on two tiers: the speaker who utters the sensitive information, plus the `Redaction` tier
  - On the speaker's tier, enter `REDACT`
  - On the `Redaction` tier, enter a brief description of why you're redacting the speech (e.g., `speaker name`)
  - Treat these redactions like overlaps: they should be as short as possible to avoid deleting more information than necessary, and they should be separate from other turns. As usual for overlaps, don't split up words
  - If a redaction coincides with overlapped speech (e.g., Speaker A says their name while Speaker B says something else), enter `REDACT` on Speaker A's tier, and transcribe Speaker B normally
- You should redact any information that could uniquely identify the speaker, such as their name (first, last, or maiden name), family members' names (including distant relatives), their street address (current or childhood), etc.
  - This information comes up quite rarely in sociolinguistic interviews. More likely is non-unique identifying information, which does not need to be redacted: the speaker's high school, the street they grew up on (but not the street and number), etc.
- If you're unsure about whether something needs to be redacted, contact Dan
Annotation
Once you've segmented your transcript into turns, it's time to annotate. For the most part, APLS transcriptions are orthographic. That is, they consist mostly of the words that speakers say, written with conventional American English spelling. Within that framework, though, there are some important things to consider.
Reminder: In this document, `fixed-width font` is used for things you actually type into the transcription program (Elan or Praat).
Spelling
- Careless typos cause difficulties for our parsers, so please be careful!
- Use conventional American English spelling
  - If you are unsure of how to spell something, look it up in a dictionary or Google it
  - When the speaker refers to a neighborhood or city, the name should always be capitalized even if the individual words are common nouns (e.g., `Larimer`, `the Hill`)
- With the exception of disfluencies or obvious mispronunciations, words should be spelled out in their "dictionary form". That means:
  - Do not represent typical phonological/sociolinguistic processes like "g dropping" or consonant cluster deletion in the spelling. For example, [dʒʌmpɪŋ] and [dʒʌmpɪn] should both be transcribed `jumping` not `jumpin'`; [ænd] and [æn] should both be `and` not `an'`. Do not substitute apostrophes for letters (e.g., `old` not `ol'`).
  - Do not use "eye dialect" spellings for common words (`downtown` not `dahntahn`; `Steelers` not `Stillers`).
    - You can use a comment to mention speakers' performative use of "dialect pronunciations", but this is strictly optional.
  - Using non-dictionary forms makes it harder to find words in search results and prevents the corpus software from using computational methods (e.g., part-of-speech lookup)
- Don't tidy up the speech. Leave in the repetitions, fillers, speech errors, bad words, mean sentiments, etc.
- Don't use capital letters for the start of new sentences. Only use capital letters for proper nouns and the pronoun I
- Write all numbers out in full, with spaces instead of hyphens (e.g., `one hundred and twenty three` not `123` or `one hundred and twenty-three`)
- When abbreviations or acronyms are used:
  - If each letter is said separately, use capital letters with spaces in between each letter (e.g., `P G H`)
  - If the abbreviation is pluralized, add `s` plus a "pronounce" code to the last letter (e.g., `D V Ds[diz]`)
  - If the abbreviation is pronounced as a word, use capitals with no spaces (e.g., `FEMA`)
- Don't use any diacritics that are not part of the English alphabet (e.g., `fiancee` not `fiancée`)
- A single word should always be spelled as an entire word, even if there is a pause between syllables. Never put a breakpoint in the middle of a word.
- The following are the only colloquial spellings that may be used in transcription:
  - `gonna`
  - `sorta`
  - `'cause` (from because)
  - `kinda`
  - `gotta`
  - `I'mma`
  - `wanna`
  - `tryna`
  - `lotta`
  - `'til` (from until)
  - `'nother` (as in a whole 'nother)
- Standard contractions are fine (e.g., `might've`). The clitics `'d`, `'ll`, `'ve`, and `'s` can be attached to any noun
- For other interjections, select a representation from the list (IPA symbols for clarity):
  - `yup`
  - `yeah`
  - `okay` (even if [m̩keɪ])
  - `mmm`
  - `eh` /eɪ/
  - `uh huh`
  - `nah`
  - `mmm hmm`
  - `hmm`
  - `um`
  - `uh`
  - `aw` /ɔ/
  - `oh` /oʊ/
  - `oo` /u/
  - `ahh` /ɑ/
  - `gee`
  - `jeez`
  - `whoops`
  - `ow`
  - `ha`
  - `huh`
  - `yuck`
  - `damn`
  - `hey`
  - `oof`
  - `blah`
  - `woohoo`
Punctuation
- The only allowable punctuation marks are apostrophes and hyphens (within words) and periods, hyphens, and question marks (between words)
- Correctly use apostrophes as they would be used in standard writing (e.g., `can't`, `it's`, `John's`, `the Johnsons' house`)
- Use within-word hyphens when the hyphenated representation of a word is common in writing (e.g., `self-esteem`, but not `short-necked`)
- Other than apostrophes and within-word hyphens, there should always be a space between letters and punctuation (e.g., `I know it .` not `I know it.`)
- Don't use commas or periods to indicate clauses and sentences. Instead, use `.` for a shorter prosodic break or `-` for a longer break (a "beat" or more)
  - You can use the hyphen to avoid breaking up halting speech into a bunch of short turns
- Do use question marks, especially if the grammatical structure does not indicate a question, but the intonation does.
  - Again, there should always be a space between question marks and any other characters.
- Don't use quotation marks around reported speech (e.g., `then he said well I suppose you could try it`)
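The punctuation rules above lend themselves to a quick automated check of a turn's text. The sketch below is only an approximation of the convention, not the parser APLS actually uses. The function name `bad_tokens` and both regular expressions are assumptions made for illustration. It skips bracketed noises and pronounce codes, curly-bracket comments, and parenthesized lexical codes, then flags any remaining token that is neither a plain word nor a standalone `.`, `-`, or `?`.

```python
import re

# Annotations that follow their own rules: [noises] and pronounce codes,
# {comments}, and (lexical codes).
ANNOTATION = re.compile(r"\[[^\]]*\]|\{[^}]*\}|\([^)]*\)")
# Letters with optional word-internal apostrophes/hyphens, an optional leading
# or trailing apostrophe ('cause, Johnsons'), and an optional final tilde (hesi~).
WORD = re.compile(r"^'?[A-Za-z]+(?:['-][A-Za-z]+)*'?~?$")
STANDALONE = {".", "-", "?"}


def bad_tokens(turn_text):
    """Return the tokens in a turn that break the spacing/punctuation rules."""
    tokens = ANNOTATION.sub(" ", turn_text).split()
    return [t for t in tokens if t not in STANDALONE and not WORD.match(t)]


print(bad_tokens("I know it ."))   # []
print(bad_tokens("I know it."))    # ['it.']
```

Digits are flagged as well, which matches the rule that numbers are written out in full.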
Comments and noises
- Non-speech noises should be notated with a brief description
  - Noises produced by individuals should be transcribed between square brackets on that individual speaker's tier (e.g., `[laughs]`, `[sniffs]`)
  - General noises should appear on the `Noise` tier without square brackets (e.g., `loud truck goes by`, `interviewer plays drums`)
  - You don't need to transcribe every single breath or sniff, just the ones that are prominent enough that a computer could confuse them for a speech sound
- Comments can be about an individual's speech or about the recording in general
  - Comments about individuals' speech or behavior should be placed between curly brackets on the speaker's tier (e.g., `{pretends to talk like a man}`)
  - General comments can be placed on the `Comment` tier without curly brackets (e.g., `long period not transcribed due to microphone failure` or `the two speakers whisper together`)
  - In the "Pittsburghese" or "AAE" section of the interview, if the speaker performs "dialect forms", curly brackets can be used to remark upon their performance (e.g., `you often hear people saying downtown {performs monophthong aw}`).
    - Remember: don't use "eye dialect" spellings, even if the speaker is clearly performing an accent other than their own.
- If a speaker laughs while speaking a single word, use the notation `{mid-word laugh}` on their tier. If a speaker laughs while speaking multiple words, use `mid-word laugh` on the `Comment` tier
- If you can't decipher what someone says, insert `[unclear]` on their tier
  - Try to minimize the use of `[unclear]`
    - It can help to re-listen to earlier sections of the interview once you've heard more of the speaker
Pronounce codes
- There are 3 situations where you need to attach a "pronounce code" to the end of a word to help LaBB-CAT determine the phonemes that are in the word: (1) words that aren't in the dictionary, (2) idiosyncratic pronunciations, (3) hesitations/incomplete words
  - Attach the pronounce code with no space after the word, in square brackets (e.g., `yinzerific["jIn-z@r-'I-fIk]`)
  - The pronounce code uses the DISC phonemic alphabet to give a phonemic representation of what was said
- If a speaker uses a word that isn't in a standard dictionary or APLS's custom dictionary (a word made up on the spot, a specific road name, etc.), just spell it the best you can, and provide a pronounce code in DISC (e.g., `yinzerific["jIn-z@r-'I-fIk]`)
  - If the word is used more than once in a transcript, the DISC code needs to be supplied every time
  - We can add words to APLS's custom dictionary, but only if it's a word that is likely to be used by more than just one speaker
- If a speaker pronounces a word in an idiosyncratic way (e.g., bookshelves as buckshelves), provide a DISC code
  - Do not use a DISC code if their pronunciation is the result of a typical phonological/sociolinguistic process like pre-/l/ vowel laxing (e.g., fail as [fɛl]) or consonant cluster reduction (e.g., waste as [weɪs]). If you're not sure, err on the side of not using a DISC code
  - Remember: don't use "eye dialect" spellings, even if the speaker is clearly performing an accent other than their own
    - You can use a comment to remark upon speakers' performative use of "dialect pronunciations", but this is strictly optional
  - If the idiosyncratic pronunciation is used more than once in a transcript, the DISC code needs to be supplied every time
- Unfinished words use a tilde to mark incompleteness, plus a pronounce code and optional lexical code
  - For example, if the speaker starts to say hesitate but stops after two syllables, either of the following two notations would be acceptable: `hesi~['hE-z@]`, `hesi~['hE-z@](hesitate)`
  - Optionally, you can add a lexical code, which uses English spelling to represent the intended word (if known). The lexical code goes in parentheses (`(hesitate)`) with no space after the pronounce code
  - You can omit pronounce codes for unfinished words that consist of just a consonant, since these are built into APLS's dictionary
    - For example, if the speaker just says [f], you can write just `f~` without a pronounce code
  - Words do not need to be marked as unfinished if their pronunciation is the result of a typical phonological/sociolinguistic process like consonant cluster deletion
    - For example, if the speaker says and as [æn] rather than [ænd], transcribe it as just `and`, not `an~['{n]`
    - This is related to the principle that words should be spelled out in their "dictionary form"
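The notation for pronounce codes and lexical codes is regular enough to take apart with a single pattern. The following sketch is an assumption about how one might parse a token, not a description of LaBB-CAT's own parser. The names `TOKEN` and `parse_token` are hypothetical, and the DISC string is captured as-is without validation.

```python
import re

TOKEN = re.compile(
    r"^(?P<word>[^\s\[\]()]+)"       # orthographic form, possibly ending in ~
    r"(?:\[(?P<disc>[^\]]+)\])?"     # optional pronounce code (DISC) in square brackets
    r"(?:\((?P<lexical>[^)]+)\))?$"  # optional lexical code in parentheses
)


def parse_token(token):
    """Split a transcript token into word, pronounce code, and lexical code."""
    m = TOKEN.match(token)
    if m is None:
        return None  # e.g. a bare noise annotation such as [laughs]
    word = m.group("word")
    return {
        "word": word,
        "unfinished": word.endswith("~"),
        "disc": m.group("disc"),
        "lexical": m.group("lexical"),
    }


print(parse_token("hesi~['hE-z@](hesitate)"))
print(parse_token("f~"))
```

A checker built on this could, for instance, list unfinished words longer than a single consonant that have no pronounce code.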
Converting between Elan and Praat
It's sometimes necessary to convert Praat (`.TextGrid`) files to Elan (`.eaf`), or vice versa. Both can be accomplished in Elan.
LVC students: You can ignore this section. You'll submit your transcriptions as `.TextGrid`, so no need to convert.
Praat to Elan
- Import the `.TextGrid` file into Elan
  - In Elan, go to File > Import > Praat TextGrid File
  - Click Browse... and find the file
  - Select "Skip empty intervals / annotations"
  - Click Next, then Finish
  - If you get an error message "Operation interrupted: No tiers detected in TextGrid file", the issue may be the file encoding. Redo the preceding steps but try a different encoding in the Browse... popup window.
- Link the audio with the transcript
  - Go to Edit > Linked Files
  - Click Add... and browse for the file
  - Click Apply
- Set attributes
  - Set the Author attribute at Edit > Set Author
  - Set each tier's Participant attribute to be the same as the tier name
    - Click Tier > Change Tier Attributes
    - Select a tier, add the Participant attribute, and click Change
    - Repeat for all tiers, then click Close
- Since we no longer need the `Transcriber` tier, delete it
  - Click Tier > Delete Tier
  - Select the `Transcriber` tier, and click Delete
- Save the `.eaf` file
Elan to Praat
- Open the `.eaf` file in Elan
- Copy the contents of the Author attribute to a blank document so you can use it later
- Export the file to `.TextGrid`
  - Go to File > Export As > Praat TextGrid
  - Leave all defaults as-is and click OK
  - Browse to where you want to save the file, check that the file name is correct, and click Save
- Open the `.TextGrid` file in Praat and add a `Transcriber` tier
  - Select the file in the Objects window and click Modify > Insert interval tier...
  - Fill in "Position:" with `10` and "Name:" with `Transcriber`
  - In the Objects window, click View & Edit
  - Click on the `Transcriber` tier and paste the contents of the Author attribute you copied earlier
- Save the `.TextGrid` file