Field guide: Transcript attributes

This page describes the transcript attributes. For more on how attributes work and what the columns in the tables below mean, read the attribute typology.

On this page

corpus
episode
transcript
duration
language
left_channel_participants
neighborhood
notes
recording_date
right_channel_participants
single_channel_participants
transcribers
transcription_ai_tools
type
version_date

corpus

Collection of transcripts from a single research project

Display title	Export name	Filterable	Multiple values
Corpus	`corpus`	False	False

Option codes and descriptions

Code	Description
`pgh0307`	Sociolinguistic interviews conducted between 2003 and 2007 in four Pittsburgh-area neighborhoods: Cranberry Township, Forest Hills, the Hill District, and Lawrenceville

episode

Series of transcripts from a single sociolinguistic interview

Display title	Export name	Filterable	Multiple values
Episode	`episode`	True	False

transcript

Transcript file name

Display title	Export name	Filterable	Multiple values
Transcript	`transcript`	True	False

duration

Transcript duration in seconds

Display title	Export name	Filterable	Multiple values
Duration (sec)	`transcript_duration`	True	False
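
Because the exported value is a count of seconds, it can be convenient to convert it to a clock-style string for display. The following is a minimal sketch, assuming the export is a CSV file with a `transcript_duration` column holding a numeric string. The helper name `format_duration` and the sample row are hypothetical and not part of any export.

```python
import csv
import io

def format_duration(seconds: float) -> str:
    """Convert a duration in seconds to an H:MM:SS string, rounded to whole seconds."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Hypothetical sample export; a real export may contain more columns.
sample = "transcript,transcript_duration\nexample_transcript,754.6\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# CSV values are read as strings, so convert to float before formatting.
durations = [format_duration(float(row["transcript_duration"])) for row in rows]
print(durations)
```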

language

Language spoken (placeholder)

Display title	Export name	Filterable	Multiple values
Language	`transcript_language`	False	False

Option codes and descriptions

Code	Description
`en`	English

left_channel_participants

Participant(s) whose audio is primarily on the left channel (though potentially with ‘bleed’ to the right channel) in the corresponding audio file.

Display title	Export name	Filterable	Multiple values
Left-channel participant(s)	`transcript_left_channel_participants`	False	False

neighborhood

Neighborhood from which the main speaker was recruited (note that Pittsburghers often refer to municipalities near Pittsburgh as “neighborhoods”)

Display title	Export name	Filterable	Multiple values
Neighborhood	`transcript_neighborhood`	True	False

Option codes

Cranberry Township
Forest Hills
Hill District
Lawrenceville

notes

General notes that help contextualize the transcript

Display title	Export name	Filterable	Multiple values
Notes	`transcript_notes`	False	False

recording_date

Date of recording

Display title	Export name	Filterable	Multiple values
Recording date	`transcript_recording_date`	False	False

right_channel_participants

Participant(s) whose audio is primarily on the right channel (though potentially with ‘bleed’ to the left channel) in the corresponding audio file.

Display title	Export name	Filterable	Multiple values
Right-channel participant(s)	`transcript_right_channel_participants`	False	False

single_channel_participants

Participant(s) whose audio appears on the only channel in the corresponding audio file (i.e., for files that are in mono rather than stereo).

Display title	Export name	Filterable	Multiple values
Single-channel participant(s)	`transcript_single_channel_participants`	False	False
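
The three channel attributes complement each other: stereo files use the left- and right-channel attributes, while mono files use the single-channel attribute. The sketch below shows one way to read them together from an exported row. It assumes that attributes which do not apply to a file are exported as empty strings, which this page does not confirm. The function name `channel_layout` and the participant labels are hypothetical.

```python
def channel_layout(row: dict) -> dict:
    """Map each audio channel to the participant value recorded for it."""
    single = (row.get("transcript_single_channel_participants") or "").strip()
    left = (row.get("transcript_left_channel_participants") or "").strip()
    right = (row.get("transcript_right_channel_participants") or "").strip()
    if single:
        # Mono file: only one channel exists.
        return {"mono": single}
    # Stereo file: keep whichever channels have a recorded value.
    return {ch: val for ch, val in (("left", left), ("right", right)) if val}

# Hypothetical rows for illustration only.
stereo_row = {
    "transcript_left_channel_participants": "Speaker A",
    "transcript_right_channel_participants": "Interviewer",
    "transcript_single_channel_participants": "",
}
mono_row = {"transcript_single_channel_participants": "Speaker B"}

print(channel_layout(stereo_row))
print(channel_layout(mono_row))
```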

transcribers

Name(s) of the transcriber(s) who completed the original orthographic transcription

Display title	Export name	Filterable	Multiple values
Transcribers	`transcript_transcribers`	False	False

transcription_ai_tools

AI tool(s) used to assist human transcription

Display title	Export name	Filterable	Multiple values
Transcription AI tool(s)	`transcript_transcription_ai_tools`	False	True

Option codes and descriptions

Code	Description
`Batchalign segmentation`	Turn segmentation using Batchalign 1 (https://github.com/TalkBank/batchalign), based on the rev.ai model
`Batchalign transcription`	Orthographic transcription using Batchalign 1 (https://github.com/TalkBank/batchalign), based on the rev.ai model, with some post-processing
`CLOx transcription`	Orthographic transcription using CLOx (https://clox.ling.washington.edu/#/)
`Pyannote segmentation`	Turn segmentation using pyannote (https://github.com/pyannote/pyannote-audio), with some post-processing

For more context on how transcriptions were created, see here.
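
This is the only transcript attribute that allows multiple values, so an exported cell may hold more than one option code. The sketch below splits such a cell and checks each value against the option codes listed above. The separator is an assumption: this page does not state how multiple values are joined on export, so it is passed in explicitly. `split_ai_tools` is a hypothetical helper.

```python
# Option codes copied from the table above.
KNOWN_TOOLS = {
    "Batchalign segmentation",
    "Batchalign transcription",
    "CLOx transcription",
    "Pyannote segmentation",
}

def split_ai_tools(cell: str, sep: str) -> list:
    """Split a multi-valued cell into option codes, dropping blank entries.

    `sep` is whatever separator the export actually uses (an assumption here).
    """
    tools = [part.strip() for part in cell.split(sep) if part.strip()]
    unknown = [tool for tool in tools if tool not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"Unrecognized option code(s): {unknown}")
    return tools

# Hypothetical cell value using ";" as the separator.
tools = split_ai_tools("Pyannote segmentation; CLOx transcription", sep=";")
print(tools)
```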

type

Section of the sociolinguistic interview

Display title	Export name	Filterable	Multiple values
Transcript type	`transcript_type`	True	False

Option codes

interview
metalinguistic
pairs
reading

version_date

Date the transcript was last edited

Display title	Export name	Filterable	Multiple values
Version date	`transcript_version_date`	False	False
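
The tables on this page can be condensed into a single lookup, which makes it easy to see which attributes are filterable and how export names relate to attribute names. The sketch below copies the export name, filterable flag, and multiple-values flag from each table above. The variable names are illustrative only.

```python
# attribute name -> (export name, filterable, multiple values), from the tables above
ATTRIBUTES = {
    "corpus": ("corpus", False, False),
    "episode": ("episode", True, False),
    "transcript": ("transcript", True, False),
    "duration": ("transcript_duration", True, False),
    "language": ("transcript_language", False, False),
    "left_channel_participants": ("transcript_left_channel_participants", False, False),
    "neighborhood": ("transcript_neighborhood", True, False),
    "notes": ("transcript_notes", False, False),
    "recording_date": ("transcript_recording_date", False, False),
    "right_channel_participants": ("transcript_right_channel_participants", False, False),
    "single_channel_participants": ("transcript_single_channel_participants", False, False),
    "transcribers": ("transcript_transcribers", False, False),
    "transcription_ai_tools": ("transcript_transcription_ai_tools", False, True),
    "type": ("transcript_type", True, False),
    "version_date": ("transcript_version_date", False, False),
}

# Export names of attributes that can be used as filters.
filterable = [export for export, can_filter, _ in ATTRIBUTES.values() if can_filter]

# Export names of attributes that may hold more than one value.
multi_valued = [export for export, _, multiple in ATTRIBUTES.values() if multiple]

# Attributes whose export name is the attribute name itself (no `transcript_` prefix).
unprefixed = [name for name, (export, _, _) in ATTRIBUTES.items() if export == name]

print(filterable)
print(multi_valued)
print(unprefixed)
```

Every attribute other than `corpus`, `episode`, and `transcript` is exported with a `transcript_` prefix added to its name.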