Field guide: Transcript attributes
This page contains information on transcript attributes. For more information about how attributes work, and what the columns in the following tables mean, read the attribute typology.
corpus
Collection of transcripts from a single research project
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Corpus | corpus | False | False |
Option codes and descriptions
Code | Description |
---|---|
pgh0307 | Sociolinguistic interviews conducted between 2003 and 2007 in four Pittsburgh-area neighborhoods: Cranberry Township, Forest Hills, the Hill District, and Lawrenceville |
episode
Series of transcripts from a single sociolinguistic interview
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Episode | episode | True | False |
transcript
Transcript file name
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Transcript | transcript | True | False |
duration
Transcript duration in seconds
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Duration (sec) | transcript_duration | True | False |
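Because `transcript_duration` is stored in seconds, it is often easier to read after conversion. The helper below is a minimal sketch, not part of the corpus tooling; the function name `format_duration` is hypothetical.

```python
def format_duration(seconds: float) -> str:
    """Render a duration given in seconds as H:MM:SS, rounded to a whole second."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


print(format_duration(3725))  # 1:02:05
```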
language
Language spoken (placeholder)
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Language | transcript_language | False | False |
Option codes and descriptions
Code | Description |
---|---|
en | English |
neighborhood
Neighborhood from which the main speaker was recruited (note that Pittsburghers often refer to municipalities near Pittsburgh as “neighborhoods”)
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Neighborhood | transcript_neighborhood | True | False |
Option codes
- Cranberry Township
- Forest Hills
- Hill District
- Lawrenceville
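Since this attribute is filterable and has a closed set of option codes, a filter over exported data can check the requested code before matching. The sketch below assumes the export is a CSV whose column headers are the export names on this page; the sample rows, the transcript names, and the function `filter_by_neighborhood` are made up for illustration.

```python
import csv
import io

# Option codes listed for transcript_neighborhood in this guide.
NEIGHBORHOODS = {"Cranberry Township", "Forest Hills", "Hill District", "Lawrenceville"}


def filter_by_neighborhood(rows, neighborhood):
    """Return the rows whose transcript_neighborhood equals the given option code."""
    if neighborhood not in NEIGHBORHOODS:
        raise ValueError(f"Unknown neighborhood code: {neighborhood!r}")
    return [row for row in rows if row["transcript_neighborhood"] == neighborhood]


# Made-up rows for illustration only; a real export will differ.
SAMPLE_CSV = """transcript,transcript_neighborhood
example-01,Forest Hills
example-02,Lawrenceville
example-03,Forest Hills
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = filter_by_neighborhood(rows, "Forest Hills")
print([row["transcript"] for row in matches])
```

Rejecting unknown codes up front distinguishes a misspelled neighborhood from a valid one that simply has no matching transcripts.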
notes
General notes that help contextualize the transcript
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Notes | transcript_notes | False | False |
recording_date
Date of recording
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Recording date | transcript_recording_date | False | False |
transcribers
Name of transcriber(s) who completed the original orthographic transcription
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Transcribers | transcript_transcribers | False | False |
transcription_ai_tools
AI tool(s) used to assist human transcription
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Transcription AI tool(s) | transcript_transcription_ai_tools | False | True |
Option codes and descriptions
Code | Description |
---|---|
Batchalign segmentation | Turn segmentation using Batchalign 1 (https://github.com/TalkBank/batchalign) based on the rev.ai model |
Batchalign transcription | Orthographic transcription using Batchalign 1 (https://github.com/TalkBank/batchalign) based on the rev.ai model, with some post-processing |
CLOx transcription | Orthographic transcription using CLOx (https://clox.ling.washington.edu/#/) |
Pyannote segmentation | Turn segmentation using pyannote (https://github.com/pyannote/pyannote-audio), with some post-processing |
For more context on how transcriptions were created, see here.
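This is the only attribute on this page that allows multiple values, so an exported cell may hold more than one option code. The sketch below splits such a cell and checks each code against the table above. The separator is an assumption: this page does not say how multiple values are delimited in exports, so `sep` should be set to whatever the export actually uses. The function name `parse_ai_tools` is hypothetical.

```python
# Option codes listed for transcript_transcription_ai_tools in this guide.
AI_TOOL_CODES = {
    "Batchalign segmentation",
    "Batchalign transcription",
    "CLOx transcription",
    "Pyannote segmentation",
}


def parse_ai_tools(cell: str, sep: str = "|") -> list[str]:
    """Split a multi-valued cell into option codes.

    The default separator is an assumption, not a documented export format.
    Blank entries are dropped; unknown codes raise ValueError.
    """
    tools = [part.strip() for part in cell.split(sep) if part.strip()]
    unknown = [tool for tool in tools if tool not in AI_TOOL_CODES]
    if unknown:
        raise ValueError(f"Unknown AI tool code(s): {unknown}")
    return tools


print(parse_ai_tools("Pyannote segmentation|CLOx transcription"))
```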
type
Sociolinguistic interview section
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Transcript type | transcript_type | True | False |
Option codes
- interview
- metalinguistic
- pairs
- reading
version_date
Date the transcript was last edited
Display title | Export name | Filterable | Multiple values |
---|---|---|---|
Version date | transcript_version_date | False | False |
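For scripted work, the attribute tables on this page can be collected into one lookup. The sketch below copies the values from the tables above; the `Attribute` type and the variable names are illustrative and are not part of any corpus API.

```python
from typing import NamedTuple


class Attribute(NamedTuple):
    display_title: str
    export_name: str
    filterable: bool
    multiple_values: bool


# Values transcribed from the tables on this page.
ATTRIBUTES = [
    Attribute("Corpus", "corpus", False, False),
    Attribute("Episode", "episode", True, False),
    Attribute("Transcript", "transcript", True, False),
    Attribute("Duration (sec)", "transcript_duration", True, False),
    Attribute("Language", "transcript_language", False, False),
    Attribute("Neighborhood", "transcript_neighborhood", True, False),
    Attribute("Notes", "transcript_notes", False, False),
    Attribute("Recording date", "transcript_recording_date", False, False),
    Attribute("Transcribers", "transcript_transcribers", False, False),
    Attribute("Transcription AI tool(s)", "transcript_transcription_ai_tools", False, True),
    Attribute("Transcript type", "transcript_type", True, False),
    Attribute("Version date", "transcript_version_date", False, False),
]

filterable = [a.export_name for a in ATTRIBUTES if a.filterable]
multi_valued = [a.export_name for a in ATTRIBUTES if a.multiple_values]

print(filterable)
print(multi_valued)
```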