Field guide: Transcript attributes
This page contains information on transcript attributes. For more information about how attributes work, and what the columns in the following tables mean, read the attribute typology.
corpus
Collection of transcripts from a single research project
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Corpus | corpus | False | False |
Option codes and descriptions
| Code | Description |
|---|---|
| pgh0307 | Sociolinguistic interviews conducted between 2003 and 2007 in four Pittsburgh-area neighborhoods: Cranberry Township, Forest Hills, the Hill District, and Lawrenceville |
episode
Series of transcripts from a single sociolinguistic interview
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Episode | episode | True | False |
transcript
Transcript file name
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Transcript | transcript | True | False |
duration
Transcript duration in seconds
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Duration (sec) | transcript_duration | True | False |
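Because durations are stored in seconds, a small helper can make exported values easier to read. The sketch below assumes the `transcript_duration` value arrives as a numeric string, as it would when read from a CSV file; the function name `format_duration` is hypothetical and not part of any tool described on this page.

```python
def format_duration(seconds_text: str) -> str:
    """Convert a duration in seconds (given as text) to H:MM:SS.

    Assumption: the value parses as a float; fractional seconds are
    rounded to the nearest whole second.
    """
    total = round(float(seconds_text))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"
```

For instance, `format_duration("754.3")` gives `"0:12:34"`.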
language
Language spoken (placeholder)
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Language | transcript_language | False | False |
Option codes and descriptions
| Code | Description |
|---|---|
| en | English |
neighborhood
Neighborhood from which the main speaker was recruited (note that Pittsburghers often refer to municipalities near Pittsburgh as “neighborhoods”)
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Neighborhood | transcript_neighborhood | True | False |
Option codes
- Cranberry Township
- Forest Hills
- Hill District
- Lawrenceville
notes
General notes that help contextualize the transcript
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Notes | transcript_notes | False | False |
recording_date
Date of recording
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Recording date | transcript_recording_date | False | False |
transcribers
Name of transcriber(s) who completed the original orthographic transcription
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Transcribers | transcript_transcribers | False | False |
transcription_ai_tools
AI tool(s) used to assist human transcription
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Transcription AI tool(s) | transcript_transcription_ai_tools | False | True |
Option codes and descriptions
| Code | Description |
|---|---|
| Batchalign segmentation | Turn segmentation using Batchalign 1 (https://github.com/TalkBank/batchalign), based on the rev.ai model |
| Batchalign transcription | Orthographic transcription using Batchalign 1 (https://github.com/TalkBank/batchalign), based on the rev.ai model, with some post-processing |
| CLOx transcription | Orthographic transcription using CLOx (https://clox.ling.washington.edu/#/) |
| Pyannote segmentation | Turn segmentation using pyannote (https://github.com/pyannote/pyannote-audio), with some post-processing |
For more context on how transcriptions were created, see here.
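Since `transcript_transcription_ai_tools` is the one attribute on this page that allows multiple values, code that reads it may need to split a single cell into individual codes. The following is a minimal sketch only: how multiple values are delimited in an export is not stated here, so the `separator` parameter (defaulting to a newline) is an assumption to adjust, and `parse_ai_tools` is a hypothetical helper name.

```python
# Option codes copied from the table above.
KNOWN_TOOLS = {
    "Batchalign segmentation",
    "Batchalign transcription",
    "CLOx transcription",
    "Pyannote segmentation",
}


def parse_ai_tools(cell: str, separator: str = "\n") -> list[str]:
    """Split a multi-valued cell into a list of option codes.

    Assumption: values are joined by `separator` in the exported cell.
    Empty cells yield an empty list; unrecognized codes raise an error.
    """
    tools = [part.strip() for part in cell.split(separator) if part.strip()]
    unknown = [tool for tool in tools if tool not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"Unrecognized tool code(s): {unknown}")
    return tools
```

Validating against the known codes is a design choice that surfaces a wrong separator early, because a cell split on the wrong character will not match any code.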
type
Sociolinguistic interview section
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Transcript type | transcript_type | True | False |
Option codes
- interview
- metalinguistic
- pairs
- reading
version_date
Date the transcript was last edited
| Display title | Export name | Filterable | Multiple values |
|---|---|---|---|
| Version date | transcript_version_date | False | False |
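The export names in the tables above can serve as column keys when working with exported data. The sketch below assumes a CSV export whose header row uses those export names; the sample rows, including the transcript names, are invented for illustration, and `select_transcripts` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration only; real exports will contain different values.
SAMPLE_EXPORT = """transcript,transcript_neighborhood,transcript_type,transcript_duration
example-a,Lawrenceville,interview,1800.0
example-b,Lawrenceville,reading,240.5
example-c,Forest Hills,interview,2100.0
"""


def select_transcripts(csv_text: str, neighborhood: str, transcript_type: str) -> list[str]:
    """Return transcript names matching a neighborhood and transcript type.

    Assumption: the CSV header uses the export names listed on this page.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["transcript"]
        for row in reader
        if row["transcript_neighborhood"] == neighborhood
        and row["transcript_type"] == transcript_type
    ]
```

With the sample data, `select_transcripts(SAMPLE_EXPORT, "Lawrenceville", "interview")` returns `["example-a"]`.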