Field guide: Transcript attributes

This page describes the transcript attributes. For more information about how attributes work and what the columns in the following tables mean, see the attribute typology.

On this page
  1. corpus
  2. episode
  3. transcript
  4. duration
  5. language
  6. neighborhood
  7. notes
  8. recording_date
  9. transcribers
  10. transcription_ai_tools
  11. type
  12. version_date

corpus

Collection of transcripts from a single research project

Display title | Export name | Filterable | Multiple values
Corpus | corpus | False | False

Option codes and descriptions

Code | Description
pgh0307 | Sociolinguistic interviews conducted between 2003 and 2007 in four Pittsburgh-area neighborhoods: Cranberry Township, Forest Hills, the Hill District, and Lawrenceville

episode

Series of transcripts from a single sociolinguistic interview

Display title | Export name | Filterable | Multiple values
Episode | episode | True | False

transcript

Transcript file name

Display title | Export name | Filterable | Multiple values
Transcript | transcript | True | False

duration

Transcript duration in seconds

Display title | Export name | Filterable | Multiple values
Duration (sec) | transcript_duration | True | False
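
Because an episode is a series of transcripts from one interview and transcript_duration is given in seconds, the total length of an interview can be obtained by summing the durations of its transcripts. The sketch below assumes exported rows are available as Python dictionaries keyed by export name; that row format, the helper name episode_durations, and the example episode and transcript IDs are all hypothetical.

```python
from collections import defaultdict


def episode_durations(rows):
    """Sum transcript_duration (seconds) for each episode.

    Assumes each row is a dict keyed by export name; this row format
    is an assumption, not a documented export format.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["episode"]] += float(row["transcript_duration"])
    return dict(totals)


# Hypothetical example rows, not real corpus data.
rows = [
    {"episode": "EP01", "transcript": "EP01_1", "transcript_duration": "1200.5"},
    {"episode": "EP01", "transcript": "EP01_2", "transcript_duration": "300"},
    {"episode": "EP02", "transcript": "EP02_1", "transcript_duration": "95.25"},
]

for episode, seconds in sorted(episode_durations(rows).items()):
    # Report each episode total in seconds and in minutes.
    print(f"{episode}: {seconds:.1f} s ({seconds / 60:.1f} min)")
```

Durations are converted with float() because exported values may arrive as strings; drop the conversion if they are already numeric.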

language

Language spoken (placeholder)

Display title | Export name | Filterable | Multiple values
Language | transcript_language | False | False

Option codes and descriptions

Code | Description
en | English

neighborhood

Neighborhood the main speaker was recruited from (note that Pittsburghers often refer to municipalities near Pittsburgh as “neighborhoods”)

Display title | Export name | Filterable | Multiple values
Neighborhood | transcript_neighborhood | True | False

Option codes

  • Cranberry Township
  • Forest Hills
  • Hill District
  • Lawrenceville

notes

General notes that help contextualize the transcript

Display title | Export name | Filterable | Multiple values
Notes | transcript_notes | False | False

recording_date

Date of recording

Display title | Export name | Filterable | Multiple values
Recording date | transcript_recording_date | False | False

transcribers

Name(s) of the transcriber(s) who completed the original orthographic transcription

Display title | Export name | Filterable | Multiple values
Transcribers | transcript_transcribers | False | False

transcription_ai_tools

AI tool(s) used to assist human transcription

Display title | Export name | Filterable | Multiple values
Transcription AI tool(s) | transcript_transcription_ai_tools | False | True

Option codes and descriptions

Code | Description
Batchalign segmentation | Turn segmentation using Batchalign 1 (https://github.com/TalkBank/batchalign) based on the rev.ai model
Batchalign transcription | Orthographic transcription using Batchalign 1 (https://github.com/TalkBank/batchalign) based on the rev.ai model, with some post-processing
CLOx transcription | Orthographic transcription using CLOx (https://clox.ling.washington.edu/#/)
Pyannote segmentation | Turn segmentation using pyannote (https://github.com/pyannote/pyannote-audio), with some post-processing

For more context on how transcriptions were created, see here.
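
transcription_ai_tools is the only transcript attribute that allows multiple values, so a single exported cell may name several tools. The sketch below splits such a cell and checks each part against the option codes listed above. The ";" separator and the function name parse_ai_tools are hypothetical; confirm the actual export format before relying on them.

```python
# Option codes documented for transcription_ai_tools.
AI_TOOL_CODES = {
    "Batchalign segmentation",
    "Batchalign transcription",
    "CLOx transcription",
    "Pyannote segmentation",
}


def parse_ai_tools(value, sep=";"):
    """Split a multi-valued transcript_transcription_ai_tools cell.

    The ';' separator is a hypothetical default, not a documented one.
    An empty cell yields an empty list (no AI tools recorded).
    """
    tools = [part.strip() for part in value.split(sep) if part.strip()]
    unknown = [t for t in tools if t not in AI_TOOL_CODES]
    if unknown:
        raise ValueError(f"unrecognized AI tool code(s): {unknown}")
    return tools


print(parse_ai_tools("Pyannote segmentation; CLOx transcription"))
```

Rejecting unrecognized codes rather than silently keeping them makes a mismatched separator show up immediately as an error.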

type

Sociolinguistic interview section

Display title | Export name | Filterable | Multiple values
Transcript type | transcript_type | True | False

Option codes

  • interview
  • metalinguistic
  • pairs
  • reading

version_date

Date the transcript was last edited

Display title | Export name | Filterable | Multiple values
Version date | transcript_version_date | False | False
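
The attribute tables on this page can be collected into a small schema that maps each attribute to its export name and flags. The sketch below does that and adds a hypothetical filter_rows helper that only accepts criteria on attributes marked Filterable. Treating the Filterable flag as a permission check is an assumption about how the flag is used (see the attribute typology for its actual meaning), and the dict-per-row format and example rows are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    export_name: str
    filterable: bool
    multiple_values: bool


# Export names and flags copied from the tables on this page.
TRANSCRIPT_ATTRIBUTES = {
    "corpus": Attribute("corpus", False, False),
    "episode": Attribute("episode", True, False),
    "transcript": Attribute("transcript", True, False),
    "duration": Attribute("transcript_duration", True, False),
    "language": Attribute("transcript_language", False, False),
    "neighborhood": Attribute("transcript_neighborhood", True, False),
    "notes": Attribute("transcript_notes", False, False),
    "recording_date": Attribute("transcript_recording_date", False, False),
    "transcribers": Attribute("transcript_transcribers", False, False),
    "transcription_ai_tools": Attribute("transcript_transcription_ai_tools", False, True),
    "type": Attribute("transcript_type", True, False),
    "version_date": Attribute("transcript_version_date", False, False),
}


def filter_rows(rows, **criteria):
    """Keep rows whose values match every criterion.

    Criteria are keyed by attribute name (e.g. neighborhood="Hill District")
    and rejected unless the attribute is marked Filterable. Rows are
    assumed to be dicts keyed by export name (hypothetical format).
    """
    resolved = {}
    for name, wanted in criteria.items():
        attr = TRANSCRIPT_ATTRIBUTES[name]
        if not attr.filterable:
            raise ValueError(f"{name} is not filterable")
        resolved[attr.export_name] = wanted
    return [
        row for row in rows
        if all(row.get(col) == wanted for col, wanted in resolved.items())
    ]


# Hypothetical example rows, not real corpus data.
rows = [
    {"transcript": "T1", "transcript_neighborhood": "Hill District", "transcript_type": "interview"},
    {"transcript": "T2", "transcript_neighborhood": "Hill District", "transcript_type": "reading"},
    {"transcript": "T3", "transcript_neighborhood": "Forest Hills", "transcript_type": "interview"},
]
matches = filter_rows(rows, neighborhood="Hill District", type="interview")
print([row["transcript"] for row in matches])
```

Keying criteria by attribute name rather than export name keeps calls short while still resolving to the transcript_-prefixed columns used in exports.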