Phonemic transcription with the DISC alphabet

APLS primarily uses the DISC phonemic alphabet¹ for representing speech sounds (specifically, phonemes), rather than the International Phonetic Alphabet (IPA). Like the IPA, DISC creates a one-to-one mapping between sounds and symbols, but unlike the IPA, DISC only uses symbols that appear on a standard QWERTY keyboard. While the IPA is widely recognized among linguists, many IPA characters are hard for end-users to input and difficult for computers to store.² As a result, APLS uses DISC internally for storing and searching phonological data, exports phonological data in DISC, and has transcribers use DISC for “pronounce codes” when a word’s pronunciation needs to be specified (e.g., for an incomplete word). In APLS, IPA is used only for displaying phonological data to end-users.

APLS uses a subset of DISC relevant to North American Englishes. In APLS, we use DISC symbols for phonemic representations of sounds, not phonetic representations. As a result, the APLS subset of DISC doesn’t have symbols for [ɾ] or [ʔ] (flap or glottal stop); in North American Englishes, these surface only as allophones of /t/.

In this document, fixed-width font is used for symbols you actually type into APLS search fields or a transcription program (Praat or Elan).

On this page

DISC consonants
1. Non-syllabic consonants
2. Syllabic consonants
DISC vowels
Transcribing words using DISC
Suggesting new dictionary entries

DISC consonants

Non-syllabic consonants

IPA	DISC	English	DISC
p	p	pat	p{t
b	b	bad	b{d
t	t	tack	t{k
d	d	dad	d{d
k	k	cad	k{d
ɡ	g	gap	g{p
f	f	fad	f{d
v	v	vat	v{t
θ	T	thin	TIn
ð	D	then	DEn
s	s	sap	s{p
z	z	zap	z{p
ʃ	S	sheep	Sip
ʒ	Z	measure	mEZ@r
h	h	had	h{d
tʃ	J	cheap	Jip
dʒ	_	jeep	_ip
m	m	mad	m{d
n	n	nat	n{t
ŋ	N	bang	b{N
l	l	lad	l{d
ɹ	r	rad	r{d
w	w	wet	wEt
j	j	yet	jEt

Syllabic consonants

IPA	DISC	English	DISC
m̩	F	idealism	2dilIzF
n̩	H	burden	b3rdH
l̩	P	dangle	d{NgP

DISC vowels

Monophthongs not before /ɹ/

IPA	DISC	English	DISC
i	i	fleece	flis
ɪ	I	kit	kIt
ɛ	E	dress	drEs
æ	{	trap	tr{p
ɑ	Q	lot	lQt
ɔ	$	thought	T$t
ʌ	V	strut	strVt
ʊ	U	foot	fUt
u	u	goose	gus
ə	@	comma	kQm@

Monophthongs before /ɹ/

IPA	DISC	English	DISC
ɪɹ	7r	near	n7r
ɛɹ	8r	square	skw8r
aɹ	#r	start	st#rt
ɝ	3r	nurse	n3rs
ɔɹ	$r	force	f$rs
ʊɹ	9r	cure	kj9r
ɚ	@r	letter	lEt@r

Diphthongs

IPA	DISC	English	DISC
eɪ	1	face	f1s
aɪ	2	price	pr2s
aʊ	6	mouth	m6T
ɔɪ	4	choice	J4s
oʊ	5	goat	g5t
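
Because every DISC symbol is a single character, the tables above can be collected into a simple lookup for display purposes. The following is a minimal sketch, not APLS's actual implementation: the names `DISC_TO_IPA` and `disc_to_ipa` are hypothetical, and the sketch makes two simplifying assumptions. It renders `3r` as ɝ (as in the table) but leaves `@r` as əɹ, since deciding between ɚ and ə plus an onset /ɹ/ would need syllable information, and it expects input without stress or syllable markers.

```python
# DISC -> IPA lookup built from the tables above (APLS subset only).
# Hypothetical helper for illustration; not part of APLS.
DISC_TO_IPA = {
    # Non-syllabic consonants
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "ɡ",
    "f": "f", "v": "v", "T": "θ", "D": "ð", "s": "s", "z": "z",
    "S": "ʃ", "Z": "ʒ", "h": "h", "J": "tʃ", "_": "dʒ",
    "m": "m", "n": "n", "N": "ŋ", "l": "l", "r": "ɹ", "w": "w", "j": "j",
    # Syllabic consonants
    "F": "m̩", "H": "n̩", "P": "l̩",
    # Monophthongs not before /ɹ/
    "i": "i", "I": "ɪ", "E": "ɛ", "{": "æ", "Q": "ɑ", "$": "ɔ",
    "V": "ʌ", "U": "ʊ", "u": "u", "@": "ə",
    # Vowel symbols that normally occur only before r
    "7": "ɪ", "8": "ɛ", "#": "a", "3": "ɝ", "9": "ʊ",
    # Diphthongs
    "1": "eɪ", "2": "aɪ", "6": "aʊ", "4": "ɔɪ", "5": "oʊ",
}


def disc_to_ipa(disc):
    """Convert a DISC string (no stress or syllable markers) to IPA."""
    out = []
    i = 0
    while i < len(disc):
        if disc[i:i + 2] == "3r":
            # The table gives a single IPA symbol for the pair 3r.
            out.append("ɝ")
            i += 2
        else:
            # Raises KeyError for any symbol outside the APLS subset.
            out.append(DISC_TO_IPA[disc[i]])
            i += 1
    return "".join(out)


print(disc_to_ipa("T$t"))    # θɔt
print(disc_to_ipa("n3rs"))   # nɝs
print(disc_to_ipa("skw8r"))  # skwɛɹ
```

Going in the other direction (IPA to DISC) is harder, because several IPA entries use more than one character.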

Transcribing words using DISC

As noted above, DISC transcription is phonemic rather than phonetic. Here are some other things to note:

Monophthongs before /ɹ/ should always be transcribed with the DISC r symbol (see examples above)
- Exception: when the speaker cuts off the word before an /ɹ/ (e.g., cutting off start as sta~[st#])
IPA makes a distinction between stressed central vowels /ʌ ɝ/ and unstressed central vowels /ə ɚ/. Similarly, use DISC V/3r for stressed vowels and @/@r for unstressed vowels
- Exception: Always use 3r for -burg(h), even if it’s unstressed (e.g., Pittsburghese is pItsb3rgiz not pItsb@rgiz)
Use the DISC symbols for syllabic consonants, F/H/P, rather than @m/@n/@l (see examples above)
- Exception: Don’t use these symbols for onsets (e.g., Panera is p@n8r@, not pH8r@)
For word-final high front vowels, use i rather than I (i.e., we assume universal happy-tensing)
When /ŋ/ comes before a vowel or syllabic consonant, assume it’s followed by /g/ (e.g., dangle is d{NgP, not d{NP)
Don’t forget that nk in English spelling is usually /ŋk/ (DISC Nk) rather than /nk/ (DISC nk)
If you have the cot–caught merger in production and/or perception, ask a non-merged friend to help you decide whether to transcribe Q or $.
Unstressed vowels can be tricky!
- For “schwi”, use @ rather than I (e.g., breathless is brETl@s not brETlIs)
- Same for ’s after a sibilant (e.g., Lutz’s is lVts@z not lVtsIz)
- Again, use @ for unstressed mid-central vowels (“schwa”) and @r pre-/ɹ/
There’s no [ɾ] (flap) symbol; instead, use t or d as suggested by orthography, even if it doesn’t match the voicing of the surface segment (e.g., Bettis is bEtIs, not bEdIs)
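
A few of the guidelines above are mechanical enough to check automatically. The sketch below is only an illustration under stated assumptions: `check_disc` is a hypothetical helper (not an APLS tool), it expects a DISC string without stress or syllable markers, and it covers just three rules (pre-/ɹ/ vowels, /ŋ/ before a vowel or syllabic consonant, and word-final I). Its warnings are prompts to double-check, since exceptions such as cut-off words are legitimate.

```python
import re

# Symbol inventories taken from the tables in this document.
VOWELS = "iIE{Q$VUu@" + "12645" + "7893#"
SYLLABIC = "FHP"


def check_disc(disc):
    """Return a list of warnings for a DISC string (hypothetical helper)."""
    warnings = []
    # Pre-r vowel symbols should be followed by r
    # (the exception is a word cut off before the r).
    if re.search(r"[789#3](?!r)", disc):
        warnings.append("pre-r vowel not followed by r")
    # N directly before a vowel or syllabic consonant should usually be Ng.
    # re.escape is needed because {, $ and # are special in regular expressions.
    if re.search("N[" + re.escape(VOWELS + SYLLABIC) + "]", disc):
        warnings.append("N before vowel or syllabic consonant without g")
    # Word-final high front vowels are transcribed i, not I.
    if disc.endswith("I"):
        warnings.append("word-final I (use i)")
    return warnings


print(check_disc("d{NgP"))  # []
print(check_disc("d{NP"))   # ['N before vowel or syllabic consonant without g']
```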

Suggesting new dictionary entries

APLS uses two sources to look up phonemic representations: (1) the Unisyn English dictionary (which is supposed to be a universal English dictionary), and (2) a custom dictionary. Occasionally you’ll run across a word that needs to be added to APLS’s custom dictionary. Most often this will be a Pittsburgh/western PA geographic name (e.g., neighborhoods like Shadyside, municipalities like Sewickley, streets like Baum, schools like Milliones), a brand name (e.g., Highmark, Panera), or a Pittsburgh lexical feature (e.g., redd, gumband). However, there are words that aren’t Pittsburgh-specific that are absent from Unisyn, sometimes unexpectedly; for example, we’ve had to manually add entries for artsy, bachelorette, homie, Kwanzaa, microbrew, stepdad, tarp, and y’all (among others).

If a word falls into one of the preceding categories, it should be added to the APLS dictionary. On the other hand, if a word is unlikely to be used by more than one speaker, it’s better to just use an inline pronounce code.

Everything from the previous sections applies to new words that you suggest for the custom dictionary, plus the following:

If applicable, you can suggest 2+ phonemic representations per word (but this is optional)
Use the speech community’s pronunciation(s)
Add symbols to mark syllabification and stress

Multiple phonemic representations

APLS’s custom dictionary can contain multiple phonemic representations for any given word. For example, the first vowel in Lawrenceville can rhyme with either shore or far ($r or #r). In that case, just suggest the full DISC representation twice: 'l$r-Hs-"vIl or 'l#r-Hs-"vIl.

Keep in mind that these are phonemic representations. This means that we don’t add separate entries that reflect variable phonological processes like consonant cluster deletion. For example, even though Pittsburghers pronounce gumband as both [ˈgʌmbænd] and [ˈgʌmbæn], only 'gVm-b{nd should go in APLS’s custom dictionary.

Use the speech community’s pronunciation(s)

APLS’s dictionary entries should reflect Pittsburgh English speakers’ mental lexicons. That means that if Pittsburgh English speakers have a different mental representation for a word than we’d expect based on the word’s spelling or our experience, APLS’s dictionary needs to include the local phonemic representation. For example, Pittsburghers often pronounce Carnegie as [kɑɹ.ˈneɪ.gi], whereas non-Pittsburghers almost always use [ˈkɑɹ.nə.gi]. In the course of a single transcription, it’s impossible to know whether a particular pronunciation is widespread, so use your best judgment. As mentioned above, you can suggest multiple representations for a word, as long as they’re phonemic representations.

Syllabification and stress

DISC	Function	Note
`-`	Syllable boundary
`'`	Primary stress
`"`	Secondary stress
`0`	No stress	Only for the syllables layer

For example, Pennsylvanian is "pEn-sP-'v1n-jH. Stress markers go before any other DISC symbols, but (if applicable) after the syllable boundary. Unstressed syllables get no stress markers.

A few finer points:

Every complete word has one primary-stressed syllable. Incomplete words with hesitations might not have a primary-stressed syllable, but the APLS custom dictionary won’t have incomplete words
- Most words don’t have secondary stress. Compounds often do have secondary stress (e.g., Sandcastle is 's{nd-"k{-sP)
Use secondary stress for city-name morphemes -burg(h), -ville, -vale, -dale, -town, etc.
- Exception: There are derivational affixes that change the stress pattern (e.g., Pittsburgh is 'pIts-"b3rg, but Pittsburghese is "pIts-b3r-'giz)
Intervocalic consonants should be syllabified as onsets rather than codas (e.g., Bettis is 'bE-tIs not 'bEt-Is)
- Exception 1: Intervocalic /ɹ/ should be syllabified as a coda (e.g., Panera is p@-'n8r-@ not p@-'n8-r@)
- Exception 2: In environments where prevocalic /t/ becomes [ʔ] (e.g., before /n̩/), the /t/ should be syllabified as a coda (e.g., outen is '6t-H not '6-tH)
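
The marker placement described above (syllable boundary first, then an optional stress marker, then the segments) makes dictionary-style entries easy to take apart. The following sketch assumes that format exactly; `parse_entry` and `has_one_primary_stress` are hypothetical names used for illustration, not APLS functions.

```python
def parse_entry(entry):
    """Split a DISC entry into (stress_marker, segments) pairs, one per syllable."""
    syllables = []
    for chunk in entry.split("-"):
        # A stress marker, if present, is the first character of the syllable.
        if chunk[:1] in ("'", '"'):
            syllables.append((chunk[0], chunk[1:]))
        else:
            # Unstressed syllables carry no marker.
            syllables.append(("", chunk))
    return syllables


def has_one_primary_stress(entry):
    """Check the guideline that a complete word has exactly one primary stress."""
    return sum(1 for stress, _ in parse_entry(entry) if stress == "'") == 1


print(parse_entry("\"pEn-sP-'v1n-jH"))
# [('"', 'pEn'), ('', 'sP'), ("'", 'v1n'), ('', 'jH')]
print(has_one_primary_stress("'pIts-\"b3rg"))  # True
```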

Pause marker (foll_segment only)

DISC	Function
`.`	Pause

The foll_segment layer uses all the usual DISC characters with one addition: `.` as a pause marker. Knowing that a segment is followed by a pause can be useful for categorizing certain phonological effects, and an explicit symbol signals to end-users that this is not missing data. The `.` symbol was chosen because it is a keyboard (ASCII) character and because it already denotes pauses in similar notation systems, such as the Extensions to the IPA and Jeffersonian (conversation analysis) transcription.
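
As a rough illustration of why an explicit pause symbol helps, the sketch below sorts foll_segment values into broad following-context categories. It rests on assumptions: `following_context` is a hypothetical helper, missing data is assumed to arrive as an empty string or `None` (the actual export format may differ), and syllabic consonants are grouped with consonants purely for simplicity.

```python
# Vowel symbols from the DISC vowel tables in this document.
VOWELS = set("iIE{Q$VUu@" + "12645" + "7893#")
PAUSE = "."


def following_context(foll_segment):
    """Classify a foll_segment value as pause, vowel, consonant, or missing."""
    if not foll_segment:
        # Assumed representation of missing data (empty string or None).
        return "missing"
    if foll_segment == PAUSE:
        return "pause"
    if foll_segment in VOWELS:
        return "vowel"
    return "consonant"


print(following_context("."))  # pause
print(following_context("{"))  # vowel
print(following_context(""))   # missing
```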

¹ DISC, developed for the CELEX project, stands for distinct single characters. For more details, see the CELEX English documentation, section 2.4.1 (starting on p. 30 of the PDF).
² For the purpose of representing speech sounds in a database like APLS, IPA has several substantial drawbacks:
- Variability in representations: What is the IPA transcription for the vowel phoneme in English prize? Depending on the author, it might be aɪ or aj or ai. The “ch” affricate might be t͡ʃ or tʃ; the last sound(s) in apple might be əl or əɫ or l̩. This effectively breaks the one-to-one sound-symbol mapping that is absolutely necessary from a data perspective.
- Lookalike characters: g (typewriter g) is often substituted for ɡ, : (colon) for ː, ' (apostrophe) for ˈ, superscript j for ʲ, etc. This also breaks the one-to-one mapping, and it can lead to hard-to-detect inconsistencies in the data.
- Multiple characters per phoneme: Some phonemes are represented with more than one IPA character because they’re multi-part sounds (e.g., diphthongs, affricates). This makes it harder to split strings of phonemes into individual phonemes, which has implications for large-scale processing of phonological data.
It must be said that DISC is not drawback-free:
- Unfamiliarity and adoption: Very few linguists are familiar with DISC, although most non-syllabic consonants match their IPA counterparts.
- Linguistically limited: Although it’s not relevant to APLS, DISC is limited to English, German, and Dutch.
- Pre-/ɹ/ vowels: In keeping with Wells lexical sets, DISC provides different symbols for (e.g.) the vowels in near and fleece. There is phonetic, phonological, and historical evidence that these are different phonemes even in rhotic varieties of English, which is why the APLS subset of DISC retains this distinction. However, this may present a learning curve for researchers of North American Englishes who aren’t accustomed to thinking of these as separate phonemes. (By convention, these are always followed by the r phoneme in APLS.)
- Some overlap: Two symbols each correspond to a pair of Wells lexical sets: @ for commA and lettER, $ for thought and force.
- Escape characters: Some DISC characters have special meaning in regular expressions (e.g., {, $), so they need to be “escaped” to be interpreted literally. That said, some other ASCII-based phonetic alphabets are much more challenging in this regard (e.g., X-SAMPA uses \).
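
To illustrate the escaping issue, here is a small sketch using Python's `re` module. It is only an analogy: the regular-expression dialect used by APLS search fields may differ, and the word list and the `find_symbol` helper are made up for the example.

```python
import re

# Toy data: a few words with their DISC transcriptions (from the tables above).
words = {"trap": "tr{p", "thought": "T$t", "lot": "lQt", "force": "f$rs"}


def find_symbol(symbol, transcriptions):
    """Return the words whose transcription contains the literal DISC symbol."""
    pattern = re.compile(re.escape(symbol))  # escape so $ and { are literal
    return sorted(w for w, disc in transcriptions.items() if pattern.search(disc))


print(find_symbol("$", words))  # ['force', 'thought']

# Unescaped, $ means "end of string" and therefore matches every transcription.
unescaped = sorted(w for w, disc in words.items() if re.search("$", disc))
print(unescaped)  # ['force', 'lot', 'thought', 'trap']
```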