Phonemic transcription with the DISC alphabet
APLS primarily uses the DISC phonemic alphabet[1] for representing speech sounds (specifically, phonemes), rather than the International Phonetic Alphabet (IPA). Like the IPA, DISC creates a one-to-one mapping between sounds and symbols; unlike the IPA, DISC only uses symbols that appear on a standard QWERTY keyboard. While the IPA is well-recognized among linguists, many IPA characters are hard for end-users to input and difficult for computers to store.[2] As a result, APLS uses DISC internally for storing, searching, and exporting phonological data, and APLS transcribers use DISC for “pronounce codes” when a word’s pronunciation needs to be specified (e.g., an incomplete word). In APLS, IPA is used only for displaying phonological data to end-users.
APLS uses a subset of DISC relevant to North American Englishes. In APLS, we use DISC symbols for phonemic representations of sounds, not phonetic representations. As a result, the APLS subset of DISC doesn’t have symbols for [ɾ] or [ʔ] (flap or glottal stop); in North American Englishes, these surface only as allophones of /t/.
In this document, `fixed-width font` is used for symbols you actually type into APLS search fields or a transcription program (Praat or ELAN).
DISC consonants
Non-syllabic consonants
IPA | DISC | Example word | Example in DISC |
---|---|---|---|
p | p | pat | p{t |
b | b | bad | b{d |
t | t | tack | t{k |
d | d | dad | d{d |
k | k | cad | k{d |
ɡ | g | gap | g{p |
f | f | fad | f{d |
v | v | vat | v{t |
θ | T | thin | TIn |
ð | D | then | DEn |
s | s | sap | s{p |
z | z | zap | z{p |
ʃ | S | sheep | Sip |
ʒ | Z | measure | mEZ@r |
h | h | had | h{d |
tʃ | J | cheap | Jip |
dʒ | _ | jeep | _ip |
m | m | mad | m{d |
n | n | nat | n{t |
ŋ | N | bang | b{N |
l | l | lad | l{d |
ɹ | r | rad | r{d |
w | w | wet | wEt |
j | j | yet | jEt |
Syllabic consonants
IPA | DISC | Example word | Example in DISC |
---|---|---|---|
m̩ | F | idealism | 2dilIzF |
n̩ | H | burden | b3rdH |
l̩ | P | dangle | d{NgP |
DISC vowels
Monophthongs not before /ɹ/
IPA | DISC | Example word | Example in DISC |
---|---|---|---|
i | i | fleece | flis |
ɪ | I | kit | kIt |
ɛ | E | dress | drEs |
æ | { | trap | tr{p |
ɑ | Q | lot | lQt |
ɔ | $ | thought | T$t |
ʌ | V | strut | strVt |
ʊ | U | foot | fUt |
u | u | goose | gus |
ə | @ | comma | kQm@ |
Monophthongs before /ɹ/
IPA | DISC | Example word | Example in DISC |
---|---|---|---|
ɪɹ | 7r | near | n7r |
ɛɹ | 8r | square | skw8r |
aɹ | #r | start | st#rt |
ɝ | 3r | nurse | n3rs |
ɔɹ | $r | force | f$rs |
ʊɹ | 9r | cure | kj9r |
ɚ | @r | letter | lEt@r |
Diphthongs
IPA | DISC | Example word | Example in DISC |
---|---|---|---|
eɪ | 1 | face | f1s |
aɪ | 2 | price | pr2s |
aʊ | 6 | mouth | m6T |
ɔɪ | 4 | choice | J4s |
oʊ | 5 | goat | g5t |
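Because each DISC symbol is a single character, converting a DISC string to IPA for display can be done with a simple lookup. The following is a minimal sketch, not APLS's actual implementation: the names `PRE_R`, `SINGLE`, and `disc_to_ipa` are hypothetical, and the mappings are copied from the tables above. Two-character pre-/ɹ/ sequences are tried first so that `3r` and `@r` come out as single rhotic vowels.

```python
# Hypothetical lookup tables built from the DISC tables above.
# Pre-/ɹ/ vowel + r sequences, checked before single symbols.
PRE_R = {
    "7r": "ɪɹ", "8r": "ɛɹ", "#r": "aɹ", "3r": "ɝ",
    "$r": "ɔɹ", "9r": "ʊɹ", "@r": "ɚ",
}

SINGLE = {
    # Consonants whose DISC and IPA symbols are identical
    **{c: c for c in "pbtdkfvszhmnlwj"},
    # Consonants that differ
    "g": "ɡ", "T": "θ", "D": "ð", "S": "ʃ", "Z": "ʒ",
    "J": "tʃ", "_": "dʒ", "N": "ŋ", "r": "ɹ",
    # Syllabic consonants
    "F": "m̩", "H": "n̩", "P": "l̩",
    # Monophthongs not before /ɹ/
    "i": "i", "I": "ɪ", "E": "ɛ", "{": "æ", "Q": "ɑ",
    "$": "ɔ", "V": "ʌ", "U": "ʊ", "u": "u", "@": "ə",
    # Diphthongs
    "1": "eɪ", "2": "aɪ", "6": "aʊ", "4": "ɔɪ", "5": "oʊ",
}


def disc_to_ipa(disc: str) -> str:
    """Convert a DISC string (no stress or syllable markers) to IPA."""
    out = []
    i = 0
    while i < len(disc):
        pair = disc[i:i + 2]
        if pair in PRE_R:
            out.append(PRE_R[pair])
            i += 2
        elif disc[i] in SINGLE:
            out.append(SINGLE[disc[i]])
            i += 1
        else:
            # Includes a pre-/ɹ/ vowel symbol with no following r
            raise ValueError(f"Unexpected DISC symbol {disc[i]!r} in {disc!r}")
    return "".join(out)


print(disc_to_ipa("p{t"))    # pæt
print(disc_to_ipa("mEZ@r"))  # mɛʒɚ
print(disc_to_ipa("J4s"))    # tʃɔɪs
```

Note that this sketch rejects a bare pre-/ɹ/ vowel symbol (as in a cut-off word), so a real converter would need to decide how to display those.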
Transcribing words using DISC
As noted above, DISC transcription is phonemic rather than phonetic. Here are some other things to note:
- Monophthongs before /ɹ/ should always be transcribed with the DISC `r` symbol (see examples above)
  - Exception: the speaker cuts off the word before an /ɹ/ (e.g., cutting off start as `sta~[st#]`)
- IPA makes a distinction between stressed central vowels /ʌ ɝ/ and unstressed central vowels /ə ɚ/. Similarly, use DISC `V`/`3r` for stressed vowels and `@`/`@r` for unstressed vowels
  - Exception: Always use `3r` for -burg(h), even if it’s unstressed (e.g., Pittsburghese is `pItsb3rgiz` not `pItsb@rgiz`)
- Use the DISC symbols for syllabic consonants, `F`/`H`/`P`, rather than `@m`/`@n`/`@l` (see examples above)
  - Exception: Don’t use these symbols for onsets (e.g., Panera is `p@n8r@`, not `pH8r@`)
- For word-final high front vowels, use `i` rather than `I` (i.e., we assume universal happy-tensing)
- When /ŋ/ comes before a vowel or syllabic consonant, assume it’s followed by /ɡ/ (e.g., dangle is `d{NgP`, not `d{NP`)
- Don’t forget that nk in English spelling is usually /ŋk/ (DISC `Nk`) rather than /nk/ (DISC `nk`)
- If you have the cot–caught merger in production and/or perception, ask a non-merged friend to help you decide whether to transcribe `Q` or `$`.
. - Unstressed vowels can be tricky!
  - For “schwi”, use `@` rather than `I` (e.g., breathless is `brETl@s` not `brETlIs`)
  - Same for ’s after a sibilant (e.g., Lutz’s is `lVts@z` not `lVtsIz`)
  - Again, use `@` for unstressed mid-central vowels (“schwa”) and `@r` pre-/ɹ/
- There’s no [ɾ] (flap) symbol; instead, use `t` or `d` as suggested by orthography, even if it doesn’t match the voicing of the surface segment (e.g., Bettis is `bEtIs`, not `bEdIs`)
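Several of the guidelines above are mechanical enough to check automatically before submitting a pronounce code. The sketch below is a hypothetical helper (`check_disc` is not part of APLS, Praat, or ELAN) that flags a few likely slips in a DISC string without stress or syllable markers. The checks are heuristics: a flagged string may still be correct (for example, a cut-off word, or a genuine /nk/ sequence).

```python
import re


def check_disc(code: str) -> list[str]:
    """Return heuristic warnings for a DISC string (no stress/syllable marks)."""
    warnings = []

    # Pre-/ɹ/ vowel symbols should be followed by r (unless the word is cut off)
    if re.search(r"[78#39](?!r)", code):
        warnings.append("pre-/ɹ/ vowel symbol not followed by r")

    # Word-final high front vowel: use i rather than I
    if code.endswith("I"):
        warnings.append("word-final I; use i")

    # N directly before a vowel or syllabic consonant: a g is expected
    if re.search(r"N[iIE{Q$VUu@1264578#39FHP]", code):
        warnings.append("N before vowel/syllabic consonant without g")

    # Spelling-influenced nk where Nk is usually meant
    if "nk" in code:
        warnings.append("nk found; usually Nk")

    # Word-final @m/@n/@l where a syllabic consonant symbol is expected
    if re.search(r"@[mnl]$", code):
        warnings.append("word-final @m/@n/@l; consider F/H/P")

    return warnings


print(check_disc("d{NgP"))   # []
print(check_disc("d{NP"))    # ['N before vowel/syllabic consonant without g']
print(check_disc("p@n8r@"))  # []
```

The syllabic-consonant check only looks at the end of the string, so onsets like the /n/ in `p@n8r@` are not flagged.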
Suggesting new dictionary entries
APLS uses two sources to look up phonemic representations: (1) the Unisyn English dictionary (which is supposed to be a universal English dictionary), and (2) a custom dictionary. Occasionally you’ll run across a word that needs to be added to APLS’s custom dictionary. Most often this will be a Pittsburgh/western PA geographic name (e.g., neighborhoods like Shadyside, municipalities like Sewickley, streets like Baum, schools like Milliones), a brand name (e.g., Highmark, Panera), or a Pittsburgh lexical feature (e.g., redd, gumband). However, there are words that aren’t Pittsburgh-specific that are absent from Unisyn, sometimes unexpectedly; for example, we’ve had to manually add entries for artsy, bachelorette, homie, Kwanzaa, microbrew, stepdad, tarp, and y’all (among others).
If a word falls into one of the preceding categories, it should be added to the APLS dictionary. On the other hand, if a word is unlikely to be used by more than one speaker, it’s better to just use an inline pronounce code.
Everything from the previous sections applies to new words that you suggest for the custom dictionary, plus the following:
- If applicable, you can suggest 2+ phonemic representations per word (but this is optional)
- Use the speech community’s pronunciation(s)
- Add symbols to mark syllabification and stress
Multiple phonemic representations
APLS’s custom dictionary can contain multiple phonemic representations for any given word. For example, the first vowel in Lawrenceville can rhyme with either shore or far (`$r` or `#r`). In that case, just suggest the full DISC representation twice: `'l$r-Hs-"vIl` or `'l#r-Hs-"vIl`.
Keep in mind that these are phonemic representations. This means that we don’t add separate entries that reflect variable phonological processes like consonant cluster deletion. For example, even though Pittsburghers pronounce gumband as both [ˈɡʌmbænd] and [ˈɡʌmbæn], only `'gVm-b{nd` should go in APLS’s custom dictionary.
Use the speech community’s pronunciation(s)
APLS’s dictionary entries should reflect Pittsburgh English speakers’ mental lexicons. That means that if Pittsburgh English speakers have a different mental representation for a word than we’d expect based on the word’s spelling or our experience, APLS’s dictionary needs to include the local phonemic representation. For example, Pittsburghers often pronounce Carnegie as [kɑɹ.ˈneɪ.ɡi], whereas non-Pittsburghers almost always use [ˈkɑɹ.nə.ɡi]. In the course of a single transcription, it’s impossible to know whether a particular pronunciation is widespread, so use your best judgment. As mentioned above, you can suggest multiple representations for a word, as long as they’re phonemic representations.
Syllabification and stress
DISC | Function | Note |
---|---|---|
- | Syllable boundary | |
' | Primary stress | |
" | Secondary stress | |
0 | No stress | Only for the syllables layer |
For example, Pennsylvanian is `"pEn-sP-'v1n-jH`. Within a syllable, the stress marker goes before any other DISC symbols but after the syllable boundary marker (if there is one). Unstressed syllables get no stress markers.
A few finer points:
- Every complete word has one primary-stressed syllable. Incomplete words with hesitations might not have a primary-stressed syllable, but the APLS custom dictionary won’t have incomplete words
  - Most words don’t have secondary stress. Compounds often do have secondary stress (e.g., Sandcastle is `'s{nd-"k{-sP`)
- Use secondary stress for city-name morphemes -burg(h), -ville, -vale, -dale, -town, etc.
  - Exception: There are derivational affixes that change the stress pattern (e.g., Pittsburgh is `'pIts-"b3rg`, but Pittsburghese is `"pIts-b3r-'giz`)
- Intervocalic consonants should be syllabified as onsets rather than codas (e.g., Bettis is `'bE-tIs` not `'bEt-Is`)
  - Exception 1: Intervocalic /ɹ/ should be syllabified as a coda (e.g., Panera is `p@-'n8r-@` not `p@-'n8-r@`)
  - Exception 2: In environments where prevocalic /t/ becomes [ʔ] (e.g., before /n̩/), the /t/ should be syllabified as a coda (e.g., outen is `'6t-H` not `'6-tH`)
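The marker conventions above are regular enough that a suggested entry can be split into syllables and checked for exactly one primary stress. The following is a rough sketch under that assumption; `parse_entry` and `has_one_primary` are hypothetical names, not APLS functions.

```python
# Stress markers as listed in the table above
STRESS = {"'": "primary", '"': "secondary"}


def parse_entry(entry: str) -> list[tuple[str, str]]:
    """Split a dictionary entry into (syllable, stress) pairs."""
    syllables = []
    for chunk in entry.split("-"):
        if chunk[:1] in STRESS:
            # Stress marker comes first in the syllable; strip it off
            syllables.append((chunk[1:], STRESS[chunk[0]]))
        else:
            syllables.append((chunk, "unstressed"))
    return syllables


def has_one_primary(entry: str) -> bool:
    """Check that an entry has exactly one primary-stressed syllable."""
    stresses = [stress for _, stress in parse_entry(entry)]
    return stresses.count("primary") == 1


print(parse_entry("\"pEn-sP-'v1n-jH"))
# [('pEn', 'secondary'), ('sP', 'unstressed'), ('v1n', 'primary'), ('jH', 'unstressed')]
print(has_one_primary("'pIts-\"b3rg"))  # True
print(has_one_primary("p@-n8r-@"))      # False
```

Joining the syllables back together (`"".join(s for s, _ in parse_entry(entry))`) recovers the plain DISC string, which can then be checked against the transcription guidelines.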
Pause marker (foll_segment only)
DISC | Function |
---|---|
. | Pause |
The foll_segment layer uses all the usual DISC characters with one addition: `.` as a pause marker. Knowing that a segment is followed by a pause can be useful for categorizing certain phonological effects, and an explicit symbol signals to end-users that this is not missing data. `.` was chosen because it is a keyboard (ASCII) character, and it already denotes pauses in similar notation systems: Extensions to the IPA, Jeffersonian (conversation analysis) transcription, etc.
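To illustrate why an explicit pause symbol differs from missing data, here is a small sketch of how a following-segment value might be derived from a sequence of segments. How APLS actually builds the foll_segment layer is not described here; the input format (a list of DISC symbols with `None` standing for a pause) and the function name `following_segments` are assumptions made for the example.

```python
PAUSE = None  # hypothetical stand-in for a pause in the input sequence


def following_segments(tokens: list) -> list[tuple[str, str]]:
    """Pair each segment with what follows it: a segment, '.', or ''."""
    result = []
    for i, tok in enumerate(tokens):
        if tok is PAUSE:
            continue  # pauses are not segments themselves
        if i + 1 >= len(tokens):
            foll = ""   # nothing follows: no value (missing data)
        elif tokens[i + 1] is PAUSE:
            foll = "."  # explicitly followed by a pause
        else:
            foll = tokens[i + 1]
        result.append((tok, foll))
    return result


# "cat" + pause + "sat"
print(following_segments(["k", "{", "t", PAUSE, "s", "{", "t"]))
# [('k', '{'), ('{', 't'), ('t', '.'), ('s', '{'), ('{', 't'), ('t', '')]
```

The first `t` gets `.` because a pause follows it, while the last `t` gets an empty value because nothing is known to follow it.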
1. DISC, developed for the CELEX project, stands for distinct single characters. For more details, see CELEX English documentation, section 2.4.1 (starting on p. 30 of the PDF).
2. For the purpose of representing speech sounds in a database like APLS, IPA has several substantial drawbacks:
   - Variability in representations: What is the IPA transcription for the vowel phoneme in English prize? Depending on the author, it might be `aɪ` or `aj` or `ai`. The “ch” affricate might be `t͡ʃ` or `tʃ`; the last sound(s) in apple might be `əl` or `əɫ` or `l̩`. This effectively breaks the one-to-one sound-symbol mapping that is absolutely necessary from a data perspective.
   - Lookalike characters: `g` (typewriter g) is often substituted for `ɡ`, `:` (colon) for `ː`, `'` (apostrophe) for `ˈ`, superscript `j` for `ʲ`, etc. This also breaks the one-to-one mapping, and it can lead to hard-to-detect inconsistencies in the data.
   - Multiple characters per phoneme: Some phonemes are represented with more than one IPA character because they’re multi-part sounds (e.g., diphthongs, affricates). This makes it harder to split strings of phonemes into individual phonemes, which has implications for large-scale processing of phonological data.

   It must be said that DISC is not drawback-free:
   - Unfamiliarity and adoption: Very few linguists are familiar with DISC, although most non-syllabic consonants match their IPA counterparts.
   - Linguistically limited: Although it’s not relevant to APLS, DISC is limited to English, German, and Dutch.
   - Pre-/ɹ/ vowels: In keeping with Wells lexical sets, DISC provides different symbols for (e.g.) the vowels in near and fleece. There is phonetic, phonological, and historical evidence for these being different phonemes even in rhotic varieties of English, which is why the APLS subset of DISC retains this distinction. However, this may present a learning curve for North American Englishes researchers who aren’t accustomed to thinking of these as separate phonemes. (By convention, these are always followed by the `r` phoneme in APLS.)
   - Some overlap: Two symbols each correspond to a pair of Wells lexical sets: `@` for commA and lettER, `$` for thought and force.
   - Escape characters: Some DISC characters have special meaning in regular expressions (e.g., `{`, `$`), so they need to be “escaped” to be interpreted literally. That said, some other ASCII-based phonetic alphabets are much more challenging in this regard (e.g., X-SAMPA uses `\`).
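Two of the points in the notes above, splitting strings into phonemes and escaping regular-expression characters, can be shown concretely. This is an illustrative sketch using Python's standard `re` module; the word list is made up for the example, and APLS's own search interface may handle escaping differently.

```python
import re

# One character per phoneme: splitting a DISC string is trivial
print(list("J4s"))   # ['J', '4', 's']  (choice: 3 phonemes, 3 characters)
print(len("tʃɔɪs"))  # 5  (same word in IPA: 3 phonemes, 5 characters)

# Hypothetical list of DISC strings to search
words = ["T$t", "tr{p", "lQt"]

# Unescaped, $ means "end of string", so this pattern can never match
unescaped = [w for w in words if re.search("T$t", w)]
print(unescaped)     # []

# re.escape() makes every character in the query literal
escaped = [w for w in words if re.search(re.escape("T$t"), w)]
print(escaped)       # ['T$t']
```

Escaping the whole query is the simplest approach when searching for a literal DISC string; when a pattern mixes literal DISC symbols with regex syntax, only the DISC symbols should be escaped.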