1 Introduction

This page contains descriptions of the unfairness mitigation strategies (UMSs) analyzed for the paper “Sociolinguistic auto-coding has fairness problems too: Measuring and mitigating overlearning bias”, published open-access in Linguistics Vanguard in 2024: https://doi.org/10.1515/lingvan-2022-0114.

2 UMS codes

UMSs are identified by 2- or 3-number decimal codes:

  • First digit: general category (e.g., “downsampling”)
    • 0.x(.y): Either the baseline auto-coder (UMS 0.0) or a “predecessor auto-coder” that provides data to inform other UMSs
  • Second digit: specific strategy (e.g., “achieve equal base rates by gender”)
  • Third digit (optional): implementation (e.g., “achieve equal base rates by downsampling women’s Absent tokens” vs. “…by downsampling men’s Present tokens”)
    • Used when the specific strategy can be implemented multiple ways
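
The coding scheme above can be illustrated with a small parser. This is only a sketch: parseUMS() is a hypothetical helper that does not appear in the project's code, and it assumes codes are passed as character strings (as in "0.0" or "1.3.1").

```r
# Hypothetical helper (not part of the project's code): split a UMS code
# into its general category, specific strategy, and optional implementation.
parseUMS <- function(code) {
  parts <- strsplit(code, ".", fixed = TRUE)[[1]]
  stopifnot(length(parts) %in% 2:3)
  list(
    category = parts[1],
    strategy = parts[2],
    # The third part is optional; NA when the strategy has one implementation
    implementation = if (length(parts) == 3) parts[3] else NA_character_
  )
}

parseUMS("1.3.1")              # category "1", strategy "3", implementation "1"
parseUMS("2.2")$implementation # NA
```

Keeping the codes as strings matters: a code such as 1.3.1 is not a valid number, and "0.0" would lose its trailing zero if stored as one.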

3 UMS descriptions

The UMSs fall into four categories:

  1. Downsampling: Correct for imbalances in training data by randomly selecting tokens to remove.
  2. Valid predictor selection: Remove acoustic measures that could inadvertently signal gender.
  3. Normalization: Control for acoustic variability that could inadvertently signal gender.
  4. Combinations of other strategies: Downsampling plus valid predictor selection or normalization.

All the UMSs modify the data fed into the model, rather than the model itself, as summarized here:

Category                   Type of modification
Downsampling               Remove tokens (rows)
Valid predictor selection  Remove acoustic measures (columns)
Normalization              Transform acoustic measures but don’t remove data
Combination                Remove tokens AND (remove OR transform acoustic measures)

3.1 Downsampling

These UMSs correct for imbalances in training data by randomly selecting tokens (rows) to remove.

For each UMS, the token counts by Gender and Rpresent are listed, followed (where relevant) by the proportions of Rpresent within each gender.

UMS 0.0: Baseline auto-coder of Rpresent, with all data and predictors
Counts:
          Female Male  Sum
  Absent    1504 2548 4052
  Present    284 1284 1568
  Sum       1788 3832 5620
Proportions by gender:
          Female  Male
  Absent   0.841 0.665
  Present  0.159 0.335
  Sum      1.000 1.000

UMS 1.1: Downsample men to equalize token counts by Gender
Counts:
          Female Male  Sum
  Absent    1504 1190 2694
  Present    284  598  882
  Sum       1788 1788 3576

UMS 1.2: Downsample Absent to equalize token counts by Rpresent
Counts:
          Female Male  Sum
  Absent     617  951 1568
  Present    284 1284 1568
  Sum        901 2235 3136

UMS 1.3.1: Downsample women’s Absent to equalize Rpresent base rates by Gender
Counts:
          Female Male  Sum
  Absent     563 2548 3111
  Present    284 1284 1568
  Sum        847 3832 4679
Proportions by gender:
          Female  Male
  Absent   0.665 0.665
  Present  0.335 0.335
  Sum      1.000 1.000

UMS 1.3.2: Downsample men’s Present to equalize Rpresent base rates by Gender
Counts:
          Female Male  Sum
  Absent    1504 2548 4052
  Present    284  481  765
  Sum       1788 3029 4817
Proportions by gender:
          Female  Male
  Absent   0.841 0.841
  Present  0.159 0.159
  Sum      1.000 1.000

UMS 1.4: Downsample men’s data to equalize (a) token counts by Gender and (b) Rpresent base rates by Gender
Counts:
          Female Male  Sum
  Absent    1504 1504 3008
  Present    284  284  568
  Sum       1788 1788 3576
Proportions by gender:
          Female  Male
  Absent   0.841 0.841
  Present  0.159 0.159
  Sum      1.000 1.000

UMS 1.5: Downsample Absent data to equalize (a) token counts by Rpresent and (b) Gender base rates by Rpresent
Counts:
          Female Male  Sum
  Absent     284 1284 1568
  Present    284 1284 1568
  Sum        568 2568 3136

UMS 1.6: Downsample Gender x Rpresent to equalize token counts by Gender x Rpresent
Counts:
          Female Male  Sum
  Absent     284  284  568
  Present    284  284  568
  Sum        568  568 1136
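
As a rough illustration of what "randomly selecting tokens (rows) to remove" means in practice, the sketch below rebuilds a toy dataset with the baseline cell counts and shrinks every Gender x Rpresent cell to the size of the smallest one, as in UMS 1.6. It uses base R only; downsampleCells() is a hypothetical helper rather than the project's umsData(), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Toy data with the baseline (UMS 0.0) cell counts; no acoustic predictors
cellCounts <- c(1504, 284, 2548, 1284)
dat <- data.frame(
  Gender   = rep(c("Female", "Female", "Male", "Male"), times = cellCounts),
  Rpresent = rep(c("Absent", "Present", "Absent", "Present"), times = cellCounts),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep a random sample of at most `keep` rows
# within each Gender x Rpresent cell, and drop the rest
downsampleCells <- function(dat, keep) {
  cells <- split(seq_len(nrow(dat)), list(dat$Gender, dat$Rpresent))
  rows <- lapply(cells, function(idx) {
    idx[sample.int(length(idx), min(keep, length(idx)))]
  })
  dat[sort(unlist(rows, use.names = FALSE)), ]
}

# UMS 1.6: shrink every cell to the size of the smallest cell (284)
smallest <- min(table(dat$Rpresent, dat$Gender))
dat16 <- downsampleCells(dat, smallest)
table(dat16$Rpresent, dat16$Gender)
```

Which rows survive depends on the seed, but the resulting cell counts do not: every cell ends up with 284 tokens, for 1136 in total.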

Note for code users

These implementations are more general than their descriptions suggest. For example, UMSs 1.3.1 and 1.3.2 both achieve equal /r/ base rates by gender, by downsampling either women’s Absent (1.3.1) or men’s Present (1.3.2). Internally, however, umsData() translates this into “downsample one of the classes from the smaller group” (1.3.1) vs. “…from the larger group” (1.3.2), automatically detecting which class to downsample from which group.

We can demonstrate this generality via a hypothetical dataset in which women are the larger group:

library(tidyverse) ##For tribble(), uncount(), mutate(), and %>%

##Set up hypothetical dataset where Female tokens outnumber Male
dat1 <- tribble(
  ~Gender,  ~Rpresent, ~n,
  "Female", "Absent",  400,
  "Female", "Present", 1000,
  "Male",   "Absent",  200,
  "Male",   "Present", 400,
) %>% 
  ##Expand to individual rows
  uncount(n) %>% 
  ##Add dummy predictor
  mutate(var1 = runif(n()))

##In dataset: women more numerous than men, greater % Present than men
umsData(dat1, "0.0", predictors=var1) %>% 
  printData("1.3.1") # Print as though applying UMS 1.3.1 (for proportions)
Data: 2000 tokens with 1 predictors
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present   1000  400 1400
  Sum       1400  600 2000
Proportionally by Gender:
Rpresent     Female      Male
  Absent  0.2857143 0.3333333
  Present 0.7142857 0.6666667
  Sum     1.0000000 1.0000000

UMS 1.3.1 should downsample one class from the smaller group to match the class distribution of the larger group. Indeed, applying UMS 1.3.1 to this dataset results in fewer Absent tokens for men:

umsData(dat1, "1.3.1", predictors=var1) %>% 
  printData("1.3.1")
Data: 1960 tokens with 1 predictors
Rpresent  Female Male  Sum
  Absent     400  160  560
  Present   1000  400 1400
  Sum       1400  560 1960
Proportionally by Gender:
Rpresent     Female      Male
  Absent  0.2857143 0.2857143
  Present 0.7142857 0.7142857
  Sum     1.0000000 1.0000000

By contrast, UMS 1.3.2 should downsample one class from the larger group to match the class distribution of the smaller group. Indeed, applying UMS 1.3.2 to this dataset results in fewer Present tokens for women:

##UMS 1.3.2 should downsample larger group to match % Absent/Present of smaller group
umsData(dat1, "1.3.2", predictors=var1) %>% 
  printData("1.3.2")
##As expected, women's Present downsampled
Data: 1800 tokens with 1 predictors
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present    800  400 1200
  Sum       1200  600 1800
Proportionally by Gender:
Rpresent     Female      Male
  Absent  0.3333333 0.3333333
  Present 0.6666667 0.6666667
  Sum     1.0000000 1.0000000
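
The "smaller group" vs. "larger group" logic described above can be sketched as follows. This is not the umsData() implementation: pickDownsample() is a hypothetical helper, and both the rule for choosing the class (the one over-represented relative to the other group) and the rounding down of the target count are assumptions. They do reproduce the counts shown on this page for the real and the hypothetical dataset.

```r
# Hypothetical helper: given a 2x2 matrix of counts (rows = Rpresent classes,
# columns = Gender groups), decide which group and class to downsample and
# how many tokens of that class to keep so that base rates match.
pickDownsample <- function(counts, from = c("smaller", "larger")) {
  from <- match.arg(from)
  groupSizes <- colSums(counts)
  target <- if (from == "smaller") {
    names(which.min(groupSizes))
  } else {
    names(which.max(groupSizes))
  }
  other <- setdiff(colnames(counts), target)
  # Assumption: downsample the class that is over-represented in the
  # target group relative to the other group
  props <- prop.table(counts, margin = 2)
  cls <- names(which.max(props[, target] - props[, other]))
  otherCls <- setdiff(rownames(counts), cls)
  # Assumption: round the target count down
  keep <- (counts[otherCls, target] * counts[cls, other]) %/% counts[otherCls, other]
  list(group = target, class = cls, keep = keep)
}

dims <- list(c("Absent", "Present"), c("Female", "Male"))
real <- matrix(c(1504, 284, 2548, 1284), nrow = 2, dimnames = dims)
hypo <- matrix(c(400, 1000, 200, 400), nrow = 2, dimnames = dims)

pickDownsample(real, "smaller")  # Female, Absent, keep 563 (UMS 1.3.1)
pickDownsample(real, "larger")   # Male, Present, keep 481 (UMS 1.3.2)
pickDownsample(hypo, "smaller")  # Male, Absent, keep 160
pickDownsample(hypo, "larger")   # Female, Present, keep 800
```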

3.2 Valid predictor selection

These UMSs remove acoustic measures (columns) that could inadvertently signal gender.

For each UMS, the number of predictors retained and removed is listed, along with the predictors removed.

UMS 0.0: Baseline auto-coder of Rpresent, with all data and predictors
  Num. predictors: 180
  Num. removed: 0
  Predictors removed: N/A

UMS 2.1.1: Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 10%)
  Num. predictors: 162
  Num. removed: 18
  Predictors removed: diffF3F1_30, diffF3F1_35, diffF3F1_50, diffF3F1_65, diffF3F1_80, F3min, …

UMS 2.1.2: Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 20%)
  Num. predictors: 144
  Num. removed: 36
  Predictors removed: diffF3F1_25, diffF3F1_55, diffF3F1_70, F0max, F3min, F4_35, …

UMS 2.1.3: Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 50%)
  Num. predictors: 90
  Num. removed: 90
  Predictors removed: diffF3F1_45, diffF4F3_35, diffF4F3_65, F2_20, F2_40, F2min, …

UMS 2.1.4: Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/2)
  Num. predictors: 167
  Num. removed: 13
  Predictors removed: BW3_70, F2_35, F3_60, F3_75, F3_80, F4range, …

UMS 2.1.5: Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/3)
  Num. predictors: 140
  Num. removed: 40
  Predictors removed: absSlopeF0, BW3_70, BW4_40, F2_65, F2_75, F4_70, …

UMS 2.2: Theoretical predictor selection, removing all F0 measures
  Num. predictors: 176
  Num. removed: 4
  Predictors removed: F0min, F0max, F0rangeST, absSlopeF0

UMS 2.3: Empirical and theoretical predictor selection, removing only F0 measures that correlate with Gender
  Num. predictors: 178
  Num. removed: 2
  Predictors removed: F0min, F0max

More info

Most of these UMSs relied on “predecessor” auto-coders: UMS 0.1.1 (which used the same predictor set to classify gender rather than /r/) and UMS 0.2 (one /r/ auto-coder for women’s tokens, one for men’s). Here are the variable importances for those auto-coders:

##UMS 0.1.1: Classifier of Gender
vi011 <- read.csv("Outputs/Other/Var-Imp_UMS0.1.1.csv")
##Rank measures from most important (1) to least important
vi011 %>% 
  mutate(Rank = rank(desc(Importance)))

##UMS 0.2: Separate auto-coders of Rpresent by Gender
vi02 <- read.csv("Outputs/Other/Var-Imp_UMS0.2.csv")
##Convert importances to ranks within each gender, then take the
##absolute difference in rank places between genders
vi02 %>% 
  mutate(across(-Measure, ~ rank(desc(.x))),
         RankDiff = abs(Importance_Female - Importance_Male))
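
The two empirical selection rules (a top-percentage cutoff for UMSs 2.1.1 to 2.1.3, and a rank-difference threshold for UMSs 2.1.4 and 2.1.5) can be sketched on toy data. The importance values below are invented for illustration, topMeasures() and rankDiffMeasures() are hypothetical helpers, and rounding the number of measures to remove is an assumption; only the column names Measure, Importance, Importance_Female, and Importance_Male follow the code above.

```r
# Toy importances for 10 hypothetical measures (values invented)
vi <- data.frame(
  Measure = paste0("m", 1:10),
  Importance = c(5, 50, 12, 7, 31, 2, 44, 9, 18, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: measures in the top `cutoff` share by importance
topMeasures <- function(vi, cutoff) {
  nRemove <- round(nrow(vi) * cutoff)
  vi$Measure[order(vi$Importance, decreasing = TRUE)][seq_len(nRemove)]
}
topMeasures(vi, 0.2)  # "m2" "m7"

# Toy importances from separate-gender auto-coders (values invented)
viSep <- data.frame(
  Measure = paste0("m", 1:6),
  Importance_Female = c(10, 8, 6, 4, 2, 1),
  Importance_Male   = c(1, 8, 6, 10, 2, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: measures whose importance ranks differ between
# genders by at least `minDiff` rank places
rankDiffMeasures <- function(vi, minDiff) {
  rankF <- rank(-vi$Importance_Female)
  rankM <- rank(-vi$Importance_Male)
  vi$Measure[abs(rankF - rankM) >= minDiff]
}
p <- nrow(viSep)  # number of predictors
rankDiffMeasures(viSep, p / 2)  # "m1" "m4"
```

With the 180 predictors in the real data, cutoffs of 10%, 20%, and 50% correspond to removing 18, 36, and 90 measures, and the thresholds p/2 and p/3 correspond to 90 and 60 rank places.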

3.3 Normalization

This UMS controls for acoustic variability that could inadvertently signal gender—in this case, each token’s minimum and maximum pitch. Unlike the other categories, this UMS transforms the data rather than removing rows or columns.

More info

In this dataset, formant timepoint measurements (e.g., F3_50, or F3 at 50% of the token’s duration) had already been speaker-normalized as part of data pre-processing. However, F0min and F0max (each token’s minimum and maximum pitch) were not normalized. UMS 3.1 normalizes these measures by subtracting by-speaker averages of these measures for word-initial /r/ tokens (see Input-Data/meanPitches.csv).
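
A minimal sketch of this kind of normalization, in base R: each token's F0min and F0max has the speaker's average subtracted. The speaker IDs and pitch values are invented, and the column names of the stand-in for Input-Data/meanPitches.csv (Speaker, MeanF0min, MeanF0max) are assumptions, since the file's actual layout is not shown here.

```r
# Toy token data (speaker IDs and pitch values invented for illustration)
tokens <- data.frame(
  Speaker = c("s1", "s1", "s2", "s2"),
  F0min   = c(180, 200, 95, 105),
  F0max   = c(220, 250, 120, 140),
  stringsAsFactors = FALSE
)

# Stand-in for Input-Data/meanPitches.csv; column names are assumptions
meanPitches <- data.frame(
  Speaker   = c("s1", "s2"),
  MeanF0min = c(190, 100),
  MeanF0max = c(230, 130),
  stringsAsFactors = FALSE
)

# Look up each token's speaker, then subtract that speaker's averages
idx <- match(tokens$Speaker, meanPitches$Speaker)
tokensNorm <- tokens
tokensNorm$F0min <- tokens$F0min - meanPitches$MeanF0min[idx]
tokensNorm$F0max <- tokens$F0max - meanPitches$MeanF0max[idx]
tokensNorm
```

Note that in the real data the by-speaker averages come from word-initial /r/ tokens, not from the tokens being normalized, so the normalized values need not center exactly on zero for each speaker.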

Distributions of F0min and F0max, before normalization:

[Figure: A pair of histograms with distributions of raw F0 measurements (F0min and F0max). For both measurements, there are distinct modes for Female and Male distributions, with not much overlap.]

After normalization:

[Figure: A pair of histograms with distributions of normalized F0 measurements (F0min and F0max). For both measurements, Female and Male distributions are almost completely overlapping.]

3.4 Combinations of other strategies

These UMSs combine downsampling plus valid predictor selection or normalization. Refer to the relevant sections above for more info on the component UMSs.

UMS    Description                                                    Num. tokens  Num. predictors
0.0    Baseline auto-coder of Rpresent, with all data and predictors  5620         180
4.1.1  Combination of 2.1.1 & 1.3.1                                   4679         162
4.1.2  Combination of 2.1.1 & 1.3.2                                   4817         162
4.2.1  Combination of 2.2 & 1.3.1                                     4679         176
4.2.2  Combination of 2.2 & 1.3.2                                     4817         176
4.3.1  Combination of 2.3 & 1.3.1                                     4679         178
4.3.2  Combination of 2.3 & 1.3.2                                     4817         178
4.4.1  Combination of 3.1 & 1.3.1                                     4679         180
4.4.2  Combination of 3.1 & 1.3.2                                     4817         180
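
The token and predictor counts in this table follow from the component UMSs: the downsampling component determines the number of tokens (rows), and the other component determines the number of predictors (columns). The sketch below rebuilds the table from the counts reported in the sections above. It is a bookkeeping illustration only and does not run any of the UMSs; the object names are hypothetical.

```r
# Counts taken from the downsampling and predictor-selection tables above
nTokens <- c("1.3.1" = 4679, "1.3.2" = 4817)
nPredictors <- c("2.1.1" = 162, "2.2" = 176, "2.3" = 178, "3.1" = 180)

# Every pairing of a downsampling UMS with another UMS
# (expand.grid() varies the first column fastest)
combos <- expand.grid(
  Downsampling = names(nTokens),
  Other = names(nPredictors),
  stringsAsFactors = FALSE
)

# Combination codes: 4.<index of other UMS>.<index of downsampling UMS>
combos$UMS <- paste0(
  "4.", match(combos$Other, names(nPredictors)),
  ".", match(combos$Downsampling, names(nTokens))
)
combos$NumTokens <- unname(nTokens[combos$Downsampling])
combos$NumPredictors <- unname(nPredictors[combos$Other])
combos[, c("UMS", "Other", "Downsampling", "NumTokens", "NumPredictors")]
```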

