1 Introduction

This page contains descriptions of the unfairness mitigation strategies (UMSs) analyzed for the paper “Sociolinguistic auto-coding has fairness problems too: Measuring and mitigating overlearning bias”, published open-access in Linguistics Vanguard in 2024: https://doi.org/10.1515/lingvan-2022-0114.

2 UMS codes

UMSs are identified by 2- or 3-number decimal codes:

First digit: general category (e.g., “downsampling”)
- 0.x(.y): Either the baseline auto-coder (UMS 0.0) or a “predecessor auto-coder” that provides data to inform other UMSs
Second digit: specific strategy (e.g., “achieve equal base rates by gender”)
Third digit (optional): implementation (e.g., “achieve equal base rates by downsampling women’s Absent tokens” vs. “…by downsampling men’s Present tokens”)
- Used when the specific strategy can be implemented multiple ways
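
To make the coding scheme concrete, here is a minimal sketch that splits a UMS code into its parts. `parseUmsCode()` is a hypothetical helper written for this illustration; it is not part of the repository's code.

```r
# Hypothetical helper (not from the repository): split a UMS code into its parts
parseUmsCode <- function(code) {
  parts <- strsplit(code, ".", fixed = TRUE)[[1]]
  # Codes have either 2 or 3 parts
  stopifnot(length(parts) %in% 2:3)
  list(
    category = parts[1],        # general category, e.g. "1" for downsampling
    strategy = parts[2],        # specific strategy within the category
    # NA when the strategy has only one implementation
    implementation = if (length(parts) == 3) parts[3] else NA_character_
  )
}

parseUmsCode("1.3.2")
parseUmsCode("2.2")
```

The parts are kept as character strings so that codes such as 0.0 and 0.1.1 are not altered by numeric conversion.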

3 UMS descriptions

The UMSs fell into 4 categories:

Downsampling: Correct for imbalances in training data by randomly selecting tokens to remove.
Valid predictor selection: Remove acoustic measures that could inadvertently signal gender.
Normalization: Control for acoustic variability that could inadvertently signal gender.
Combinations of other strategies: Downsampling plus valid predictor selection or normalization.

All the UMSs modify the data fed into the model, rather than the model itself, as summarized here:

Category	Type of modification
Downsampling	Remove tokens (rows)
Valid predictor selection	Remove acoustic measures (columns)
Normalization	Transform acoustic measures but don’t remove data
Combination	Remove tokens AND (remove OR transform acoustic measures)

3.1 Downsampling

These UMSs correct for imbalances in training data by randomly selecting tokens (rows) to remove.
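
As a rough illustration of what "randomly selecting tokens to remove" can look like in code, the sketch below equalizes token counts by gender on a toy dataset, in the spirit of UMS 1.1. The counts are made up, and this is not the repository's `umsData()` implementation; it only assumes that the data have one row per token.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # make the random selection reproducible

# Toy data with made-up counts, expanded to one row per token
toy <- tribble(
  ~Gender,  ~Rpresent, ~n,
  "Female", "Absent",  30,
  "Female", "Present", 10,
  "Male",   "Absent",  50,
  "Male",   "Present", 30
) %>%
  uncount(n)

# Size of the smaller gender group
nSmaller <- toy %>% count(Gender) %>% pull(n) %>% min()

# Randomly keep nSmaller tokens per gender; all other rows are removed
toyDown <- toy %>%
  group_by(Gender) %>%
  slice_sample(n = nSmaller) %>%
  ungroup()

count(toyDown, Gender)
```

Because the smaller group already has `nSmaller` tokens, only the larger group loses rows.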

UMS	Description	Counts	Proportions by gender

0.0

Baseline auto-coder of Rpresent, with all data and predictors

	Female	Male	Sum
Absent	1504	2548	4052
Present	284	1284	1568
Sum	1788	3832	5620

	Female	Male
Absent	0.841	0.665
Present	0.159	0.335
Sum	1.000	1.000

1.1

Downsample men to equalize token counts by Gender

	Female	Male	Sum
Absent	1504	1190	2694
Present	284	598	882
Sum	1788	1788	3576

N/A

1.2

Downsample Absent to equalize token counts by Rpresent

	Female	Male	Sum
Absent	617	951	1568
Present	284	1284	1568
Sum	901	2235	3136

N/A

1.3.1

Downsample women’s Absent to equalize Rpresent base rates by Gender

	Female	Male	Sum
Absent	563	2548	3111
Present	284	1284	1568
Sum	847	3832	4679

	Female	Male
Absent	0.665	0.665
Present	0.335	0.335
Sum	1.000	1.000

1.3.2

Downsample men’s Present to equalize Rpresent base rates by Gender

	Female	Male	Sum
Absent	1504	2548	4052
Present	284	481	765
Sum	1788	3029	4817

	Female	Male
Absent	0.841	0.841
Present	0.159	0.159
Sum	1.000	1.000

1.4

Downsample men’s data to equalize (a) token counts by Gender and (b) Rpresent base rates by Gender

	Female	Male	Sum
Absent	1504	1504	3008
Present	284	284	568
Sum	1788	1788	3576

	Female	Male
Absent	0.841	0.841
Present	0.159	0.159
Sum	1.000	1.000

1.5

Downsample Absent data to equalize (a) token counts by Rpresent and (b) Gender base rates by Rpresent

	Female	Male	Sum
Absent	284	1284	1568
Present	284	1284	1568
Sum	568	2568	3136

N/A

1.6

Downsample Gender x Rpresent to equalize token counts by Gender x Rpresent

	Female	Male	Sum
Absent	284	284	568
Present	284	284	568
Sum	568	568	1136

N/A
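
The downsampled counts for UMSs 1.3.1 and 1.3.2 can be derived from the baseline counts with simple arithmetic. The sketch below reproduces them in base R. Rounding down with `floor()` is an assumption made here because it matches the tabled counts; the source does not state how `umsData()` rounds.

```r
# Baseline token counts, copied from the UMS 0.0 table
counts <- matrix(
  c(1504, 284, 2548, 1284), nrow = 2,
  dimnames = list(Rpresent = c("Absent", "Present"),
                  Gender = c("Female", "Male"))
)

# UMS 1.3.1: shrink women's Absent until women's Absent:Present ratio equals men's
# (floor() is an assumption; it reproduces the tabled count)
womenAbsent <- floor(counts["Present", "Female"] *
                       counts["Absent", "Male"] / counts["Present", "Male"])

# UMS 1.3.2: shrink men's Present until men's Present:Absent ratio equals women's
menPresent <- floor(counts["Absent", "Male"] *
                      counts["Present", "Female"] / counts["Absent", "Female"])

c(womenAbsent = womenAbsent, menPresent = menPresent)

# Check the resulting base rates by gender for UMS 1.3.1
counts131 <- counts
counts131["Absent", "Female"] <- womenAbsent
round(prop.table(counts131, margin = 2), 3)
```

The two results, 563 and 481, match the Female Absent count for UMS 1.3.1 and the Male Present count for UMS 1.3.2 in the table above.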

Note for code users

These implementations are actually more general than their descriptions suggest. For example, UMSs 1.3.1 and 1.3.2 both achieve equal /r/ base rates by gender, by downsampling either women’s Absent (1.3.1) or men’s Present (1.3.2). However, umsData() actually translates these into “downsample one of the classes from the smaller group” (1.3.1) vs. “from the larger group” (1.3.2), automatically detecting which class to downsample from which group.

We can demonstrate this generality via a hypothetical dataset in which women are the larger group:

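##Load tidyverse for tribble(), uncount(), mutate(), and %>%
library(tidyverse)
##umsData() and printData() are assumed to be loaded already

##Set up hypothetical dataset where Female tokens outnumber Male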
dat1 <- tribble(
  ~Gender,  ~Rpresent, ~n,
  "Female", "Absent",  400,
  "Female", "Present", 1000,
  "Male",   "Absent",  200,
  "Male",   "Present", 400,
) %>% 
  ##Expand to individual rows
  uncount(n) %>% 
  ##Add dummy predictor
  mutate(var1 = runif(n()))

##In dataset: women more numerous than men, greater % Present than men
umsData(dat1, "0.0", predictors=var1) %>% 
  printData("1.3.1") # Print as though applying UMS 1.3.1 (for proportions)

Data: 2000 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present   1000  400 1400
  Sum       1400  600 2000
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.2857143 0.3333333
  Present 0.7142857 0.6666667
  Sum     1.0000000 1.0000000

UMS 1.3.1 should downsample one class from the smaller group to match the class distribution of the larger group. Indeed, applying UMS 1.3.1 to this dataset results in fewer Absent tokens for men:

umsData(dat1, "1.3.1", predictors=var1) %>% 
  printData("1.3.1")

Data: 1960 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  160  560
  Present   1000  400 1400
  Sum       1400  560 1960
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.2857143 0.2857143
  Present 0.7142857 0.7142857
  Sum     1.0000000 1.0000000

By contrast, UMS 1.3.2 should downsample one class from the larger group to match the class distribution of the smaller group. Indeed, applying UMS 1.3.2 to this dataset results in fewer Present tokens for women:

##UMS 1.3.2 should downsample larger group to match % Absent/Present of smaller group
umsData(dat1, "1.3.2", predictors=var1) %>% 
  printData("1.3.2")
##As expected, women's Present downsampled

Data: 1800 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present    800  400 1200
  Sum       1200  600 1800
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.3333333 0.3333333
  Present 0.6666667 0.6666667
  Sum     1.0000000 1.0000000
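
The group and class detection described in this note can be sketched in a few lines of base R. `pickDownsampleTarget()` is a hypothetical function written for this illustration, not the code inside `umsData()`. It assumes exactly two groups and two classes, and it rounds down with `floor()`, which is also an assumption.

```r
# Hypothetical sketch (not the repository's umsData() code)
# counts: 2 x 2 matrix of token counts, rows = classes, columns = groups
# from: which group to downsample ("smaller" as in UMS 1.3.1, "larger" as in UMS 1.3.2)
pickDownsampleTarget <- function(counts, from = c("smaller", "larger")) {
  from <- match.arg(from)
  groupSizes <- colSums(counts)
  src <- if (from == "smaller") names(which.min(groupSizes)) else names(which.max(groupSizes))
  ref <- setdiff(colnames(counts), src)
  # Class that is overrepresented in the source group relative to the reference group
  srcProp <- counts[, src] / sum(counts[, src])
  refProp <- counts[, ref] / sum(counts[, ref])
  cls <- names(which.max(srcProp - refProp))
  other <- setdiff(rownames(counts), cls)
  # Tokens of that class to keep so the source group's class ratio matches the reference group's
  keep <- floor(counts[other, src] * counts[cls, ref] / counts[other, ref])
  list(group = src, class = cls, keep = keep)
}

# Counts from the hypothetical dataset above (women are the larger group)
hyp <- matrix(c(400, 1000, 200, 400), nrow = 2,
              dimnames = list(c("Absent", "Present"), c("Female", "Male")))

res131 <- pickDownsampleTarget(hyp, "smaller")  # UMS 1.3.1-style
res132 <- pickDownsampleTarget(hyp, "larger")   # UMS 1.3.2-style
```

On the hypothetical counts, the sketch selects men's Absent (keeping 160 tokens) for the 1.3.1-style call and women's Present (keeping 800 tokens) for the 1.3.2-style call, in line with the printed output above.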

3.2 Valid predictor selection

These UMSs remove acoustic measures (columns) that could inadvertently signal gender.

UMS	Description	Num. predictors	Num. removed	Predictors removed
0.0	Baseline auto-coder of Rpresent, with all data and predictors	180	0	N/A
2.1.1	Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 10%)	162	18	`diffF3F1_30`, `diffF3F1_35`, `diffF3F1_50`, `diffF3F1_65`, `diffF3F1_80`, `F3min`, …
2.1.2	Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 20%)	144	36	`diffF3F1_25`, `diffF3F1_55`, `diffF3F1_70`, `F0max`, `F3min`, `F4_35`, …
2.1.3	Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 50%)	90	90	`diffF3F1_45`, `diffF4F3_35`, `diffF4F3_65`, `F2_20`, `F2_40`, `F2min`, …
2.1.4	Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/2)	167	13	`BW3_70`, `F2_35`, `F3_60`, `F3_75`, `F3_80`, `F4range`, …
2.1.5	Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/3)	140	40	`absSlopeF0`, `BW3_70`, `BW4_40`, `F2_65`, `F2_75`, `F4_70`, …
2.2	Theoretical predictor selection, removing all F0 measures	176	4	`F0min`, `F0max`, `F0rangeST`, `absSlopeF0`
2.3	Empirical and theoretical predictor selection, removing only F0 measures that correlate with Gender	178	2	`F0min`, `F0max`
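
The two kinds of predictor selection in this table can be sketched as follows. The importance values and the training data are made up for illustration; only the measure names are taken from the table. The percentage cutoff mirrors UMSs 2.1.1 to 2.1.3, and the name-based removal mirrors UMS 2.2. This is a sketch under those assumptions, not the repository's implementation.

```r
library(dplyr)
library(tibble)

# Toy variable importances from a hypothetical classifier of Gender
# (measure names appear in the table above; the values are made up)
viGender <- tibble(
  Measure = c("F0min", "F0max", "F3min", "F3_50", "F2_20",
              "F4_35", "BW3_70", "F2min", "F4range", "F3_80"),
  Importance = c(9.1, 8.7, 3.2, 2.5, 2.2, 1.9, 1.4, 1.1, 0.8, 0.3)
)

# Empirical selection: drop the top `cutoff` share of measures by importance
cutoff <- 0.20
nDrop <- round(cutoff * nrow(viGender))
dropEmpirical <- viGender %>%
  arrange(desc(Importance)) %>%
  head(nDrop) %>%
  pull(Measure)

# Theoretical selection (as in UMS 2.2): drop the F0 measures by name
dropTheoretical <- c("F0min", "F0max", "F0rangeST", "absSlopeF0")

# Toy training data: one column per measure, plus the Rpresent outcome
set.seed(1)
trainToy <- as_tibble(matrix(runif(50), nrow = 5,
                             dimnames = list(NULL, viGender$Measure))) %>%
  mutate(Rpresent = c("Absent", "Present", "Absent", "Absent", "Present"))

# Removing predictors means removing columns; rows are untouched
trainEmpirical <- trainToy %>% select(-all_of(dropEmpirical))
trainTheoretical <- trainToy %>% select(-any_of(dropTheoretical))
```

`any_of()` is used for the theoretical set because the toy data contain only two of the four F0 measures; `all_of()` would raise an error for the missing columns.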

More info

Most of these UMSs relied on “predecessor” auto-coders: UMS 0.1.1 (which used the same predictor set to classify gender rather than /r/) and UMS 0.2 (one /r/ auto-coder for women’s tokens, one for men’s). Here are the variable importances for those auto-coders:

##UMS 0.1.1: Classifier of Gender
vi011 <- read.csv("Outputs/Other/Var-Imp_UMS0.1.1.csv")
vi011 %>% 
  mutate(Rank = rank(desc(Importance))) %>% 
  arrange(Rank)

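##UMS 0.2: Separate auto-coders of Rpresent by Gender
vi02 <- read.csv("Outputs/Other/Var-Imp_UMS0.2.csv")
vi02 %>% 
  ##Convert each importance column to ranks (1 = most important)...
  mutate(across(-Measure, ~ rank(desc(.x))),
         ##...so RankDiff is a difference in rank places, not in raw importance
         RankDiff = abs(Importance_Female - Importance_Male)) %>% 
  arrange(desc(RankDiff))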
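
The rank-difference criterion behind UMSs 2.1.4 and 2.1.5 can be illustrated on toy data. The sketch below assumes that p denotes the number of predictors, which the table does not spell out, and it uses made-up importances with placeholder measure names.

```r
library(dplyr)
library(tibble)

# Toy importances from hypothetical separate-Gender auto-coders of Rpresent
# (values and measure names are made up; column names follow the code above)
viToy <- tibble(
  Measure = c("m1", "m2", "m3", "m4", "m5", "m6"),
  Importance_Female = c(10, 8, 6, 4, 2, 1),
  Importance_Male   = c(1, 8, 6, 5, 2, 10)
)

p <- nrow(viToy)  # assumption: p is the number of predictors

# Flag measures whose importance rank differs by at least p/2 places between genders
flagged <- viToy %>%
  mutate(across(-Measure, ~ rank(desc(.x))),
         RankDiff = abs(Importance_Female - Importance_Male)) %>%
  filter(RankDiff >= p / 2) %>%
  pull(Measure)

flagged
```

Here `m1` and `m6` are flagged because each ranks first for one gender and last for the other; replacing `p / 2` with `p / 3` gives the looser threshold.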

3.3 Normalization

This UMS (UMS 3.1) controls for acoustic variability that could inadvertently signal gender: in this case, each token’s minimum and maximum pitch. Unlike the other categories, this UMS transforms the data rather than removing rows or columns.

More info

In this dataset, formant timepoint measurements (e.g., F3_50, or F3 at 50% of the token’s duration) had already been speaker-normalized as part of data pre-processing. However, F0min and F0max (each token’s minimum and maximum pitch) were not normalized. UMS 3.1 normalizes them by subtracting each speaker’s average F0min and F0max, computed over word-initial /r/ tokens (see Input-Data/meanPitches.csv).
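
The subtraction step can be sketched with a join against a table of by-speaker means. The lookup table below stands in for Input-Data/meanPitches.csv; its column names (`Speaker`, `MeanF0min`, `MeanF0max`) and all values are assumptions made for this illustration.

```r
library(dplyr)
library(tibble)

# Toy tokens with raw pitch measures (made-up values)
tokens <- tibble(
  Speaker = c("S1", "S1", "S2", "S2"),
  Gender  = c("Female", "Female", "Male", "Male"),
  F0min   = c(180, 200, 90, 110),
  F0max   = c(220, 260, 120, 140)
)

# Stand-in for the by-speaker mean pitches (hypothetical column names)
meanPitches <- tibble(
  Speaker   = c("S1", "S2"),
  MeanF0min = c(190, 100),
  MeanF0max = c(240, 130)
)

# Subtract each speaker's mean from that speaker's tokens
normalized <- tokens %>%
  left_join(meanPitches, by = "Speaker") %>%
  mutate(F0min = F0min - MeanF0min,
         F0max = F0max - MeanF0max) %>%
  select(-MeanF0min, -MeanF0max)

normalized
```

After subtraction, both toy speakers' values are centered near zero, so the measures no longer separate the two speakers by overall pitch level. No rows or columns are removed.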

Distributions of F0min and F0max, before normalization:

A pair of histograms with distributions of raw F0 measurements (F0min and F0max). For both measurements, there are distinct modes for Female and Male distributions, with not much overlap.

After normalization:

A pair of histograms with distributions of normalized F0 measurements (F0min and F0max). For both measurements, Female and Male distributions are almost completely overlapping.

3.4 Combinations of other strategies

These UMSs combine downsampling plus valid predictor selection or normalization. Refer to the relevant sections above for more info on the component UMSs.

UMS	Description	Num. tokens	Num. predictors
0.0	Baseline auto-coder of Rpresent, with all data and predictors	5620	180
4.1.1	Combination of 2.1.1 & 1.3.1	4679	162
4.1.2	Combination of 2.1.1 & 1.3.2	4817	162
4.2.1	Combination of 2.2 & 1.3.1	4679	176
4.2.2	Combination of 2.2 & 1.3.2	4817	176
4.3.1	Combination of 2.3 & 1.3.1	4679	178
4.3.2	Combination of 2.3 & 1.3.2	4817	178
4.4.1	Combination of 3.1 & 1.3.1	4679	180
4.4.2	Combination of 3.1 & 1.3.2	4817	180
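
In the table above, each combination takes its token count from the downsampling UMS and its predictor count from the other UMS. The toy sketch below shows the same pattern: rows are removed first (in the style of UMS 1.3.1), then columns (in the style of UMS 2.3). All data are made up, and the steps are a simplified stand-in for the repository's code.

```r
library(dplyr)
library(tibble)

set.seed(1)

# Toy data: 4 women's tokens (3 Absent, 1 Present), 8 men's tokens (4 Absent, 4 Present)
toy <- tibble(
  Gender   = rep(c("Female", "Male"), times = c(4, 8)),
  Rpresent = rep(c("Absent", "Present", "Absent", "Present"), times = c(3, 1, 4, 4)),
  F0min = runif(12), F0max = runif(12), F3min = runif(12)
)

# Step 1 (rows): keep 1 of women's 3 Absent tokens so women's base rates match men's (1:1)
womenAbsent <- which(toy$Gender == "Female" & toy$Rpresent == "Absent")
keepRows <- sample(womenAbsent, size = 1)
dropRows <- setdiff(womenAbsent, keepRows)

# Step 2 (columns): remove the two F0 measures
combined <- toy %>%
  slice(-dropRows) %>%
  select(-F0min, -F0max)

dim(combined)
```

The result has 10 rows and 3 columns: the row count is set only by the downsampling step and the column count only by the predictor-removal step.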

4 R session info

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.4.2 
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] this.path_2.0.0 knitr_1.43      magrittr_2.0.3  rlang_1.1.1    
 [5] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2    
 [9] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
[13] ggplot2_3.4.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.3     jsonlite_1.8.7   highr_0.10       compiler_4.3.0  
 [5] renv_1.0.1       tidyselect_1.2.0 jquerylib_0.1.4  scales_1.2.1    
 [9] yaml_2.3.7       fastmap_1.1.1    R6_2.5.1         labeling_0.4.2  
[13] generics_0.1.3   munsell_0.5.0    bslib_0.5.0      pillar_1.9.0    
[17] tzdb_0.4.0       utf8_1.2.3       stringi_1.7.12   cachem_1.0.8    
[21] xfun_0.40        sass_0.4.7       timechange_0.2.0 cli_3.6.1       
[25] withr_2.5.0      digest_0.6.33    grid_4.3.0       hms_1.1.3       
[29] lifecycle_1.0.3  vctrs_0.6.3      evaluate_0.21    glue_1.6.2      
[33] farver_2.1.1     fansi_1.0.4      colorspace_2.1-0 rmarkdown_2.23  
[37] tools_4.3.0      pkgconfig_2.0.3  htmltools_0.5.5

Unfairness mitigation strategy descriptions

Dan Villarreal (Department of Linguistics, University of Pittsburgh)