This page contains descriptions of the unfairness mitigation strategies (UMSs) analyzed for the paper “Sociolinguistic auto-coding has fairness problems too: Measuring and mitigating overlearning bias”, published open-access in Linguistics Vanguard in 2024: https://doi.org/10.1515/lingvan-2022-0114.
UMSs are identified by 2- or 3-number decimal codes (e.g., 1.1 or 1.3.1), in which the first number indicates the category; 0.0 denotes the baseline auto-coder.
The UMSs fall into 4 categories, all of which modify the data fed into the model rather than the model itself:
Category | Type of modification |
---|---|
Downsampling | Remove tokens (rows) |
Valid predictor selection | Remove acoustic measures (columns) |
Normalization | Transform acoustic measures but don’t remove data |
Combination | Remove tokens AND (remove OR transform acoustic measures) |
Downsampling UMSs (category 1) correct for imbalances in the training data by randomly selecting tokens (rows) to remove.
UMS | Description | Counts | Proportions by gender |
---|---|---|---|
0.0 | Baseline auto-coder of Rpresent, with all data and predictors | | |
1.1 | Downsample men to equalize token counts by Gender | | N/A |
1.2 | Downsample Absent to equalize token counts by Rpresent | | N/A |
1.3.1 | Downsample women’s Absent to equalize Rpresent base rates by Gender | | |
1.3.2 | Downsample men’s Present to equalize Rpresent base rates by Gender | | |
1.4 | Downsample men’s data to equalize (a) token counts by Gender and (b) Rpresent base rates by Gender | | |
1.5 | Downsample Absent data to equalize (a) token counts by Rpresent and (b) Gender base rates by Rpresent | | N/A |
1.6 | Downsample Gender x Rpresent to equalize token counts by Gender x Rpresent | | N/A |
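UMSs 1.1, 1.2, and 1.6 share a basic operation: randomly remove tokens until every cell of a grouping has the same count. That operation can be sketched in base R as follows. This is a minimal illustration, not the project's `umsData()` implementation. The helper `downsampleToSmallest()` and the toy data are hypothetical.

```r
## Hypothetical helper, NOT the project's umsData(): randomly downsample every
## cell defined by the columns in `by` to the size of the smallest cell
downsampleToSmallest <- function(dat, by, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # note: changes the global RNG state
  ## One factor level per observed combination of the `by` columns
  cells <- interaction(dat[by], drop = TRUE)
  n_min <- min(table(cells))
  ## Row indices to keep: a random n_min rows from each cell
  keep <- unlist(lapply(split(seq_len(nrow(dat)), cells),
                        function(idx) idx[sample.int(length(idx), n_min)]))
  dat[sort(keep), , drop = FALSE]
}

## Toy data: 2000 tokens, women outnumber men
toy <- data.frame(
  Gender   = rep(c("Female", "Female", "Male", "Male"), times = c(400, 1000, 200, 400)),
  Rpresent = rep(c("Absent", "Present", "Absent", "Present"), times = c(400, 1000, 200, 400))
)
toy$var1 <- runif(nrow(toy))

## Like UMS 1.1: equalize token counts by Gender (600 each)
ums11 <- downsampleToSmallest(toy, "Gender", seed = 1)
table(ums11$Gender)

## Like UMS 1.6: equalize token counts by Gender x Rpresent (200 per cell)
ums16 <- downsampleToSmallest(toy, c("Gender", "Rpresent"), seed = 1)
table(ums16$Gender, ums16$Rpresent)
```

This sketch always trims the larger cells, whichever group they belong to. In the toy data the first call therefore removes women's tokens, even though UMS 1.1 is described as downsampling men, who are the larger group in the paper's data.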
Note for code users
These implementations are more general than their descriptions suggest. For example, UMSs 1.3.1 and 1.3.2 both achieve equal /r/ base rates by gender, by downsampling either women’s Absent (1.3.1) or men’s Present (1.3.2). However, `umsData()` actually implements them as “downsample one class from the smaller group” (1.3.1) vs. “downsample one class from the larger group” (1.3.2), automatically detecting which class to downsample from which group.
We can demonstrate this generality via a hypothetical dataset in which women are the larger group:
```r
library(tidyverse)

## Set up hypothetical dataset where Female tokens outnumber Male
dat1 <- tribble(
  ~Gender, ~Rpresent, ~n,
  "Female", "Absent", 400,
  "Female", "Present", 1000,
  "Male", "Absent", 200,
  "Male", "Present", 400
) %>%
  ## Expand to individual rows (one row per token)
  uncount(n) %>%
  ## Add dummy predictor
  mutate(var1 = runif(n()))

## In dataset: women more numerous than men, greater % Present than men
## (umsData() and printData() come from this project's own code)
umsData(dat1, "0.0", predictors=var1) %>%
  printData("1.3.1") # Print as though applying UMS 1.3.1 (for proportions)
```

```
Data: 2000 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present   1000  400 1400
  Sum       1400  600 2000
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.2857143 0.3333333
  Present 0.7142857 0.6666667
  Sum     1.0000000 1.0000000
```
UMS 1.3.1 should downsample one class from the smaller group to match the class distribution of the larger group. Indeed, applying UMS 1.3.1 to this dataset results in fewer Absent tokens for men:
```r
## UMS 1.3.1 should downsample smaller group to match % Absent/Present of larger group
umsData(dat1, "1.3.1", predictors=var1) %>%
  printData("1.3.1")
```

```
Data: 1960 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  160  560
  Present   1000  400 1400
  Sum       1400  560 1960
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.2857143 0.2857143
  Present 0.7142857 0.7142857
  Sum     1.0000000 1.0000000
```
By contrast, UMS 1.3.2 should downsample one class from the larger group to match the class distribution of the smaller group. Indeed, applying UMS 1.3.2 to this dataset results in fewer Present tokens for women:
```r
## UMS 1.3.2 should downsample larger group to match % Absent/Present of smaller group
umsData(dat1, "1.3.2", predictors=var1) %>%
  printData("1.3.2")
## As expected, women's Present downsampled
```

```
Data: 1800 tokens with 1 predictors
         Gender
Rpresent  Female Male  Sum
  Absent     400  200  600
  Present    800  400 1200
  Sum       1200  600 1800
Proportionally by Gender:
         Gender
Rpresent     Female      Male
  Absent  0.3333333 0.3333333
  Present 0.6666667 0.6666667
  Sum     1.0000000 1.0000000
```
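The arithmetic behind the two results above can be sketched on a 2 × 2 count table in three steps:

1. Pick the group to downsample: the smaller one for 1.3.1, the larger one for 1.3.2.
2. Find the class that is over-represented in that group relative to the other group.
3. Shrink that class so its ratio to the other class matches the reference group.

The helper `equalizeBaseRates()` below is hypothetical. It assumes exactly two groups and two classes, and it only computes target counts. It is not taken from `umsData()`, which would then remove the surplus tokens at random.

```r
## Hypothetical helper (not the project's umsData()): compute the token counts
## left after equalizing class base rates across two groups in a 2 x 2 table
equalizeBaseRates <- function(counts, from = c("smaller", "larger")) {
  from  <- match.arg(from)
  sizes <- colSums(counts)
  ## g = group to downsample; ref = group whose class distribution is the target
  g   <- if (from == "smaller") names(which.min(sizes)) else names(which.max(sizes))
  ref <- setdiff(colnames(counts), g)
  ## Class proportions within each group
  p_g   <- counts[, g] / sum(counts[, g])
  p_ref <- counts[, ref] / sum(counts[, ref])
  ## Downsample the class over-represented in g relative to ref,
  ## holding the other class fixed so the ratio matches ref
  cls   <- names(which.max(p_g - p_ref))
  other <- setdiff(rownames(counts), cls)
  out   <- counts
  out[cls, g] <- round(counts[other, g] * counts[cls, ref] / counts[other, ref])
  out
}

## Counts from the hypothetical dataset: rows = Rpresent, columns = Gender
counts <- matrix(c(400, 1000, 200, 400), nrow = 2,
                 dimnames = list(Rpresent = c("Absent", "Present"),
                                 Gender   = c("Female", "Male")))

equalizeBaseRates(counts, "smaller")  # like UMS 1.3.1: Male Absent 200 -> 160
equalizeBaseRates(counts, "larger")   # like UMS 1.3.2: Female Present 1000 -> 800
```

For 1.3.1, the men's Absent count becomes 400 × 400/1000 = 160. For 1.3.2, the women's Present count becomes 400 × 400/200 = 800. These match the totals of 1960 and 1800 tokens printed above.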
Valid predictor selection UMSs (category 2) remove acoustic measures (columns) that could inadvertently signal gender.
UMS | Description | Num. predictors | Num. removed | Predictors removed |
---|---|---|---|---|
0.0 | Baseline auto-coder of Rpresent, with all data and predictors | 180 | 0 | N/A |
2.1.1 | Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 10%) | 162 | 18 | diffF3F1_30, diffF3F1_35, diffF3F1_50, diffF3F1_65, diffF3F1_80, F3min, … |
2.1.2 | Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 20%) | 144 | 36 | diffF3F1_25, diffF3F1_55, diffF3F1_70, F0max, F3min, F4_35, … |
2.1.3 | Empirical predictor selection, removing most influential predictors in classifier of Gender (cutoff: top 50%) | 90 | 90 | diffF3F1_45, diffF4F3_35, diffF4F3_65, F2_20, F2_40, F2min, … |
2.1.4 | Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/2) | 167 | 13 | BW3_70, F2_35, F3_60, F3_75, F3_80, F4range, … |
2.1.5 | Empirical predictor selection, without measures with differential importance in separate-Gender auto-coders of Rpresent (difference in rank places: at least p/3) | 140 | 40 | absSlopeF0, BW3_70, BW4_40, F2_65, F2_75, F4_70, … |
2.2 | Theoretical predictor selection, removing all F0 measures | 176 | 4 | F0min, F0max, F0rangeST, absSlopeF0 |
2.3 | Empirical and theoretical predictor selection, removing only F0 measures that correlate with Gender | 178 | 2 | F0min, F0max |
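The cutoff rule in UMSs 2.1.1–2.1.3 ranks measures by their importance in the Gender classifier and drops the top 10%, 20%, or 50%. It can be sketched as follows. This sketch is hypothetical and uses random stand-in importances rather than the real values from UMS 0.1.1. It assumes importances sit in a column named `Importance`, as in the code that reads `Var-Imp_UMS0.1.1.csv`. It also rounds the number of measures to drop, which makes no difference for 180 predictors.

```r
## Hypothetical sketch of the cutoff rule in UMSs 2.1.1-2.1.3
dropTopImportance <- function(vi, cutoff) {
  n_drop <- round(cutoff * nrow(vi))                        # e.g. 10% of 180 = 18
  ranked <- vi[order(vi$Importance, decreasing = TRUE), ]   # most important first
  list(removed = head(ranked$Measure, n_drop),
       kept    = tail(ranked$Measure, nrow(vi) - n_drop))
}

## Stand-in importances for 180 measures (random, for illustration only)
set.seed(1)
vi <- data.frame(Measure = paste0("m", 1:180), Importance = runif(180))

for (cutoff in c(0.1, 0.2, 0.5)) {
  sel <- dropTopImportance(vi, cutoff)
  cat(sprintf("cutoff %.0f%%: %d kept, %d removed\n",
              100 * cutoff, length(sel$kept), length(sel$removed)))
}
```

With 180 predictors, the three cutoffs leave 162, 144, and 90 measures, matching the predictor counts in the table above.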
Most of these UMSs relied on “predecessor” auto-coders. UMS 0.1.1 used the same predictor set to classify gender rather than /r/, and is the basis for UMSs 2.1.1–2.1.3. UMS 0.2 trained one /r/ auto-coder on women’s tokens and another on men’s, and is the basis for UMSs 2.1.4 and 2.1.5. Here are the variable importances for those auto-coders:
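```r
library(dplyr)

## UMS 0.1.1: Classifier of Gender
## Rank measures by importance (1 = most important)
vi011 <- read.csv("Outputs/Other/Var-Imp_UMS0.1.1.csv")
vi011 %>%
  mutate(Rank = rank(desc(Importance))) %>%
  arrange(Rank)

## UMS 0.2: Separate auto-coders of Rpresent by Gender
vi02 <- read.csv("Outputs/Other/Var-Imp_UMS0.2.csv")
vi02 %>%
  ## Replace each importance column with its rank (1 = most important)...
  mutate(across(-Measure, ~ rank(desc(.x))),
         ## ...so RankDiff is the difference in rank places between genders
         RankDiff = abs(Importance_Female - Importance_Male)) %>%
  arrange(desc(RankDiff))
```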
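UMSs 2.1.4 and 2.1.5 use a rank-difference rule built on the `RankDiff` column computed above. A measure is removed when its importance rank differs between the women's and men's auto-coders by at least p/2 or p/3 places. The sketch below assumes that p denotes the number of predictors. It uses made-up importances, and `dropRankDiff()` is a hypothetical helper.

```r
## Hypothetical sketch of the rank-difference rule in UMSs 2.1.4-2.1.5.
## Assumes p = number of predictors; vi2 mimics the columns of Var-Imp_UMS0.2.csv
dropRankDiff <- function(vi2, divisor) {
  p  <- nrow(vi2)
  ## rank(-x) ranks the most important measure 1, like rank(desc(x))
  rF <- rank(-vi2$Importance_Female)
  rM <- rank(-vi2$Importance_Male)
  vi2$Measure[abs(rF - rM) >= p / divisor]
}

## Made-up importances for 6 measures: rankings are exactly reversed by gender
vi2 <- data.frame(Measure           = paste0("m", 1:6),
                  Importance_Female = 6:1,
                  Importance_Male   = 1:6,
                  stringsAsFactors  = FALSE)

dropRankDiff(vi2, 2)  # rank differences 5, 3, 1, 1, 3, 5; threshold 3
# returns "m1" "m2" "m5" "m6"
```

A smaller divisor gives a higher threshold, so the p/2 rule removes fewer measures than the p/3 rule. This matches the 13 vs. 40 removals in the table above.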
The normalization UMS (3.1) controls for acoustic variability that could inadvertently signal gender: in this case, each token’s minimum and maximum pitch. Unlike the downsampling and predictor-selection UMSs, it transforms the data rather than removing rows or columns.
In this dataset, formant timepoint measurements (e.g., `F3_50`, or F3 at 50% of the token’s duration) had already been speaker-normalized as part of data pre-processing. However, `F0min` and `F0max` (each token’s minimum and maximum pitch) were not normalized. UMS 3.1 normalizes these measures by subtracting by-speaker averages of these measures for word-initial /r/ tokens (see `Input-Data/meanPitches.csv`).
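This normalization amounts to a per-speaker lookup and subtraction. The minimal sketch below makes two assumptions: the token data has a `Speaker` column, and the by-speaker averages use hypothetical column names `meanF0min` and `meanF0max`. The actual layout of `Input-Data/meanPitches.csv` may differ, and the toy values are made up.

```r
## Hypothetical sketch of UMS 3.1-style normalization: subtract each speaker's
## average F0min/F0max (column names in `speaker_means` are assumptions)
normalizePitch <- function(tokens, speaker_means) {
  idx <- match(tokens$Speaker, speaker_means$Speaker)  # row of each token's speaker
  if (anyNA(idx)) stop("some speakers have no average pitch values")
  tokens$F0min <- tokens$F0min - speaker_means$meanF0min[idx]
  tokens$F0max <- tokens$F0max - speaker_means$meanF0max[idx]
  tokens
}

## Toy values (not from the study's data)
tokens <- data.frame(Speaker = c("A", "A", "B"),
                     F0min = c(100, 110, 200), F0max = c(150, 170, 260))
speaker_means <- data.frame(Speaker = c("A", "B"),
                            meanF0min = c(105, 190), meanF0max = c(160, 250))

normalizePitch(tokens, speaker_means)
```

After normalization, each token's `F0min` and `F0max` are expressed relative to its own speaker's averages. In this toy case that gives `F0min` of -5, 5, 10 and `F0max` of -10, 10, 10.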
[Figure: distributions of `F0min` and `F0max`, before normalization]
[Figure: distributions of `F0min` and `F0max`, after normalization]
Combination UMSs (category 4) pair a downsampling UMS with either a valid predictor selection UMS or the normalization UMS. Refer to the relevant sections above for more info on the component UMSs.
UMS | Description | Num. tokens | Num. predictors |
---|---|---|---|
0.0 | Baseline auto-coder of Rpresent, with all data and predictors | 5620 | 180 |
4.1.1 | Combination of 2.1.1 & 1.3.1 | 4679 | 162 |
4.1.2 | Combination of 2.1.1 & 1.3.2 | 4817 | 162 |
4.2.1 | Combination of 2.2 & 1.3.1 | 4679 | 176 |
4.2.2 | Combination of 2.2 & 1.3.2 | 4817 | 176 |
4.3.1 | Combination of 2.3 & 1.3.1 | 4679 | 178 |
4.3.2 | Combination of 2.3 & 1.3.2 | 4817 | 178 |
4.4.1 | Combination of 3.1 & 1.3.1 | 4679 | 180 |
4.4.2 | Combination of 3.1 & 1.3.2 | 4817 | 180 |
Session info:

```
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS:   /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
 [1] this.path_2.0.0 knitr_1.43      magrittr_2.0.3  rlang_1.1.1
 [5] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2
 [9] purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1
[13] ggplot2_3.4.2   tidyverse_2.0.0

loaded via a namespace (and not attached):
 [1] gtable_0.3.3     jsonlite_1.8.7   highr_0.10       compiler_4.3.0
 [5] renv_1.0.1       tidyselect_1.2.0 jquerylib_0.1.4  scales_1.2.1
 [9] yaml_2.3.7       fastmap_1.1.1    R6_2.5.1         labeling_0.4.2
[13] generics_0.1.3   munsell_0.5.0    bslib_0.5.0      pillar_1.9.0
[17] tzdb_0.4.0       utf8_1.2.3       stringi_1.7.12   cachem_1.0.8
[21] xfun_0.40        sass_0.4.7       timechange_0.2.0 cli_3.6.1
[25] withr_2.5.0      digest_0.6.33    grid_4.3.0       hms_1.1.3
[29] lifecycle_1.0.3  vctrs_0.6.3      evaluate_0.21    glue_1.6.2
[33] farver_2.1.1     fansi_1.0.4      colorspace_2.1-0 rmarkdown_2.23
[37] tools_4.3.0      pkgconfig_2.0.3  htmltools_0.5.5
```