2019 Symposium Posters

Towards trustworthy NLP systems: detecting bias in popular models


Primary Investigator:
Julia (Taylor) Rayz

Project Members
John Phan, Kanishka Misra, Julia Rayz
Recent media coverage of AI applications reflects growing concern about, and distrust of, artificially intelligent machines. This work takes a closer look at biases that may be present in word embeddings, a set of popular techniques that capture word meaning based on the distributional hypothesis (Harris 1954, Firth 1957). Applications that rely on word embeddings need trustworthy information from the underlying NLP systems. In this project, we study the gender bias implicitly present in three popular sets of pretrained word embeddings: word2vec (Mikolov et al. 2013), GloVe (Pennington et al. 2014), and fastText (Bojanowski et al. 2017). Previous attempts at identifying and eliminating bias have been shown to be ineffective (Gonen & Goldberg 2019). The question we seek to answer is: how much bias is there? We use a support vector machine (SVM) classifier trained on the 300-dimensional vectors of strictly gendered nouns (e.g., man, woman, girl, boy), with each dimension serving as a feature. The SVM is used to determine the two most salient features of the vector space for each set of pretrained embeddings. The classifier is then tasked with predicting the gender of nominally neutral words, e.g., homemaker, boss, programmer. Our results are consistent with previous work that identified gender bias in word embeddings by other means (e.g., Bolukbasi et al. 2016), and show bias in word representations.
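
The pipeline described above (train a linear classifier on gendered nouns, read off the most heavily weighted dimensions, then score neutral words) can be illustrated with a minimal sketch. This is not the project's code: it assumes a linear SVM fit by plain subgradient descent on the hinge loss, it uses made-up 4-dimensional toy vectors in place of real 300-dimensional pretrained embeddings, and the names `train_linear_svm`, `salient_features`, `word_a`, and `word_b` are hypothetical. Ranking dimensions by absolute weight is one plausible reading of "most salient features", not necessarily the criterion the authors used.

```python
# Toy linear SVM (hinge loss, batch subgradient descent) in pure Python.
# ASSUMPTION: every vector below is a made-up 4-dimensional stand-in,
# NOT a real word2vec/GloVe/fastText embedding.

def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Minimise reg/2 * ||w||^2 + mean hinge loss; return (weights, bias)."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def salient_features(w, k=2):
    """Indices of the k dimensions with the largest absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


# Strictly gendered nouns: label +1 = male, -1 = female (toy vectors).
gendered = {
    "man":   ([0.9, 0.1, 0.7, 0.2], 1),
    "boy":   ([0.8, -0.1, 0.6, -0.2], 1),
    "woman": ([-0.9, 0.1, -0.7, 0.2], -1),
    "girl":  ([-0.8, -0.1, -0.6, -0.2], -1),
}
samples = [vec for vec, _ in gendered.values()]
labels = [lab for _, lab in gendered.values()]

weights, bias = train_linear_svm(samples, labels)
salient = salient_features(weights, k=2)

# Hypothetical "neutral" words with toy vectors; real input would be the
# pretrained embeddings of words such as those listed in the abstract.
neutral = {
    "word_a": [-0.5, 0.3, -0.4, 0.1],
    "word_b": [0.4, -0.2, 0.5, 0.0],
}
predictions = {}
for word, vec in neutral.items():
    score = sum(wj * xj for wj, xj in zip(weights, vec)) + bias
    predictions[word] = "male" if score > 0 else "female"

print("salient dimensions:", salient)
print("predictions:", predictions)
```

With real embeddings, the toy dictionaries would be replaced by lookups into a loaded pretrained model, and a library SVM implementation would normally take the place of the hand-written training loop. A neutral word that the classifier confidently assigns to one gender is then a signal that gender information leaks into its representation.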