2023 Symposium Posters

Impact of Data Quality and Data Preprocessing on ML Model Fairness

Primary Investigator
Romila Pradhan

Project Members
Sathvika Kotha, Romila Pradhan

Abstract

The success of machine learning techniques in widespread applications has taught us that, with respect to accuracy, more data generally yields a better model. For fairness, however, data quality is perhaps more important than quantity. Existing studies have considered the impact of data preprocessing on the accuracy of ML models, but its impact on the fairness of the downstream model has been neither studied nor well understood. In this paper, we conduct a systematic study of how data quality issues and data preprocessing steps impact model fairness. Our study evaluates a number of preprocessing techniques across several machine learning models, trained on datasets with different characteristics and evaluated using several fairness metrics.
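
To make the kind of comparison described above concrete, the following is a minimal sketch rather than the study's own method. The abstract does not name its metrics or preprocessing techniques, so statistical parity difference, the two missing-value strategies, the function names, and every prediction value below are assumptions chosen only for illustration.

```python
from typing import Sequence


def selection_rate(predictions: Sequence[int]) -> float:
    """Fraction of positive (1) predictions; 0.0 for an empty group."""
    return sum(predictions) / len(predictions) if predictions else 0.0


def statistical_parity_difference(
    predictions: Sequence[int], groups: Sequence[str], privileged: str
) -> float:
    """Selection rate of the unprivileged group minus that of the privileged group."""
    priv = [p for p, g in zip(predictions, groups) if g == privileged]
    unpriv = [p for p, g in zip(predictions, groups) if g != privileged]
    return selection_rate(unpriv) - selection_rate(priv)


# Hypothetical predictions from one model trained on two differently
# preprocessed copies of a dataset (values invented for illustration only,
# not results from the study).
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds_drop_missing = [1, 1, 1, 0, 1, 0, 0, 0]  # assumed: rows with missing values dropped
preds_mean_impute = [1, 1, 0, 0, 1, 1, 0, 0]   # assumed: missing values mean-imputed

spd_drop = statistical_parity_difference(preds_drop_missing, groups, privileged="a")
spd_impute = statistical_parity_difference(preds_mean_impute, groups, privileged="a")
print(f"drop rows:   SPD = {spd_drop:+.2f}")
print(f"mean impute: SPD = {spd_impute:+.2f}")
```

A value closer to zero means the two groups receive positive predictions at more similar rates. Repeating this comparison across preprocessing steps, models, datasets, and metrics is the general shape of the evaluation the abstract describes.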