Data Preprocessing and ML Model Fairness

Research Areas: Artificial Intelligence and Machine Learning

Principal Investigator: Romila Pradhan

The success of machine learning techniques in widespread applications has taught us that, with respect to accuracy, more data generally yields a better model. For fairness, however, data quality is perhaps more important than quantity. Before being fed into an ML model, training data undergoes a number of preprocessing steps. Existing studies have considered the impact of data preprocessing on model accuracy. However, the impact of preprocessing on the fairness of the downstream model has been neither studied nor well understood. In this project, we conduct a systematic study of how data quality issues and data preprocessing steps impact model fairness. Furthermore, we develop solutions for improving individual data preprocessing steps so that downstream model fairness improves.
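
To make the idea concrete, the following is a minimal illustrative sketch, not the project's method. It assumes a tiny made-up dataset, a fixed threshold rule standing in for a trained model, and demographic parity difference as the fairness measure (one of many possible choices). All names (RECORDS, THRESHOLD, drop_missing, impute_mean, impute_zero) are hypothetical. It shows how three common ways of handling missing values can lead to different fairness gaps for the same decision rule.

```python
from statistics import mean

# Toy, made-up records: (group, income). None marks a missing value.
# Missing values are concentrated in group "B" on purpose.
RECORDS = [
    ("A", 60.0), ("A", 70.0), ("A", 80.0), ("A", 50.0),
    ("B", 40.0), ("B", None), ("B", None), ("B", 65.0),
]

# Hypothetical fixed decision rule standing in for a trained model:
# predict the positive outcome when income >= THRESHOLD.
THRESHOLD = 55.0


def drop_missing(records):
    """Preprocessing option 1: discard rows with a missing income."""
    return [(g, x) for g, x in records if x is not None]


def impute_mean(records):
    """Preprocessing option 2: replace missing incomes with the observed mean."""
    observed_mean = mean(x for _, x in records if x is not None)
    return [(g, observed_mean if x is None else x) for g, x in records]


def impute_zero(records):
    """Preprocessing option 3: replace missing incomes with zero."""
    return [(g, 0.0 if x is None else x) for g, x in records]


def positive_rate(records, group):
    """Fraction of a group's records that receive the positive prediction."""
    preds = [x >= THRESHOLD for g, x in records if g == group]
    return sum(preds) / len(preds)


def demographic_parity_difference(records):
    """Absolute gap in positive-prediction rates between groups A and B."""
    return abs(positive_rate(records, "A") - positive_rate(records, "B"))


# Fairness gap obtained under each preprocessing choice.
RESULTS = {
    fn.__name__: demographic_parity_difference(fn(RECORDS))
    for fn in (drop_missing, impute_mean, impute_zero)
}

if __name__ == "__main__":
    for name, gap in RESULTS.items():
        print(f"{name}: {gap:.2f}")
```

In this toy setting the decision rule never changes, yet the measured gap between the groups depends on the preprocessing step alone. Real pipelines involve trained models and many more steps, so the example should be read only as an illustration of the question the project studies.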

Personnel

Students: Ekta Sathvika Kotha

Keywords: Data preprocessing, data quality, Explainable AI, model fairness