Annual CERIAS Security Symposium

Fairness Debugging of Tree-based Models using Machine Unlearning

Primary Investigator: Romila Pradhan

Project Members

Tanmay Surve, Dr. Romila Pradhan

Abstract

Machine learning (ML) is fast becoming the standard choice for data science applications that involve automated decision-making in sensitive domains such as finance, healthcare, crime prevention, and justice management. Carefully designed ML-based systems have the potential to eliminate undesirable aspects of human decision-making, such as biased judgments. However, concern continues to mount that these systems reinforce the systemic biases and discrimination often reflected in their training data. Tree-based models, such as decision trees and random forests, are among the most widely used ML models, primarily because of their predictive power in supervised learning tasks and their ease of interpretation. Given their overwhelming success on most tasks, it is of interest to identify the root causes of unexpected and discriminatory behavior in tree-based models. However, little work has addressed understanding and debugging tree-based classifiers in the context of fairness. We introduce an algorithm that identifies the top-k data points or patterns in the training dataset that are responsible for model bias. A key component of our algorithm is its use of recent advances in machine unlearning research. Using machine unlearning techniques, the algorithm finds the training data points or patterns responsible for inducing bias in the model's predictions on the test dataset, and does so much faster than naively retraining the model.
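
To make the problem setting concrete, the following minimal sketch shows the naive retraining baseline that the abstract contrasts against; it is not the authors' algorithm. It assumes a depth-1 decision tree (a stump) as the tree-based model and statistical parity difference as the fairness metric. Both choices, along with the names `fit_stump`, `parity_gap`, and `top_k_responsible`, are hypothetical and chosen only for illustration. A machine unlearning method would replace the retraining call inside the loop with a cheaper model update.

```python
from typing import Callable, List, Tuple

Row = Tuple[float, ...]
Model = Callable[[Row], int]


def fit_stump(X: List[Row], y: List[int]) -> Model:
    """Train a depth-1 decision tree by minimizing the misclassification count."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            # Majority label on each side (ties go to 1, empty side gets 0).
            l_lab = int(sum(left) * 2 >= len(left)) if left else 0
            r_lab = int(sum(right) * 2 >= len(right)) if right else 0
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab


def parity_gap(model: Model, X_test: List[Row], groups: List[int]) -> float:
    """Absolute difference in positive-prediction rates between groups 1 and 0.

    Assumes both groups appear at least once in the test data.
    """
    rates = []
    for g in (0, 1):
        preds = [model(row) for row, grp in zip(X_test, groups) if grp == g]
        rates.append(sum(preds) / len(preds))
    return abs(rates[1] - rates[0])


def top_k_responsible(
    X: List[Row], y: List[int], X_test: List[Row], groups: List[int], k: int
) -> List[Tuple[float, int]]:
    """Rank training points by how much removing each one reduces the parity gap.

    Naive baseline: one full retraining per training point.
    """
    base = parity_gap(fit_stump(X, y), X_test, groups)
    scores = []
    for i in range(len(X)):
        X_wo, y_wo = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        gap = parity_gap(fit_stump(X_wo, y_wo), X_test, groups)
        scores.append((base - gap, i))  # positive score: removal reduces bias
    scores.sort(key=lambda s: (-s[0], s[1]))
    return scores[:k]
```

The loop retrains the model once per training point, which is the cost that grows prohibitive for larger datasets and ensembles and that an unlearning-based approach aims to avoid.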

Fuzzy Logic to the Rescue: Cracking the Code on Grooming Stages' Fuzziness!

Primary Investigator: Tatiana Ringenberg

Project Members

Siva Sahitya Simhadri

Abstract

Online grooming is the practice in which an adult builds a relationship with a child or young person with the intention of exploiting them for sexual purposes. The number of internet grooming offenses reported to the police is growing and has increased by more than 80% in the last four years. Offenders groom children online through five stages. A robust approach to detecting and intervening in such conversations during the earlier stages is urgently needed. Until now, grooming chats have been characterized as crisp sets (i.e., each chat line belongs to exactly one of the five stages). The primary objective of this work is to depart from this convention and represent the grooming stages using fuzzy membership functions. We propose a framework to classify predator conversations into the different grooming stages. The dataset used for this task was annotated by two annotators with over 80% reliability.
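
The difference between crisp and fuzzy stage assignment can be sketched as follows. This is an illustration under stated assumptions, not the framework proposed in the abstract: the stage names are placeholders, the triangular membership shape and its centers are arbitrary choices, and the scalar input `x` in [0, 1] is a hypothetical score for a chat line that the abstract does not define.

```python
from typing import Dict

# Placeholder stage names; the abstract does not name the five stages.
STAGES = [f"stage_{i}" for i in range(1, 6)]
CENTERS = [0.0, 0.25, 0.5, 0.75, 1.0]  # assumed peak position of each stage
WIDTH = 0.25                           # assumed half-width of each triangle


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Standard triangular membership function with support (left, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_memberships(x: float) -> Dict[str, float]:
    """Degree of membership of a hypothetical score x in every stage."""
    return {
        stage: triangular(x, c - WIDTH, c, c + WIDTH)
        for stage, c in zip(STAGES, CENTERS)
    }


def crisp_stage(x: float) -> str:
    """Crisp assignment: keep only the single stage with the highest membership."""
    memberships = fuzzy_memberships(x)
    return max(memberships, key=memberships.get)
```

Under these assumptions, a score midway between two stage centers belongs partially to both stages, whereas the crisp assignment forces it into exactly one.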