Big Data Ethics: detecting bias in data collection, algorithmic discrimination and “informed refusal”

Research Areas: Policy, Law and Management

Principal Investigator: Chris Clifton

We are increasingly seeing evidence of discriminatory outcomes from data-based decision making. Yet whether due to the lack of a nuanced understanding of how big data is collected and how algorithms work, the lack of transparency on the part of data producers and aggregators, or both, the ability of civil society to meaningfully engage in the governance of data collection and use is severely limited. Much debate has occurred in the realm of privacy protection and the ways in which machine learning can reproduce existing bias. Examples range from minor but insidious events, such as disparities in web advertising based on gender or race (Sweeney 2013; Datta 2015), through additional barriers to access imposed on groups that do not “match the norm”, such as Facebook requiring proof of identity from Native Americans (Sanburn 2015), to obvious life-changing outcomes, such as predictive models for sentencing guidelines (Angwin 2016). However, there has been less emphasis on the ways in which emerging data collection methods and machine learning on large volumes of data can themselves introduce novel forms of impact on individuals’ lives, heightening existing avenues for institutional discrimination or introducing new ones. There has also been little attention paid to the difference between legal constraints around individual privacy and the potential harm from aggregate impacts on broader society in the era of data-driven decision making. What is clear is that such data-based decisions at scale have the potential to discriminate in unintended and difficult-to-detect ways, further blocking efforts to achieve an equitable and just society.

We are addressing these grand challenges through a multidisciplinary study of the ethical issues involved in the use of big data and predictive algorithms to make decisions affecting individuals. We will assemble a concrete set of cases and use these to define the more general problem or problems that arise. Some of these cases will come from existing studies of data-driven discrimination (Sweeney 2013; Datta 2015; Angwin 2016). Others will involve historical discrimination data. We will also look at public data, such as the NIJ Crime Forecasting Challenge (NIJ 2016) and public social media, where differences between groups may include distinctions based on personal preference as well as distinctions based on group stereotyping.


Other PIs: Kendall Roark, Daniel Kelly

Keywords: Bias, Machine Learning, Privacy
