Rather than explaining algorithmic bias from the get-go, let us turn to an example whose prominence has arguably brought this issue to the forefront of ethical debates in AI.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is an AI-powered decision-making algorithm that is still used today in the United States judicial system. The primary purpose of COMPAS is to evaluate the likelihood that a previous offender will offend again – such offenders are known as recidivists. The algorithm considers a variety of factors in its statistical judgments:
- Documented juvenile delinquency
- Drug use
- Criminal history and associates
- The nature of past crimes
- Employment status
- Various other demographic traits
Fortunately, this system has been subjected to significant external scrutiny, which has revealed some of the ways in which it reproduces and amplifies bias. While the algorithm correctly predicted recidivism at approximately the same rate for Black and White offenders (63% and 59% respectively), the errors it made were not distributed in the same way across the two groups.
For instance, COMPAS consistently classified Black offenders as higher risk than White offenders for both violent and non-violent recidivism. Risk classifications were generally skewed in favor of White offenders, who received low-risk scores more than twice as often as Black offenders. Even when researchers controlled for age, criminal history, and gender, Black offenders were still more likely to receive a high-risk score: 77% more likely for violent recidivism and 45% more likely for non-violent recidivism.
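One way to see how similar overall accuracy can coexist with unequal errors is to compute error rates separately for each group. The sketch below is purely illustrative: the records are synthetic, and the group labels `A` and `B` and the helpers `group_error_rates` and `make_records` are hypothetical names, not COMPAS data or code.

```python
from collections import defaultdict


def group_error_rates(records):
    """Return accuracy, false positive rate and false negative rate per group.

    Each record is a (group, predicted_high_risk, reoffended) tuple.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1
        elif not predicted and actual:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["tp"] + c["fn"]  # people who did reoffend
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            "false_positive_rate": c["fp"] / negatives if negatives else float("nan"),
            "false_negative_rate": c["fn"] / positives if positives else float("nan"),
        }
    return rates


def make_records(group, tp, fp, tn, fn):
    """Build synthetic records with the requested confusion-matrix counts."""
    return (
        [(group, True, True)] * tp
        + [(group, True, False)] * fp
        + [(group, False, False)] * tn
        + [(group, False, True)] * fn
    )


# Made-up numbers chosen so that both groups share the same accuracy.
records = make_records("A", tp=40, fp=30, tn=20, fn=10) + make_records(
    "B", tp=20, fp=10, tn=40, fn=30
)
rates = group_error_rates(records)
for group, metrics in sorted(rates.items()):
    print(group, metrics)
```

In this toy data both groups are classified with 60% accuracy, yet group `A` is wrongly flagged as high risk three times as often as group `B` (false positive rates of 0.6 and 0.2), while group `B` is wrongly flagged as low risk three times as often. A single accuracy figure hides that asymmetry.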
COMPAS presents us with a perfect example of algorithmic bias, especially because it was trained on large-scale datasets obtained in the country with the highest incarceration rate on earth; in other words, data in this domain is particularly plentiful in the United States. An earlier study examining risk classification in COMPAS found that skewed results could be explained by deeply ingrained systemic bias within the U.S. justice system. This bias is likely a result of socio-cultural beliefs that cast Black citizens as intrinsically more prone to both violent and non-violent crime, which also helps explain why Black individuals are incarcerated at a rate disproportionate to their share of the U.S. population.
To summarize, algorithmic bias occurs when a machine learning algorithm makes a set of flawed or incorrect assumptions that then lead to systematically biased classification outcomes.
How Do We Reduce Algorithmic Bias?
In order to reduce algorithmic bias, we must recognize the three primary causes at the roots of this issue:
- Insufficient ability on the part of the researchers designing the algorithm to recognize or account for statistical biases.
- The use of large-scale datasets that contain embedded flaws in their information structure (e.g. errors during data labeling procedures) or actively uphold prejudiced/stereotyped relationships between various data points.
- The use of datasets that do not adequately represent the populations they describe, i.e., low-quality or insufficient data.
Statistical bias is arguably the easiest of these three problems to remedy; such biases are already well documented throughout empirical research, giving researchers the knowledge they require to curb their effects during the research process. For instance, sampling bias occurs when researchers assemble a study sample without using randomization techniques, which leads to a sample that does not represent the general population. Any subsequent analysis conducted on this sample will yield skewed results that lack external validity, even if the statistical methods themselves are neutral.
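The effect of sampling bias can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the population, the subgroup names `urban` and `rural`, and all numeric values are hypothetical and stand in for no real dataset.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: two equally sized subgroups whose measured
# value differs on average (60 vs. 40).
population = [("urban", rng.gauss(60, 5)) for _ in range(5000)] + [
    ("rural", rng.gauss(40, 5)) for _ in range(5000)
]
true_mean = statistics.fmean(value for _, value in population)

# Randomized sample: every individual is equally likely to be selected.
random_sample = rng.sample(population, 500)
random_mean = statistics.fmean(value for _, value in random_sample)

# Convenience sample: only individuals from one subgroup are reachable.
convenience_sample = [p for p in population if p[0] == "urban"][:500]
convenience_mean = statistics.fmean(value for _, value in convenience_sample)

print(f"population mean:         {true_mean:.1f}")
print(f"random sample mean:      {random_mean:.1f}")
print(f"convenience sample mean: {convenience_mean:.1f}")
```

The averaging step is identical in both cases; only the way the sample was drawn differs. The randomized sample lands close to the population mean, while the convenience sample overshoots it by roughly the gap between the subgroup and the population as a whole.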
There are many forms of statistical bias, ranging from selection and recall bias to confirmation and framing bias. Addressing each of these is beyond the scope of this article; the point I aim to make is that their existence is well understood throughout the research community. Nonetheless, it is not always possible for researchers to conclusively eliminate statistical bias from their research practices, even if they aim to adhere to ethical guidelines. As such, one way to address these issues could involve implementing more stringent guidelines for research that uses sensitive information (e.g., credit scores, financial aptitude, medical data, IP addresses) or proxy indicators (i.e., data that reveals sensitive information through inference or correlation). Subjecting this research to a more intensive peer review process could also aid in this endeavor.
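As a rough illustration of what screening for proxy indicators might look like, the sketch below flags features that correlate strongly with a sensitive attribute. Everything in it is an assumption made for the example: the feature names `neighborhood_score` and `shoe_size`, the synthetic data, the helper `flag_proxies`, and the 0.5 threshold. Linear correlation is also only one crude signal; it will miss non-linear or combined proxies.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flag_proxies(features, sensitive, threshold=0.5):
    """Return names of features whose |correlation| with the sensitive
    attribute meets the (arbitrary, hypothetical) threshold."""
    return sorted(
        name
        for name, values in features.items()
        if abs(pearson(values, sensitive)) >= threshold
    )


rng = random.Random(1)
# Synthetic binary sensitive attribute (e.g. membership in some group).
sensitive = [rng.randint(0, 1) for _ in range(2000)]
features = {
    # Constructed to depend on the sensitive attribute, so it acts as a proxy.
    "neighborhood_score": [2.0 * s + rng.gauss(0, 0.5) for s in sensitive],
    # Constructed independently of the sensitive attribute.
    "shoe_size": [rng.gauss(0, 1) for _ in sensitive],
}

flagged = flag_proxies(features, sensitive)
print(flagged)
```

Dropping the sensitive column from a dataset does not remove the information it carries if a feature like the first one remains, which is why guidelines would need to cover proxies as well.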
On the other hand, addressing the use of large-scale datasets that reflect systemic inequality is a more difficult task. Such datasets, while they may not always be 100% error-free, are typically high quality. As a result, the heart of this issue lies not in the data itself but in the socio-political and economic constructs, belief systems, or ideologies perpetuated throughout society that are then reflected in our data.
Let us return to the COMPAS example. Biased classifications, in this case, were a consequence of ingrained societal stereotypes regarding a supposed increased likelihood of criminal behavior in Black communities. These stereotypes led to higher rates of incarceration among Black Americans, which in turn reinforced the erroneous assumption that members of these communities are more prone to crime (a positive feedback loop). Moreover, this narrative has often been used to justify excessive or violent policing in the United States and has become a central concern of the Black Lives Matter movement. So, how can we address this issue?
One appealing approach involves curating datasets that control for the variables that contribute to systemic bias. Such variables would include demographic information such as specific districts or area codes, healthcare and education accessibility, income, and credit ratings, to name a few. If we controlled for these factors during dataset curation, we might be able to achieve a more even distribution across datasets.
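One possible reading of "controlling for" a variable during curation is to reweight records so that the outcome label becomes statistically independent of group membership, a preprocessing technique often called reweighing. The sketch below is an assumption about how this could be done, not a method described in this article; the groups `A` and `B`, the toy counts, and the helper names are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each record by P(group) * P(label) / P(group, label).

    Under these weights, label and group are independent in the
    weighted dataset.
    """
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    total = sum(w for g, w in zip(groups, weights) if g == group)
    return positive / total


# Toy dataset: group A is labeled positive 60% of the time, group B 20%.
groups = ["A"] * 100 + ["B"] * 100
labels = [1] * 60 + [0] * 40 + [1] * 20 + [0] * 80

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
print(rate_a, rate_b)
```

After reweighting, both groups show the same weighted positive rate, equal to the overall rate of 40% in this toy data. Whether equalizing rates in this way is appropriate depends on the application; the sketch only shows that the distribution across groups can be evened out without discarding records.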
An additional approach involves the implementation of widespread privately or state-sponsored educational programs that focus on revealing biases to working professionals as well as the general public. This ground-up approach would allow us to address the prevalence of bias at the societal scale and could lay the groundwork for legislative and judicial systems that place more emphasis on equity and equality.
Finally, the problem of insufficient or low-quality data often pertains to communities that are less privileged or increasingly vulnerable. If we want to ensure these communities are fairly represented in our datasets and analysis, as we should, then we need to increase our efforts to obtain meaningful data from them. This could be achieved through community outreach programs that actively engage said citizens and incentivize them with benefits in the form of accessibility to services they are entitled to but struggle to access. Importantly, we have to ensure that any community outreach programs we design are not coercive or exploitative in any way.
Which Industries Are Most Prone to This Kind of Bias?
Unfortunately, as governments, businesses and organizations become more data-driven, the risk that they exhibit bias in their data analysis practices increases accordingly. The reality is that there is no existing industry immune to algorithmic bias, insofar as it uses machine learning technology in conjunction with Big Data.
That being said, there are two industries I believe must exercise extreme caution in this domain, primarily due to the kind of sensitive information they deal with.
The first of these two is healthcare services, specifically the domain of insurance. For instance, by using data points that correlate with an increased risk of disease (e.g., age, race, gender, location), insurance companies could implement automated algorithms that adjust premiums accordingly: individuals who present a higher risk of developing serious health conditions could be required to pay more for their health insurance. Such an increase in premiums might preclude certain members of a given community from accessing the medical treatment they require. In this way, automated algorithms could reflect systemic inequality if they do not control for the appropriate variables (the same point I made in the previous section).
The second industry is finance, specifically programs that evaluate individual creditworthiness. For example, a 2021 report by Forbes revealed that Black families, on average, earn 30% less than White families and possess cumulative wealth equal to approximately one-eighth that of White households in the United States. Such statistical trends not only reveal inequality but also have the potential to exacerbate it in processes involving algorithmic analysis, prediction, or classification. Creditworthiness, in simple terms, is an individual’s ability to pay back debts in a timely and adequate manner. Automated algorithms used in this sector could incorrectly presuppose that a Black individual is less capable of paying their debts than a White individual, even if they have the financial capacity to do so. The effects of a poor credit rating will then reverberate throughout the individual’s life, negatively affecting their ability to acquire loans, suitable mortgage rates, certain insurance packages, and various other services.
How Can We Use AI-Driven Technologies to Promote Fairness?
The first step we could take to ensure that AI-driven technologies actively work to promote fairness and equality involves dispelling the following myth: algorithms themselves are neutral; it is data that leads them to produce biased, stereotyped, or skewed outcomes. While this may be true in certain cases in which algorithms deal with datasets that represent objectively quantifiable metrics, it does not always hold true for datasets that quantify human constructs and behavior.
The behaviors we engage in at the societal level, namely the cultivation of belief systems and moral frameworks as well as political and social ideologies, are typically the products of a complex web of biases or, more specifically, preconceived tendencies to react or think in a certain way. Since humans are still the main authors of algorithmic design (note: this may change in the future as algorithms designed for recursive self-improvement become more prevalent), it is likely that they often unknowingly embed elements of their own thought structures and cognitive processes into the algorithms they build. In other words, it is difficult to envision a world in which any algorithm built by a human is intrinsically neutral, especially if it aims to classify behavioral data.
Another way to increase the likelihood that AI promotes fairness is by building open-source algorithms, but this alone is not enough. Open-source algorithmic design projects must be collaborative and diverse, ensuring that all those who have a say in the design process are equally represented. We must increase efforts to include minority or underrepresented populations in these projects, and draw on their skills to build more comprehensive datasets that are representative of humanity as a whole, not just its most “elite” members.
In fact, the company Hugging Face recently produced the world’s largest collaborative open-source multilingual language model. With over 176 billion parameters, this model beats out OpenAI’s GPT-3 in terms of scale and claims to produce equally impressive results. This project recruited researchers from all over the world, especially from regions that are severely underrepresented in the global data landscape. Moreover, the open-source nature of this project extended beyond code accessibility to provide detailed insight into the entire research process. Simply put, anyone who is interested in the model can scrutinize its design and development as well as any assumptions made during its creation.