In a previous article, we discussed algorithmic fairness, and how seemingly neutral data points can become proxies for protected attributes.
In this article, we'll explore a concrete example of a proxy used in insurance and banking algorithms.
The Universal Postal Union uses the generic term “postcode” to describe this addressing system, used by more than a hundred countries. Certain specifics, like how granular a postcode area is, may vary. The term used to denote postcode also varies slightly:
Most of this is due to language differences, or subtle differences in wording.
This article will use Australian terminology and data. But the concept will apply to most countries.
Using Australian Bureau of Statistics (ABS) Census data, this article aims to demonstrate how postcodes can serve as hidden proxies for gender, disability status and citizenship.
The overall gender ratio in Australia is about 50/50 (50.7% female, 49.3% male).
But individual postcodes can show surprising variations.
Consider the visual below, based on 2021 census data. Using 55% as the cut-off point (5 percentage points, or roughly 10% in relative terms, above an even split), we have 200 postcodes that are either >55% male or >55% female.
180 postcodes have a higher proportion of males, perhaps due to male-dominated industries.
20 postcodes show a higher proportion of females. This might be influenced by factors like retirement communities with longer female life expectancy.
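The cut-off logic above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis behind the visual: the function name `flag_skewed_postcodes`, the input layout (postcode mapped to male and female counts) and the sample figures are all assumptions made for the example, and the counts are not ABS data.

```python
def flag_skewed_postcodes(counts, cutoff=0.55):
    """Return postcodes where one gender exceeds `cutoff` of the population.

    `counts` maps postcode -> (male_count, female_count).
    This layout is an assumption for the sketch, not an ABS format.
    """
    flagged = {}
    for postcode, (male, female) in counts.items():
        total = male + female
        if total == 0:
            continue  # skip empty postcodes to avoid dividing by zero
        male_share = male / total
        female_share = female / total
        if male_share > cutoff:
            flagged[postcode] = ("male", round(male_share, 3))
        elif female_share > cutoff:
            flagged[postcode] = ("female", round(female_share, 3))
    return flagged


# Hypothetical counts for illustration only; these are not census figures.
sample_counts = {
    "A001": (600, 400),  # 60% male, above the cut-off
    "A002": (495, 505),  # close to an even split
    "A003": (430, 570),  # 57% female, above the cut-off
    "A004": (0, 0),      # no residents recorded
}

skewed = flag_skewed_postcodes(sample_counts)
print(skewed)
```

With the sample above, only the first and third postcodes are flagged. Running the same check on real census counts would need the actual postcode-level tables, which are not reproduced here.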
The census collects data on core activity need for assistance, which serves as an indicator of disability. Analysis of this data reveals significant variations in disability ratios across postcodes.
The average is 6%. For individual postcodes, this can range from 0% all the way up to 59%.
While 59% is an outlier, there are many postcodes in the 10-15% range.
For example, 11% of people in postcode 4655 (QLD) need core assistance, versus 4% in 6164 (WA). Each has ~66k people.
These differences can be attributed to various factors, including:
Citizenship status is a protected attribute. It's illegal to discriminate against someone based on their citizenship status. It may also be a proxy for race and ethnicity, which are protected attributes.
Australian citizenship status varies significantly across postcodes. On average, 11% of the 2021 census respondents identified as non-citizens. More than half of all postcodes, with more than half of the country's population, had non-citizenship ratios that were much higher (>16%) or much lower (<6%) than the average. For example, as reflected in the visual below, in the postcode with the largest population - 3029 in Victoria - 25% of the population were not Australian citizens.
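The "more than half of all postcodes, with more than half of the population" style of claim comes from two different shares: one counted over postcodes, one weighted by population. The sketch below shows how the two could be computed for a band around the average. The function name `summarise_outside_band`, the tuple layout and the sample rows are assumptions for illustration, not the article's actual workings or data.

```python
def summarise_outside_band(postcodes, low=0.06, high=0.16):
    """Share of postcodes, and of population, outside the [low, high] band.

    `postcodes` is a list of (postcode, population, non_citizens) tuples.
    This layout is an assumption for the sketch.
    """
    total_pop = sum(pop for _, pop, _ in postcodes)
    if not postcodes or total_pop == 0:
        raise ValueError("need at least one postcode with residents")

    # Keep postcodes whose non-citizen ratio falls below `low` or above `high`.
    outside = [
        (code, pop)
        for code, pop, non_citizens in postcodes
        if pop > 0 and not (low <= non_citizens / pop <= high)
    ]
    outside_pop = sum(pop for _, pop in outside)

    return {
        "postcode_share": len(outside) / len(postcodes),
        "population_share": outside_pop / total_pop,
    }


# Hypothetical rows for illustration only; these are not census figures.
sample_rows = [
    ("B001", 1000, 250),  # 25% non-citizens, above the band
    ("B002", 2000, 220),  # 11%, inside the band
    ("B003", 500, 10),    # 2%, below the band
    ("B004", 1500, 150),  # 10%, inside the band
]

summary = summarise_outside_band(sample_rows)
print(summary)
```

In this toy sample, half of the postcodes sit outside the band but they hold only 30% of the population, which is why it is worth reporting both shares.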
Postcodes with higher proportions of non-citizens might be characterized by:
The variations in disability, gender, and citizenship ratios highlight a critical issue in algorithm design.
If postcodes are used as input variables, they can inadvertently introduce biases related to these protected attributes.
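One way to see the effect is to apply a rule that looks only at postcode, then compare how often each group ends up on the receiving end of it. The sketch below does this with made-up data. The names `outcome_rate_by_group` and `surcharge`, the group labels and the record counts are hypothetical, and the rule is not drawn from any real pricing or lending model.

```python
from collections import defaultdict


def outcome_rate_by_group(records, decision):
    """Rate at which each group receives the adverse decision.

    `records` is an iterable of (postcode, group) pairs.
    `decision` maps a postcode to True (adverse) or False.
    The group label is never passed to `decision`.
    """
    totals = defaultdict(int)
    adverse = defaultdict(int)
    for postcode, group in records:
        totals[group] += 1
        if decision(postcode):
            adverse[group] += 1
    return {group: adverse[group] / totals[group] for group in totals}


# Hypothetical population: two postcodes of 100 people each,
# with different proportions of non-citizens (25% versus 5%).
records = (
    [("C001", "non_citizen")] * 25 + [("C001", "citizen")] * 75
    + [("C002", "non_citizen")] * 5 + [("C002", "citizen")] * 95
)


def surcharge(postcode):
    # Hypothetical rule keyed only on postcode.
    return postcode == "C001"


rates = outcome_rate_by_group(records, surcharge)
print(rates)
```

Here 25 of the 30 non-citizens receive the surcharge, compared with 75 of the 170 citizens, even though citizenship was never an input to the rule. A gap like this does not by itself establish unlawful discrimination, but it is the kind of signal that suggests a postcode variable deserves closer review.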
For example:
To address these hidden biases:
By recognizing the complex information encoded in postcodes, we can work towards creating fairer, more equitable algorithms that serve all members of society, regardless of where they live.
Look beyond surface-level variables in data analysis and algorithm design.
As we strive for fairness and equity, we must remain vigilant about hidden proxies that exist within our datasets, including the humble postcode.
Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances. It may not be appropriate for high-risk use cases (e.g., as outlined in The Artificial Intelligence Act - Regulation (EU) 2024/1689, a.k.a. the EU AI Act). It was written for consideration in certain algorithmic contexts within banks and insurance companies, may not apply to other contexts, and may not be relevant to other types of organisations.