Postcodes: Hidden Proxies for Protected Attributes

Written by Yusuf Moolla | 24 Sep 2024

TL;DR

• Postcodes can inadvertently serve as proxies for protected attributes like gender, disability status, and citizenship.

• Significant variations in these attributes across postcodes can lead to unintended bias in algorithmic decision-making.

• Mitigate risks by considering the correlations, using alternative identifiers, and implementing fairness constraints.

In a previous article, we discussed algorithmic fairness, and how seemingly neutral data points can become proxies for protected attributes.

In this article, we'll explore a concrete example of a proxy used in insurance and banking algorithms.

Listen to the audio (human) version of this article - Episode 7 of Algorithm Integrity Matters

Postcode

The Universal Postal Union uses the generic term “postcode” to describe this addressing system, used by more than a hundred countries. Certain specifics, like how granular a postcode area is, may vary. The term used to denote postcode also varies slightly:

Postcode: Australia, Malaysia, Netherlands, New Zealand, UK.
Postal code: Canada, South Africa, Singapore
Postleitzahl: Switzerland, Germany
Postnummer: Sweden, Denmark, Norway
ZIP code: U.S.
Eircode: Ireland
Kode Pos: Indonesia
Code Postal: France
Código Postal: Spain
Código de Endereçamento Postal: Brazil
Yūbin Bangō: Japan
Postal Index Number: India

Most of these differences come down to language, or to minor variations in wording.

This article will use Australian terminology and data. But the concept will apply to most countries.

Using Australian Bureau of Statistics (ABS) Census data, this article aims to demonstrate how postcodes can serve as hidden proxies for gender, disability status and citizenship.

1. Gender Ratios: Not as Uniform as You Might Think

The overall gender ratio in Australia is about 50/50 [50.7% female, 49.3% male].

But individual postcodes can show surprising variations.

Consider the visual below, based on 2021 census data. Using 55% as the cut-off point (about 10% above the average, in relative terms), 200 postcodes are either more than 55% male or more than 55% female.

180 postcodes have a higher proportion of males, perhaps due to male-dominated industries.

20 postcodes show a higher proportion of females. This might be influenced by factors like retirement communities with longer female life expectancy.
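The cut-off check described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the figures in this article: the `PostcodeCount` structure, the `flag_gender_skew` function and the sample counts are all hypothetical, and the sketch assumes each postcode comes with simple male and female counts.

```python
from typing import NamedTuple


class PostcodeCount(NamedTuple):
    """Hypothetical record: male and female counts for one postcode."""
    postcode: str
    male: int
    female: int


def flag_gender_skew(rows, cutoff=0.55):
    """Return {postcode: 'male' or 'female'} where one gender exceeds the cutoff share."""
    flagged = {}
    for row in rows:
        total = row.male + row.female
        if total == 0:
            continue  # skip postcodes with no counted residents
        male_share = row.male / total
        if male_share > cutoff:
            flagged[row.postcode] = "male"
        elif (1 - male_share) > cutoff:
            flagged[row.postcode] = "female"
    return flagged


# Made-up counts for illustration only; these are not ABS figures.
sample = [
    PostcodeCount("A001", male=620, female=380),  # 62% male
    PostcodeCount("A002", male=495, female=505),  # close to even
    PostcodeCount("A003", male=400, female=600),  # 60% female
]
skewed = flag_gender_skew(sample)
print(skewed)
```

With real census extracts, the same function could be run over every postcode to count how many fall on each side of the cut-off.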

2. Disability Ratios Across Postcodes

The census collects data on core activity need for assistance, which serves as an indicator of disability. Analysis of this data reveals significant variations in disability ratios across postcodes.

The average is 6%. For individual postcodes, this can range from 0% all the way up to 59%.

While 59% is an outlier, there are many postcodes in the 10-15% range.

For example, 11% of people in 4655 (QLD) need core assistance vs 4% in 6164 (WA). Each has ~66k people.

These differences can be attributed to various factors, including:

Proximity to specialized healthcare facilities
Availability of accessible housing
Socioeconomic factors influencing health outcomes

3. Citizenship Status: Another Protected Attribute

Citizenship status is a protected attribute. It's illegal to discriminate against someone based on their citizenship status. It may also be a proxy for race and ethnicity, which are protected attributes.

Australian citizenship status varies significantly across postcodes. On average, 11% of the 2021 census respondents identified as non-citizens. More than half of all postcodes, with more than half of the country's population, had non-citizenship ratios that were much higher (>16%) or much lower (<6%) than the average. For example, as reflected in the visual below, in the postcode with the largest population - 3029 in Victoria - 25% of the population were not Australian citizens.
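To make the "much higher or much lower than the average" test concrete, here is a rough sketch that bands each postcode against the 6% and 16% thresholds mentioned above and then works out what share of the population lives outside the middle band. The function names, the tuple layout and the sample rows are assumptions made for illustration; they are not the analysis behind the figures quoted here.

```python
def band(non_citizen_share, low=0.06, high=0.16):
    """Classify a postcode's non-citizen share against the two thresholds."""
    if non_citizen_share > high:
        return "much higher"
    if non_citizen_share < low:
        return "much lower"
    return "near average"


def population_share_outside_band(rows):
    """rows: iterable of (postcode, population, non_citizens) tuples.

    Returns the fraction of the total population living in postcodes
    whose non-citizen share is outside the 6%-16% band.
    """
    total = 0
    outside = 0
    for _postcode, population, non_citizens in rows:
        if population == 0:
            continue  # avoid dividing by zero for empty postcodes
        total += population
        if band(non_citizens / population) != "near average":
            outside += population
    return outside / total if total else 0.0


# Made-up rows for illustration only; these are not ABS figures.
sample_rows = [
    ("B001", 1000, 250),  # 25% non-citizens: much higher
    ("B002", 2000, 220),  # 11% non-citizens: near average
    ("B003", 1000, 30),   # 3% non-citizens: much lower
]
outside_share = population_share_outside_band(sample_rows)
print(outside_share)
```

Weighting by population, rather than just counting postcodes, is what allows a statement like "more than half of the country's population" to be checked.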

Postcodes with higher proportions of non-citizens might be characterized by:

Proximity to universities attracting international students
Areas with seasonal worker programs
Suburbs popular among expatriate communities

In contrast, postcodes with higher citizen ratios might reflect:

Established suburban areas with multi-generational Australian families
Regions with fewer employment opportunities for migrants

Implications for Algorithmic Fairness

The variations in disability, gender, and citizenship ratios highlight a critical issue in algorithm design.
If postcodes are used as input variables, they can inadvertently introduce biases related to these protected attributes.

For example:

A lending algorithm using postcode data might unfairly disadvantage applicants from areas with higher disability ratios.
A hiring algorithm could perpetuate gender imbalances by favouring candidates from postcodes with specific gender ratios.
An insurance pricing algorithm that uses postcode data might be discriminating - illegally - against immigrants.
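One simple way to look for the kind of disadvantage described in these examples is to compare outcome rates between groups. The sketch below is a hedged illustration only: `approval_rates`, `rate_gap`, the group labels and the decisions are all hypothetical, and a gap on its own is a prompt for further investigation rather than proof of discrimination.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def rate_gap(rates):
    """Difference between the highest and lowest group approval rates."""
    return max(rates.values()) - min(rates.values())


# Made-up lending decisions, grouped by a hypothetical postcode-level label.
decisions = [
    ("higher_disability_ratio", True),
    ("higher_disability_ratio", False),
    ("higher_disability_ratio", False),
    ("higher_disability_ratio", False),
    ("lower_disability_ratio", True),
    ("lower_disability_ratio", True),
    ("lower_disability_ratio", True),
    ("lower_disability_ratio", False),
]
rates = approval_rates(decisions)
gap = rate_gap(rates)
print(rates, gap)
```

What counts as an acceptable gap is a policy and legal question; the code only surfaces the number.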

Mitigating Postcode Bias

To address these hidden biases:

Be aware of the potential for postcodes to act as proxies for protected attributes.
Conduct thorough analyses to identify correlations between postcodes and sensitive variables.
Consider using more granular or alternative geographic identifiers when appropriate.
Implement fairness constraints that account for postcode-based variations in protected attributes.
Regularly review algorithms for unintended biases introduced by geographic data.
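As a starting point for the correlation analysis suggested above, the sketch below computes a Pearson correlation between a postcode-level protected attribute ratio and a postcode-level model output. Everything here is assumed for illustration: the `pearson` helper, the variable names and the numbers are made up, and an area-level correlation is a screening signal only, since it says nothing definitive about how individuals are treated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Made-up postcode-level values for illustration only.
disability_ratio = [0.04, 0.06, 0.08, 0.11, 0.15]       # share needing core assistance
avg_premium = [900.0, 950.0, 1010.0, 1100.0, 1180.0]    # hypothetical model output

r = pearson(disability_ratio, avg_premium)
print(round(r, 3))
```

A value close to 1 or -1 would suggest that the model's output moves with the protected attribute across postcodes, which is a cue to examine how postcode (or anything derived from it) feeds into the model.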

By recognizing the complex information encoded in postcodes, we can work towards creating fairer, more equitable algorithms that serve all members of society, regardless of where they live.

Look beyond surface-level variables in data analysis and algorithm design.

As we strive for fairness and equity, we must remain vigilant about hidden proxies that exist within our datasets, including the humble postcode.

Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances. It may not be appropriate for high-risk use cases (e.g., as outlined in The Artificial Intelligence Act - Regulation (EU) 2024/1689, a.k.a. the EU AI Act). It was written for consideration in certain algorithmic contexts within banks and insurance companies, may not apply to other contexts, and may not be relevant to other types of organisations.
