This is the fifth article in a series about finding protected attributes in data.
We need to manage four key discrimination categories to ensure that our algorithms are fair. We previously explored Race, Sex/Gender, and Disability.
Another key category is Age. Unlike the other categories, age is a single attribute.
This article explores where age might appear in our data and what we can do about it.
Legislation in many jurisdictions makes it unlawful to discriminate against a person because of their age, unless an exception applies (such as age-based eligibility for certain products or services).
Age is as simple as it sounds: it refers to how old a person is. Algorithmic systems can discriminate by age even if they never see the age field directly, for example by using proxies such as years since graduation.
So, even though age is a simple concept, it can show up in data in many ways:
Age is often collected as date of birth on forms, applications, identity documents, or driver’s licenses. This is straightforward, and most banking and insurance systems will hold this data. For both existing and prospective customers, we often need it for KYC purposes.
Age can also be estimated, or age buckets inferred, from other data. For example, systems can calculate age by subtracting year of birth from the current year. Again, this is straightforward and fairly common, but it can introduce inconsistencies, for instance where the year of birth is missing, and how we handle those anomalies can affect an algorithm's decisions.
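A minimal sketch of such a derivation, with the missing-data handling made explicit (function name and the plausibility bounds are my own assumptions, not from any particular system):

```python
from datetime import date
from typing import Optional

def derive_age(year_of_birth: Optional[int],
               today: Optional[date] = None) -> Optional[int]:
    """Derive an approximate age by subtracting year of birth from the
    current year.

    Returns None when the year of birth is missing or implausible, so that
    downstream logic must handle the gap explicitly rather than silently
    treating missing data as, say, age 0.
    """
    today = today or date.today()
    if year_of_birth is None:
        return None  # missing data: propagate the gap, don't guess
    age = today.year - year_of_birth
    if age < 0 or age > 130:
        return None  # implausible value: flag rather than use (assumed bounds)
    return age

print(derive_age(1990, date(2024, 6, 1)))  # 34
print(derive_age(None, date(2024, 6, 1)))  # None
```

The design choice that matters here is returning an explicit `None` instead of a default number: a silent default becomes an anomaly that quietly shifts decisions for the affected records.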
Age is often visible, but sometimes it’s inferred or hidden.
In practice, we can inspect our algorithmic systems for each of these patterns: explicit age or date-of-birth fields, derived ages, and proxy features that correlate with age.
Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances. It was written for specific algorithmic contexts within banks and insurance companies, may not apply to other contexts, and may not be relevant to other types of organisations.