Finding Sex/Gender in FS Data
This is the third article in the Algorithm Integrity Matters series about finding protected attributes in data.
To ensure our algorithms are fair, we need to manage four key discrimination categories. We explored Racial Discrimination in the previous article.
Another key category is Sex Discrimination. It contains eight distinct attributes: sex, gender identity, sexual orientation, intersex status, pregnancy, breastfeeding, marital status, and family or carer responsibilities.
As with Race, some of these attributes are easier to spot and manage than others. So we ask the same question again: Can we really control for all eight attributes to ensure fair algorithms?
This article explores where each attribute might appear in our data and what we can do about it.
Gender Attributes in Data
Let's consider each of the eight attributes listed above and identify where they might appear in structured and/or unstructured data.
We note that algorithms can discriminate without ever seeing these attributes directly. Things like employment history, spending patterns, product choices, and transaction timing can serve as proxies. So even if we don't collect gender data, we might still create gender bias.
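One way to surface such proxies is to test, on a labelled audit sample, how strongly each candidate feature separates the sensitive attribute. The sketch below is a minimal illustration of that idea; the field names (`shops_baby_products`, `is_female`) are assumptions for demonstration, not a real schema, and a production check would add statistical significance testing.

```python
# Hedged sketch: flag candidate proxy features by how strongly they
# separate a sensitive attribute on a labelled audit sample.
# Field names are illustrative, not a real data schema.

def proxy_gap(records, feature, sensitive):
    """Difference in sensitive-attribute rate between feature groups.

    A gap near 0 suggests the feature carries little information about
    the attribute; a large gap flags a potential proxy worth reviewing.
    """
    groups = {True: [], False: []}
    for r in records:
        groups[bool(r[feature])].append(1 if r[sensitive] else 0)
    rates = {g: sum(v) / len(v) for g, v in groups.items() if v}
    if len(rates) < 2:
        return 0.0  # feature does not split the sample at all
    return abs(rates[True] - rates[False])

sample = [
    {"shops_baby_products": 1, "is_female": 1},
    {"shops_baby_products": 1, "is_female": 1},
    {"shops_baby_products": 0, "is_female": 0},
    {"shops_baby_products": 0, "is_female": 1},
]
print(proxy_gap(sample, "shops_baby_products", "is_female"))
```

Running this over each feature used by a model gives a rough shortlist of proxies to investigate before they quietly reintroduce the attribute we thought we had excluded.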
1. Sex
Sex is assigned at birth. Gender, on the other hand, is either based on sex or individually selected. Historically, the two terms were used interchangeably. This is evolving, but because they used to be treated as the same, older data and older systems might still conflate them.
Sex is typically captured in structured data through sex/gender fields, title fields (Mr/Ms), legal documents, and self-identification forms. Title fields are not always reliable; e.g., “Dr” doesn’t identify sex, but a model could infer one sex or the other from its training data.
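One defensive pattern is to map titles to sex only where the mapping is unambiguous, and explicitly return "unknown" otherwise, so a downstream model cannot silently guess from training-data skew. This is a minimal sketch; the mapping below is an assumption for illustration, not a complete or authoritative list.

```python
# Hedged sketch: derive sex from a title field only where unambiguous.
# The mapping is illustrative and deliberately incomplete.

TITLE_TO_SEX = {"mr": "M", "ms": "F", "mrs": "F", "miss": "F"}

def sex_from_title(title):
    """Return 'M'/'F' for unambiguous titles, else None.

    'Dr', 'Prof', 'Mx', etc. carry no sex information; returning None
    keeps the field honestly unknown instead of letting a model infer
    a value from correlated training data.
    """
    return TITLE_TO_SEX.get(title.strip().lower().rstrip("."))

print(sex_from_title("Dr"))   # None: do not guess
print(sex_from_title("Ms."))
```

The design choice here is to prefer a missing value over a plausible guess: an explicit None can be handled (or excluded) deliberately, whereas an inferred value becomes invisible bias.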
2. Gender Identity
Gender identity rarely appears in structured data. It might be captured in diversity surveys or customer service notes, but isn't typically collected systematically.
It could potentially be inferred from name changes, title changes, or communication preferences, but these inferences are highly unreliable and potentially discriminatory.
In general, gender identity is invisible in most datasets.
3. Sexual Orientation
Sexual orientation almost never appears directly in financial services data. It might be inferred from joint account holders, beneficiary relationships, or address sharing, but these inferences are problematic.
Two men sharing an address might be flatmates, brothers, or partners. Making assumptions about sexual orientation from such data creates discrimination risks.
This attribute is largely invisible in financial data.
4. Intersex Status
Intersex status is protected under Australian law but rarely visible in data. It might appear in medical insurance claims or specific identity documents, but most financial institutions wouldn't encounter this information.
When it does appear, it's typically in unstructured data like medical records or identity documents.
5. Pregnancy
Pregnancy can be inferred from various data sources. Health insurance claims, medical appointments, changes in spending patterns (baby products, medical visits), or parental leave requests all suggest pregnancy.
Third-party data from retailers and digital platforms can reveal pregnancy status through purchasing and browsing behaviour.
Pregnancy status can appear in both structured (insurance, leave records) and unstructured data (purchasing patterns).
6. Breastfeeding
Breastfeeding rarely appears directly in financial data but might be inferred from purchasing patterns, health insurance claims, or workplace accommodation requests.
Like pregnancy, this information might come through third-party data sources that track health and baby-related purchases.
7. Marital or Relationship Status
This commonly appears in structured data through account types (joint accounts), beneficiary nominations, emergency contacts, and address sharing. Application forms often collect this directly.
However, de facto relationships, same-sex partnerships, and relationship changes create complexity. Someone might be legally single but in a committed relationship, or recently separated but still sharing accounts.
Either way, this status can be directly identified or inferred.
8. Family or Carer Responsibilities
This might be derived from transaction patterns (school fees, childcare, aged care), flexible work arrangements, or parental leave records. Emergency contact lists and beneficiary nominations also suggest family relationships.
Third-party demographic data often includes household composition information that reveals potential caring responsibilities. Except for carefully determined marketing tasks, we should generally avoid using such data in our loan approval, underwriting, and claims algorithms.
What We Can Do
The answer to our question, "Can we really control for all eight attributes to ensure fair algorithms?", is not straightforward.
Unlike race, many gender-related attributes are either invisible in financial data or highly sensitive when they do appear. This creates different challenges around inference, assumption, and privacy.
In short, we can inspect our algorithmic systems, and:
- Avoid inferring sex from indirect data – a bit more challenging than it sounds, given the various data points that can create such inferences
- Question assumptions built into structured data like title fields
- Check third-party data for sex-related predictions
- Monitor for patterns that might inadvertently disadvantage either sex; for example, test whether direct or inferred data affects credit or insurance decisions.
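The monitoring point above can be sketched as a simple approval-rate disparity check across a sex attribute (direct or inferred). This is a minimal illustration under assumed field names and data; real monitoring would add sample-size and significance checks before acting on a gap.

```python
# Hedged sketch: compare approval rates across groups and report the
# largest gap (a demographic-parity-style check). Group labels and
# decisions below are illustrative sample data, not real records.

def approval_rates(decisions):
    """decisions: list of (group, approved) pairs -> approval rate per group."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}

def parity_gap(decisions):
    """Largest difference in approval rate between any two groups."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())

sample = [("F", True), ("F", False), ("F", True),
          ("M", True), ("M", True), ("M", True)]
print(approval_rates(sample))
print(round(parity_gap(sample), 2))
```

A persistent gap on this kind of metric does not prove discrimination on its own, but it tells us where to look: which features, direct or proxied, are driving the difference.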
Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances. It was written for specific algorithmic contexts within banks and insurance companies, may not apply to other contexts, and may not be relevant to other types of organisations.
