18 Dec 2024

Algorithmic System Reviews: Substantive vs. Controls Testing

TL;DR
• Knowing the basics of substantive testing vs. controls testing can help you determine whether the review will meet your needs.
• Substantive testing directly identifies errors or unfairness, while controls testing evaluates the effectiveness of governance. The two lead to different conclusions.
• Understanding these differences can also help you anticipate the extent of your team's involvement during the review process.

If you are commissioning a review of your algorithmic system, you ordinarily won’t be concerned with precisely how the review will be conducted.

But a high-level understanding is important.

It can help you determine how robust the review will be and what level of comfort the result will provide.

It also helps you work out, upfront, what your team will need to provide during the review.

One aspect to understand is whether the review will include controls testing, substantive testing, or both.

Each of these offers distinct benefits and limitations.

Sometimes you need just one of them. Often you will want both.

The key difference lies in the objective of your review

If you want to identify errors or unfairness in your algorithmic system, and reach a direct conclusion about accuracy or fairness, you need substantive testing. A separate article details a (largely) substantive testing method for accuracy reviews.

If you want to assess the effectiveness of your governance processes, you need controls testing. This is often needed to fulfil compliance obligations, meet contractual expectations, or demonstrate adherence to standards.

Controls testing only:

won’t directly conclude on accuracy or fairness
can conclude on the controls in place to ensure accuracy or fairness.

This distinction is important.

Note: controls testing can be extended to support a broader conclusion, but this is atypical. It requires the right types of controls, and sufficient depth in testing those controls.

Substantive testing only:

won't conclude on the controls in place
often won't evaluate processes, which are important building blocks to understand (and correct if needed)
can, but often won't, identify the root causes of issues (because controls and processes were not checked).

If you want both a conclusion about integrity and a conclusion about the controls in place to ensure integrity, you need a combined testing approach: a review that tests the controls as well as the models/algorithms and their outputs.

Testing Results

The descriptions and examples here are high-level.

Substantive Testing Results

Substantive testing provides direct evidence by examining the algorithmic system’s outputs or results.

For example, in a credit decisioning algorithm review, substantive testing might reveal that 5% of declined applications for a specific demographic group were incorrectly assessed.
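As a rough illustration only, a substantive test like this can be pictured as independently re-performing each in-scope decision and measuring how often the re-performed outcome disagrees with the algorithm, broken down by group. The sketch below assumes hypothetical record fields (`group`, `algorithm_decision`, `reperformed_decision`) and made-up sample data; it is not a prescribed review method.

```python
from collections import defaultdict


def misassessment_rate_by_group(applications):
    """Share of declined applications, per group, where an independent
    re-performance of the assessment disagrees with the algorithm.
    Field names are hypothetical."""
    declined = defaultdict(int)
    wrong = defaultdict(int)
    for app in applications:
        if app["algorithm_decision"] != "decline":
            continue  # only declined applications are in scope for this test
        declined[app["group"]] += 1
        if app["reperformed_decision"] != app["algorithm_decision"]:
            wrong[app["group"]] += 1
    return {group: wrong[group] / declined[group] for group in declined}


# Hypothetical sample: 20 declined applications from group "A",
# one of which should have been approved when re-performed.
sample = [
    {"group": "A", "algorithm_decision": "decline", "reperformed_decision": "decline"}
    for _ in range(19)
] + [{"group": "A", "algorithm_decision": "decline", "reperformed_decision": "approve"}]

print(misassessment_rate_by_group(sample))  # {'A': 0.05}
```

The result is a direct statement about outcomes that already happened, which is what distinguishes substantive testing from controls testing.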

Controls Testing Results

Controls testing evaluates the governance mechanisms surrounding the algorithmic system.

In the same credit decisioning context, controls testing might uncover that credit scoring models were inconsistently updated throughout the year, potentially leading to incorrect assessments.
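By contrast, a controls test looks at evidence of how the process was governed rather than at individual decisions. The sketch below assumes a hypothetical model change log (fields `id`, `deployed_on`, `approved_by`, `tested`) and a hypothetical expected refresh interval; it flags updates lacking approval or testing evidence, and unusually long gaps between updates.

```python
from datetime import date


def change_control_exceptions(change_log, max_gap_days):
    """Flag model updates without recorded approval or testing evidence,
    and gaps between updates longer than the expected refresh interval.
    The log structure and interval are hypothetical."""
    exceptions = []
    for change in change_log:
        if not change.get("approved_by"):
            exceptions.append((change["id"], "no recorded approval"))
        if not change.get("tested"):
            exceptions.append((change["id"], "no testing evidence"))
    # Check the cadence of updates in deployment order.
    dated = sorted(change_log, key=lambda c: c["deployed_on"])
    for prev, curr in zip(dated, dated[1:]):
        gap = (curr["deployed_on"] - prev["deployed_on"]).days
        if gap > max_gap_days:
            exceptions.append((curr["id"], f"{gap} days since previous update"))
    return exceptions


# Hypothetical change log for one year.
log = [
    {"id": "CHG-1", "deployed_on": date(2024, 1, 15), "approved_by": "Model Risk", "tested": True},
    {"id": "CHG-2", "deployed_on": date(2024, 4, 10), "approved_by": "", "tested": True},
    {"id": "CHG-3", "deployed_on": date(2024, 11, 2), "approved_by": "Model Risk", "tested": False},
]

for exception in change_control_exceptions(log, max_gap_days=120):
    print(exception)
```

An exception here indicates that a control did not operate as designed; on its own it does not show that any credit decision was wrong.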

Key differences in the Results

In the substantive testing example, the result showed what actually happened. The controls testing showed the potential for something to go wrong, and to go undetected when it does.

Substantive testing is a lookback. Because it does not consider controls, it usually can't be used to determine what might happen in future.

Controls testing can't reliably reveal errors. But it can help determine how sustainably a process is likely to operate in future (provided controls continue to function in the same way).

To address these limitations, the two approaches can be combined.

Combined Testing Approach

Combining controls testing and substantive testing provides a more robust view of algorithm integrity. It covers both the operational framework (controls) and actual performance (outputs).

Consider an insurance company reviewing its claims processing algorithm:

Controls testing might involve assessing the data input validations, the change management procedures for algorithm updates, and the monitoring systems for detecting unusual claims decisions.
Substantive testing could include reviewing processed claims to verify accuracy, analysing the distribution of claim outcomes across different policyholder groups, and examining specific cases flagged as potential fraud.

By combining these approaches, the review offers assurance on both the effectiveness of governance mechanisms and the validity of individual claim decisions.

Know what to expect

You don’t need to know the exact details – technical knowledge about reviews is not necessary.

However, understanding the different approaches can help you anticipate what the review will cover and ensure it aligns with your objectives.

Disclaimer: The information in this article does not constitute legal advice. It may not be relevant to your circumstances. It was written for specific algorithmic contexts within banks and insurance companies, may not apply to other contexts, and may not be relevant to other types of organisations.