When using data, one of the main challenges faced by auditors is the volume of exceptions generated. How can we overcome this?
Traditional audit sampling typically involves evaluating between 5 and 50 items; consequently, the number of exceptions can never exceed that range.
However, when we use larger sets of data - across full populations - the number of exceptions produced can be high.
The sheer volume can mean:
There are various approaches to overcome this, e.g. progressively categorising results into specific buckets based on key characteristics, reviewing a sample of each bucket, and then extrapolating the results of the reviewed sample to the remainder of the population of exceptions.
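As a minimal sketch of that sample-and-extrapolate idea, assuming the exceptions sit in a pandas DataFrame with a `bucket` column and a `confirmed` flag filled in by reviewers (both column names are hypothetical):

```python
import pandas as pd

def extrapolate_by_bucket(exceptions: pd.DataFrame, sample_frac: float = 0.1,
                          seed: int = 42) -> pd.DataFrame:
    """Review a sample per bucket and extrapolate the true-exception rate."""
    results = []
    for bucket, group in exceptions.groupby("bucket"):
        sample = group.sample(frac=sample_frac, random_state=seed)
        # In practice an auditor reviews the sample; here we assume a
        # boolean 'confirmed' column records the outcome of that review.
        rate = sample["confirmed"].mean()
        results.append({
            "bucket": bucket,
            "population": len(group),
            "reviewed": len(sample),
            "confirmed_rate": rate,
            "estimated_true_exceptions": round(rate * len(group)),
        })
    return pd.DataFrame(results)
```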
However, when the key characteristics can't easily be categorised - e.g. when the exceptions are based on, and include, both structured and unstructured data - the traditional approach doesn't quite fit.
An alternate method that generally works well involves machine learning techniques: a similar sample-and-review approach, but one whose techniques can significantly reduce the number of false positives.
An example of how this is applied
Project
A home loan (mortgage) assurance review for a financial services organisation.
The organisation's Internal Audit function is relatively small - fewer than ten FTE - but progressive, punching significantly above its weight, and respected by stakeholders.
The team decided to take a data-driven approach, opting to cover ALL accounts and transactions for just over one year.
Just over 800m records.
KNIME - an open-source analytics platform - was used to analyse the data across various data sets, including:
Because the CRM data was primarily free text, we used a set of natural language processing (NLP) techniques to provide a level of structure, and then blended the processed data with the other structured data sets.
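The specific NLP steps aren't detailed above, so the sketch below is an assumption: a simple keyword pass over free-text CRM notes that flags apparent offset requests, producing a structured column that can be joined to the account data. The patterns and column names are illustrative.

```python
import re
import pandas as pd

# Hypothetical phrases that suggest a customer asked for an offset link.
OFFSET_PATTERNS = [
    r"\boffset\b",
    r"\blink(?:ed|ing)?\b.*\b(account|loan|mortgage)\b",
]

def flag_offset_mentions(crm_notes: pd.DataFrame) -> pd.DataFrame:
    """Add a structured flag to free-text CRM notes.

    Assumes columns 'customer_id' and 'note_text'; names are illustrative.
    """
    pattern = re.compile("|".join(OFFSET_PATTERNS), flags=re.IGNORECASE)
    crm_notes = crm_notes.copy()
    crm_notes["mentions_offset"] = crm_notes["note_text"].str.contains(pattern)
    return crm_notes
```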
With the data in a format that could be used easily, we performed several analyses.
This included:
Most of these don’t need much explanation. But why offset account links?
Let’s look at why we decided to do this and the challenges that we faced.
Situation
Some customers have multiple deposit and loan accounts.
Linking those accounts can save the customer money by consolidating credit and debit balances. This is typically referred to as an offset mortgage. It is common in Australia and the United Kingdom, and differs from the “All-in-one” accounts in the United States.
It works something like this:
Lending rates are usually higher than deposit rates, so offsetting saves money.
This is popular within the industry, as the saving is not trivial. In the simple but common example above, the saving is almost 10%. The larger the deposit balance, the larger the difference.
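To make the arithmetic concrete, here is a hypothetical illustration; the balances and rates below are assumptions, chosen only to show a saving in the region described:

```python
loan_balance = 500_000      # outstanding home loan
offset_balance = 50_000     # linked deposit (offset) balance
loan_rate = 0.05            # annual lending rate
deposit_rate = 0.005        # annual deposit rate the customer forgoes

# Without the link: pay interest on the full loan, earn interest on the deposit.
cost_unlinked = loan_balance * loan_rate - offset_balance * deposit_rate

# With the link: interest is charged only on the net (offset) balance.
cost_linked = (loan_balance - offset_balance) * loan_rate

saving = cost_unlinked - cost_linked
print(f"Annual saving: {saving:,.0f} ({saving / cost_unlinked:.1%} of the unlinked cost)")
# Annual saving: 2,250 (9.1% of the unlinked cost)
```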
But the linking can easily fail because most banking systems were not originally built to deal with this type of relationship between accounts. They generally work well with standalone home loans, or standalone deposit accounts. But combining them often means a patchy workaround with some scenarios that have not been envisaged or properly tested.
There has been a fair level of regulatory (and media) attention on such failures over the past few years, with hefty infringement penalties and costs. An example of this is the AUD 12m that this bank had to pay to customers. There would also have been separate associated costs, e.g. relating to calculating the refunds.
So for our project, we decided to check whether offsets were established properly. This means identifying expected offset links and then comparing those to the actual links that were in operation.
If we expected to see an offset link (identified, for example, from a customer interaction or complaint) but the link had not been established, we would then need to investigate it as an exception.
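In data terms, that check is essentially an anti-join: expected links with no match in the table of links actually in force. A minimal sketch, with hypothetical column names:

```python
import pandas as pd

def find_missing_links(expected: pd.DataFrame, actual: pd.DataFrame) -> pd.DataFrame:
    """Return expected offset links with no matching live link.

    Both frames are assumed to carry 'loan_account' and 'deposit_account'.
    """
    keys = ["loan_account", "deposit_account"]
    merged = expected.merge(actual[keys], on=keys, how="left", indicator=True)
    # 'left_only' rows exist in the expected set but not among the live links.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")
```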
Problem
We found thousands of potential exceptions.
We expected that many of the exceptions would turn out to be false positives, but we couldn’t possibly investigate them all to find the real exceptions.
How can we find the needles in that haystack?
Traditional solution
The traditional approaches would typically look like one of these:
Why not opt for the traditional approaches?
Alternate solution
The software that we were using has strong predictive modelling capability. So, we decided to use it.
This is the process; it sounds complicated, but is not difficult to implement:
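Broadly, it works something like this: manually review a subset of the exceptions, train a classifier on those labels, then score the remainder and keep only the likely true exceptions for investigation. The sketch below is a hedged approximation in scikit-learn rather than the KNIME workflow the team actually built; the column names, classifier choice, and threshold are all assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def triage_exceptions(exceptions: pd.DataFrame, feature_cols: list[str],
                      threshold: float = 0.5) -> pd.DataFrame:
    """Score unreviewed exceptions using a model trained on reviewed ones.

    Assumes a boolean 'reviewed' flag and, for reviewed rows, a boolean
    'confirmed' label set by the audit team; names are illustrative.
    """
    reviewed = exceptions[exceptions["reviewed"]]
    unreviewed = exceptions[~exceptions["reviewed"]].copy()

    X_train, X_test, y_train, y_test = train_test_split(
        reviewed[feature_cols], reviewed["confirmed"],
        test_size=0.3, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    # Hold-out accuracy gives a defensible measure of model quality.
    print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.0%}")

    # Keep only the exceptions the model considers likely to be real.
    unreviewed["p_true_exception"] = model.predict_proba(
        unreviewed[feature_cols])[:, 1]
    return unreviewed[unreviewed["p_true_exception"] >= threshold]
```

A sample of the discarded exceptions can also be reviewed as a quality check, which is one way to defend the threshold.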
The result
More than 90% of the exceptions were eliminated as false positives, producing a manageable set of a few hundred to investigate.
Model accuracy was approximately 70%. A process like this is rarely going to be 100% accurate, but it is certainly better than random sampling alone, and it can be defended.
Remember that this was achieved by a relatively small Internal Audit team.
Tools, approaches and techniques to improve the use of data within audit are now readily available. Are you using them?