This is part 4 of a 6-part series.
In the first episode, we discussed the 5 most commonly cited data in audit challenges. In this episode, we discuss solutions to the 3rd challenge – false positives.
The five items are:
- Access to data – you can’t get all the data, or you can’t get it quickly enough.
- Low value – the analysis doesn’t provide new insights.
- False positives – too many of them; results are overwhelmingly noisy, distracting your focus.
- Superficiality – the results are not deep enough to properly understand and refine the problems or to provide opportunities for improvement.
- Reporting – the results are not available in time for reporting, or the report is not tailored to the audience.
Today we’re talking about false positives. A couple of days ago we spoke about the five assurance analytics challenges, and we said that we would go through each one, and the solutions to it, individually. We’ve spoken about number one, which was access to data, and number two, which was low value. In the next couple of episodes we’ll talk about numbers four and five, which are superficiality, and timing and reporting. Today we’ll focus on false positives and how to overcome them. Just to clarify, the low value episode related to low value insights, as opposed to low value assurance. Yes. In terms of false positives, this is a challenge that both internal audit teams and performance audit teams face. If you’re facing the false positives problem, the first thing to know is that you’re not alone. A lot of auditors find themselves in situations where there are just too many results, and they don’t know what to do with the sheer volume of results that has been generated.
It can be quite overwhelming initially. I suspect that a lot of people who are new to using data as part of their testing regime feel quite swamped.
This is one of the things that prevents people from using data: they’re scared of what they’re going to see. If they’ve been burnt before by a large number of results and not knowing what to do with them, they think, “I’m not going to use data again, because I don’t want to get into that situation.” Overcoming it is important. It’s not very straightforward, but it’s definitely achievable.
And in these types of circumstances, Yusuf, it’s often useful to have, say, a mentor or somebody more experienced coaching you through these early stages of your use of data. Would you agree with that?
Yeah, I’ve found that quite useful, both on the receiving side and on the giving side. Early on in my career there were people I’d work with in my team, and we’d bounce ideas off each other. More lately, it’s been about giving back to people I work with in our own teams, or at clients that we deal with. The discussions around this usually get you to a better place. It’s usually not that easy to get to solutions if you just try to think it through in your own head, so having discussions and sounding boards, and working through the challenge with like-minded people, can make this a lot easier.
With traditional audit sampling, you would evaluate a small sample, usually no more than 50 or 100 items at any given time. But when you’re doing analytics across full populations, you can get to thousands of results, and a lot of those could actually be false positives. There are different ways in which you can overcome this, and it all depends on the nature of the data and the nature of the subject matter that you have.
The easiest case is where your data set is reasonably narrow and also very well structured. By narrow, I mean you don’t have a large set of fields. By structured, I mean you don’t have lots of free text fields: the values are finite and fall within particular ranges, and the other fields have predefined categories. If you have that sort of narrow, well-structured data, it’s reasonably easy. What you need to do is categorise your results and split them into specific buckets based on certain key characteristics. Then you review a sample from those buckets, and it needs to be a representative sample, and you extrapolate the results of the reviewed sample to the remainder of your exception population. That’s where you have a narrow, well-structured data set.
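To make the bucket, sample and extrapolate idea concrete, here is a minimal Python sketch. It is not the workflow described in the episode: the field names (`channel`, `amount_band`), the `is_real` flag standing in for an auditor's manual verdict, and the per-bucket sample size are all assumptions for illustration.

```python
import random
from collections import defaultdict


def bucket_and_extrapolate(exceptions, key_fields, review, sample_size=5, seed=0):
    """Group exceptions into buckets, review a sample from each bucket,
    and estimate how many true positives each bucket holds."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in exceptions:
        # The bucket key is the combination of the chosen key characteristics.
        buckets[tuple(row[f] for f in key_fields)].append(row)

    summary = {}
    for key, rows in buckets.items():
        sample = rng.sample(rows, min(sample_size, len(rows)))
        # review() represents the manual check: True means a real exception.
        true_rate = sum(1 for r in sample if review(r)) / len(sample)
        summary[key] = {
            "size": len(rows),
            "sampled": len(sample),
            "true_rate": true_rate,
            # Extrapolate the sampled rate to the whole bucket.
            "estimated_true": round(true_rate * len(rows)),
        }
    return summary


# Hypothetical exceptions; "is_real" stands in for an auditor's manual verdict.
exceptions = (
    [{"channel": "branch", "amount_band": "high", "is_real": True}] * 40
    + [{"channel": "online", "amount_band": "low", "is_real": False}] * 60
)
summary = bucket_and_extrapolate(
    exceptions, ["channel", "amount_band"], review=lambda r: r["is_real"]
)
```

In practice the buckets will rarely be this clean, so the sampled rate per bucket is only an estimate, and mixed buckets may need a larger sample or a finer split.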
If you have a situation where your data is either very broad, meaning you have lots of data fields, or unstructured, and the unstructured data doesn’t seem to follow any well-defined pattern, it’s different. Usually that’s free text: things like call centre recordings, or notes captured by call centre agents. In those situations, it can be difficult to categorise your results into buckets, because you don’t necessarily know what the key characteristics are. As I said, in the narrow case you have a small set of fields, so you’re able to work out what those key characteristics are and use them to split your data up and categorise it. If you have very broad or unstructured data, you’re in a situation where it’s a lot more difficult to just visually categorise your data into buckets. So in this case we need to follow a bit more of an advanced approach, and the approach we use is machine learning techniques. There’s a blog article on our website about reducing noise and false positives, and there’s an example there that explains how this is applied.
In brief, what we do is review a subset of the data. We still categorise some of the data, using the fields that we’re able to categorise on, to get to a representative sample. Then we evaluate that smaller sample and label the items, and we use supervised machine learning to extrapolate that labelling onto the larger population. So in this case you’re not extrapolating manually; you’re extrapolating using an automated technique.
The software exists. It’s really not difficult to do. If you haven’t done something like this before, it’s quite easy to reach out to somebody who has and who can help you work through what that might look like. But it’s not something that we have to wait for in the future. Those techniques are available today, and they’re available in open source software for us to use.
Is there an example that you could give us that might speak to this approach?
This example, again, is in the article that I spoke about earlier. The situation was that we had an assurance review. It was a home loan insurance review for a medium-sized financial services organisation with a small internal audit team. So really, this is an internal audit team of fewer than 10 people, right? We’re not talking about hundreds of people in the bank. But even in a medium-sized bank, your transaction data sets can become quite big. In this case, we were looking for a particular situation where we knew that there were challenges (I’m summarising this) in being able to meet the needs of a particular type of customer, and those customers had specific requirements. We were relying on call centre data for the interactions with customers to understand which customers were affected and which were not, so that became a large set of unstructured free text data: call centre recordings that were converted to text, and notes typed up by people in the call centre. When we got to our set of potential exceptions, we had over 5000 of them. We said, what are we going to do with this? 5000 exceptions is just ridiculous, right? You can’t possibly review that manually. It would take months, and as internal audit we don’t have the time. In fact, not months: we worked it out at one person for almost a year to evaluate them all manually.
So 5000 exceptions out of how many records?
There were about half a million customers. All of the data we were looking at came to roughly 800 million records. It wasn’t massive: 5000 of 500,000 is 1% of the population. But it’s still 5000. It’s still quite big.
So it’s relatively small, but we want to make sure that we’re getting to the specifics, where we actually need to do something or identify a potential change. We knew that the answer wasn’t 5000 because, just looking at the first few of them, we saw that some were real and some were false positives. What we did in that case was use some of the fields that we had to get to a reasonably representative sample, including, for example, the length of the free text data. You can’t always tell what’s in the free text data, and you can use various techniques to try to get to that, but the easiest ones that we used were: how long was the free text, and does it contain certain keywords? So we identified a representative sample and evaluated those manually. We took about 50 to 100 of them, evaluated them manually, and gave each a label. The label was either true positive or false positive. Then we used that with an open source piece of software, putting the labelled data in and creating a model. You put together a workflow that uses machine learning techniques, and what the machine does is create a model. That model represents the way in which to identify true positives and false positives, so the way in which to identify labels in the remaining data. Then you take that same model and apply it to the remainder of the population, which isn’t labelled, to create the labels.
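As a rough illustration of picking a representative sample from free text using only its length and a keyword check, here is a small Python sketch. The keywords, the length bands and the proportional allocation rule are assumptions made for this example; the episode does not say which ones were actually used.

```python
import random
from collections import defaultdict

KEYWORDS = ("hardship", "complaint")  # hypothetical keywords, for illustration only


def stratum(note, keywords=KEYWORDS):
    """Assign a note to a stratum based on its length and a keyword check."""
    words = len(note.split())
    length_band = "short" if words < 10 else "medium" if words < 50 else "long"
    has_keyword = any(k in note.lower() for k in keywords)
    return (length_band, has_keyword)


def stratified_sample(notes, n_total, seed=0):
    """Pick roughly n_total note indices, spread across strata in
    proportion to stratum size, with at least one per stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for i, note in enumerate(notes):
        strata[stratum(note)].append(i)

    picked = []
    for key in sorted(strata):
        ids = strata[key]
        share = max(1, round(n_total * len(ids) / len(notes)))
        picked.extend(rng.sample(ids, min(share, len(ids))))
    return sorted(picked)


# Hypothetical notes: 80 short ones and 20 longer ones mentioning a keyword.
notes = ["call back"] * 80 + ["customer reported hardship and asked for help " * 3] * 20
to_label = stratified_sample(notes, n_total=10)
```

The indices in `to_label` are the items an auditor would then review and label by hand as true positive or false positive.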
So basically what the machine does is ask: when you were looking at this record manually, what is it that made you select true positive or false positive? Then it tries to work out what weights it needs to give the different fields, what the potential values in those fields might represent, and what the words in the free text might mean in terms of getting to a result. Then it provides you with the result.
In this situation, we were able to get to a significant reduction in false positives. We dropped down from 5000 to just under 500, so we eliminated 90% of the exceptions, and 500 is a lot easier to deal with than 5000. It’s still a lot. But by using a simple technique, we were able to eliminate 90% of the exceptions. That’s a simple example.
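For readers who want to see the label-then-extrapolate step in code, here is a minimal sketch of a supervised text classifier in plain Python. It is a stand-in, not the KNIME workflow from the episode: the episode does not name the algorithm, so the choice of naive Bayes, the whitespace tokeniser and the sample notes are all assumptions.

```python
import math
from collections import Counter


def tokens(text):
    """Very simple tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


def train(labelled):
    """labelled: list of (note, label) pairs; label is True for a true positive."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for note, label in labelled:
        doc_counts[label] += 1
        word_counts[label].update(tokens(note))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def predict(model, note):
    """Return True if the note looks more like the manually labelled true positives."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Smoothed log prior, then add-one smoothed log likelihood per known word.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(note):
            if w in vocab:
                score += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical manually labelled sample (True = true positive).
labelled = [
    ("customer in hardship could not get a payment plan", True),
    ("hardship request declined without review", True),
    ("customer asked for statement copy", False),
    ("address change completed on call", False),
]
model = train(labelled)

# Apply the model to the unlabelled remainder of the exceptions.
unlabelled = ["hardship payment plan declined", "statement copy sent to new address"]
predictions = [predict(model, note) for note in unlabelled]
```

A real labelled sample would be the 50 to 100 manually reviewed items, and it would be worth holding some of them back to check how often the model agrees with the auditor before trusting it on the rest.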
The issue of audit teams finding lots of false positives is so pervasive that it’s probably a good topic for the annual training that every auditor has available to them. For those with a particular interest in using data better, and perhaps dipping their toe in the water of supervised machine learning, it might be something that they can raise with their assurance leaders as a topic for their training.
Often that would come up, particularly if they’ve been seeing something like that. If you haven’t seen it before, there’s usually no burning platform for it. The challenge is that if you come up against it and you haven’t prepared for it, you may be in a situation where you just don’t have enough time to work out what it is, or you may not have the imagination to get to that answer. So as long as you know that there is potential for this to be resolved, and when you see it you recognise that this is a situation where you need to reach out to somebody for help, then you can actually get that help at that time and then work out: is this something that we need to train ourselves to do?
So Yusuf, you’ve mentioned a few times there the use of open source tools to do some of this analysis, and obviously we prefer to use KNIME, which is open source. Can you just expand on some of the attributes of KNIME and why you’ve found it so useful?
Obviously, open source means that we could use it with absolutely no limitations on data size or on the functionality of the software to get to a reasonably quick result. It has a graphical user interface, so we’re not talking about trying to code in Python or R. I know there are lots of R and Python programmers out there who would potentially disagree with me. But the reality is that if you want to learn how to conduct analytics, you don’t necessarily need to learn how to code. It’s useful to know how to code. There’s nothing wrong with learning how to do that. But if you really want to focus on providing value out of analytics, then you need to use the newer technologies that are available and that are easier to use than trying to code everything by hand from scratch.
The platform is a point-and-click graphical user interface. You configure your nodes and away you go. And if you want to code, you can: you can incorporate Python or R or Java or SQL or anything else in there. Because the platform was built largely for advanced analytics, it just made sense to use it on this project and for this particular problem that we had.
There are a lot of different software packages that we can use. We use KNIME; it just makes sense. There are obviously others that you can use. But if you’re using some of the more traditional analytics tools, I would suggest thinking a little bit differently about that, because you will be stuck with false positives and you will be stuck with an inability to get to a solution.
In a nutshell, with false positives you can do a couple of things, depending on the data that you have. If you have very narrow data and it’s pretty structured, just take a standard, straightforward profiling approach. If you have a different situation, say broader data or unstructured data, then you may need to adopt a technique that’s slightly more advanced, but it’s definitely something that you can do yourself. So false positives, be gone: we now have solutions to deal with them. And if you’re stuck with this, reach out to somebody who knows. You can reach out to one of us.
We don’t think that there’s any reason not to conduct analytics just because you think there are going to be too many results coming out at the end. So that’s challenge three solved.
Perfect. People just need to get over the fear factor if they’ve got that initial rush of false positives.