This is part 2 of a 6-part series.
In the first episode, we discussed the 5 most commonly cited data-in-audit challenges. In this episode, we discuss solutions to the 1st challenge – access to data.
The five items are:
- Access to data – you can’t get all the data, or you can’t get it quickly enough.
- Low value – the analysis doesn’t provide new insights.
- False positives – too many of them; results are overwhelmingly noisy, distracting your focus.
- Superficiality – the results are not deep enough to properly understand and refine the problems or to provide opportunities for improvement.
- Reporting – the results are not available in time for reporting, the report is not tailored to the audience.
A couple of weeks ago, we spoke about the five Assurance Analytics challenges that most auditors face. We said that this next episode would be about the solutions to those challenges. Now, we realised that going through all five of the solutions in one episode is going to make it a really, really long episode. So we decided instead to split it up. And what we’re gonna do is record each of the five shorter episodes for each of the challenges and then release them Monday through Friday in the week that we would normally release the one episode.
Yeah that sounds good. So, five individual episodes, one for each solution to the original five challenges.
That’s right. So we can start with, I guess what the five challenges are.
Yeah, just refresh my memory please Yusuf.
So the first one is access to data, so you can’t get all the data, or you can’t get it quickly enough. The second one is low value. The analysis doesn’t provide new insights. The third one is false positives. You either have too many of them or the results are overwhelmingly noisy. The fourth is superficiality. This is where your results are not deep enough. And the last one is timing and reporting. So that’s about results not being available in time for reporting or the reporting is not adequate. So today we’ll talk about access to data. So this is where you can’t get all the data or you can’t get the data fast enough to be able to execute on your audit. This is by far the most often cited challenge that auditors have, particularly internal auditors. And I know Conor in your world, in the performance audit world, there’s a similar issue that performance auditors face as well.
Exactly the same issue, yeah.
This is why this is number one. We want to get over the biggest challenge first, so we want to resolve that right? We want to be able to actually get access to the data so we can do some analytics. The first way to do that, and this is by far the easiest way, is to get access to the data warehouse or the data lake that your organisation already has. If you already have a data warehouse in place or you already have a data lake that is being built or has been built, then getting access to that is the fastest way to get access to the data. The majority of the data that you need.
And Yusuf, can you just remind me the difference between a data warehouse and data lake?
Think about a data warehouse as a warehouse and a data lake as a lake, really just drawing on the words that they use. So a warehouse will have a range of inventory or stock that comes in and is labelled and put onto shelves, or other ways in which you structure how your different materials or your different products exist within the warehouse. The lake, on the other hand, would have some level of structure to it, so you know where the bottom is. You know what sort of material to expect at the bottom, but you don’t know exactly how it’s structured. You have some water in between, and you may have a few living objects. So similarly, a data warehouse would be reasonably well structured. The data that is coming into the data warehouse is known. So you know exactly what it is that you’re bringing in. You will extract data out of the various systems, transform it, and have a particularly well structured set of data in your data warehouse with documentation, et cetera. A data lake would be a bit less structured. So it’s not completely unstructured, but a bit less structured, and you won’t necessarily be bringing your data in and transforming it in exactly the same way. So you’ll have some level of structure like you have in the data warehouse. But then you also have a whole bunch of other data in there that you haven’t transformed for analysis. So that’s largely the difference. I mean, there’ll be variations of that, depending on how you’ve decided to undertake the build of the data warehouse or the build of a lake. But that broadly is the difference.
Right. So in terms of accessing data for our initial solution, where a data lake or a data warehouse is in place, that gives those organisations an advantage for their internal audit. Is that correct?
That’s right. Yes. So, because you can decide what it is that you need and go directly to it without having to put in data requests or ask other people to help you, you can go directly and get that data for yourself. Makes it a little bit easier. There are a couple of things that you need to consider in terms of accessing data that way. One is you need to know how the data is structured. You need to know what the different data tables are and their fields, the assumptions that have been put in place, and how the transformations actually work, so that you’re getting the right data and not taking anything that you shouldn’t be taking or making assumptions around what the data actually represents. The other thing is that you need to understand how often the data is updated. If you need data that is a little bit more recent, you know, the last few days or last couple of weeks, you need to know how often the data in those tables is updated, because data in the data warehouse can be updated at various frequencies, various intervals. You may have some data that’s updated hourly, daily, weekly, monthly et cetera. If a particular table is updated monthly, for example, and you need to go in and get last month’s data, you need to know when exactly that is going to be updated so you can have a reasonable level of assurance that you’re actually getting the data that you need.
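As a minimal sketch of that update-frequency check, assuming the warehouse team can tell you when each table was last loaded: the table names, dates and the `TableInfo` structure below are hypothetical and not taken from any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TableInfo:
    """Hypothetical refresh metadata for one warehouse table."""
    name: str
    refresh: str        # documented refresh frequency, e.g. "daily" or "monthly"
    last_loaded: date   # date the most recent load completed


def is_ready(table: TableInfo, period_end: date) -> bool:
    # Data for the period is only complete if a load ran after the period ended.
    return table.last_loaded > period_end


# Illustrative values only.
tables = [
    TableInfo("payments", "daily", date(2024, 7, 2)),
    TableInfo("vendor_master", "monthly", date(2024, 6, 5)),
]
period_end = date(2024, 6, 30)  # we want all of last month's data

# Tables that have not yet been refreshed for the period we need.
not_ready = [t.name for t in tables if not is_ready(t, period_end)]
print(not_ready)
```

With these illustrative dates, the monthly table has not been reloaded since the period ended, so it is the one to wait for or to ask about before extracting.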
And so in the internal audit world Yusuf, how prevalent are these data lakes or data warehouses and how accessible to internal audit teams?
So we see access to data warehouses happen quite often, particularly in organisations that have been running their warehouses for a while, and where there are dedicated analytics people within the internal audit teams. There are some smaller audit teams as well that have direct access to the warehouse. Where there isn’t a significant number of internal resources that can help get data, and where the internal audit team has been using analytics for a while, those teams would then have direct access to the data. Most organisations nowadays will have some form of data warehouse because they’ve not been able to continue to report off their ERP systems. They just need somewhere where there is less load on the underlying system, and so they’ve created these warehouses that bring data together from multiple different places. And quite often you’ll see that internal audit will have some level of access to it, either directly or via a member of the team or via somebody that they work with quite closely in the data or IT teams.
So where a warehouse is not in place, what are some of the ways in which the internal auditors go about accessing data and the problems that we’ve just described around getting that access? What are some of the other ways to get around those?
Even if you do have access to the warehouse, you may not have access to all of the data that you need because it may not exist in the warehouse. You’ll quite often need to get data from outside, so you need some open data, which is quite useful for internal auditors and particularly for performance auditors. There are also some systems that haven’t made their way into the data warehouse landscape or environment, if you like, and if you want to get data from there, you may need to take a slightly different approach. So the second solution to overcome the challenge of access to data is about how we request it. Quite often when we start an audit, we decide what it is that we want, the data that we want up front, and we put together a nice large data request that then goes off to management. If we change that up a little bit and adopt a more iterative approach, we can get a quicker initial result and reduce the burden on management. So what that means is you start with a set of data that you know you definitely need upfront, particularly data that might take a little bit longer to process or that you need to understand first before you go into any more depth. The benefit of taking an iterative approach is that you don’t necessarily ask for data that you’re not going to need, and you also have some leeway to look for other data that you might need and then request that as well. That iterative approach we’re suggesting here helps ensure that you only get the data that you need and that you can actually get access to data that you haven’t identified upfront, because quite often, when you’re going through your initial results, you’ll find that you need something else. You need a bit more depth in a particular area or you need a different data set to be able to give you a different view. If you put a large data request in up front and then you don’t use all of that data and then you ask for more data, it does become a little bit difficult to manage.
So a better approach would be to have a discussion with management to explain to them how you’re going to be doing it and that you’re going to be asking for more data as you go, but that you don’t want to ask for a whole bunch of data up front, to help reduce the burden on them and make it easier for everyone.
And I’m assuming, as with most of these requests we discuss as auditors, the more interaction you have over a period of time through various audits with particular management about data requests, for example, the more comfortable they become with meeting those requests and the fact that you may need to come back to them with further refinement.
Quite often, you know, you get a situation where you’ve requested, you know, 15 or 20 data sets. People ask why you need that, and you say it’s because either you know exactly why you need it or, you know, I think I’m gonna need this data or I think I’m gonna need that data. If they have to go off and actually get that data for you and you don’t use it and they can see that you haven’t used it, that creates a little bit of a challenge in terms of the next data request that you’re going to put together, because then you’ll get more questions around why exactly do you need that? You know, you sent us on a bit of a wild goose chase the last time. So getting that communication upfront is really important.
And I’m guessing, too, though, when they are extracting data for specific purposes, once the business team understands the purpose for which it’s being requested and how it’s going to be used, that may actually inform some of their work as well and how they perhaps look at their own data.
Yep, that happens quite often as well. And in the performance audit world, that happens as well, doesn’t it?
Yeah absolutely. That’s one of the, I guess, ancillary benefits from a performance audit. At times, it can be quite confrontational when you’ve got an external party, usually the Auditor General’s office, coming in and making a request for data. But once you sit down and explain to the individual custodians within that entity why it has been requested and what it has been requested for, sometimes that can actually lend some value in terms of what they can do with their own data. So they’re not just collectors of it and custodians of it, they’re actually able to perform some other analysis of it.
Within internal audit a similar situation will apply. Quite often, that happens later on when you know, the results have been produced and internal audit can explain what it is that they did and what data they got and what it is that they did with the data and how they got to the result that they got to.
So we talked about the iterative process and how useful that is, rather than a sort of big bang, request-everything approach whereby you get a whole heap of data that may in fact not be all that useful, or after initial analysis you determine, ah well, we actually need some other data as well. The obvious benefits of an iterative approach you’ve explained, but I’m just wondering Yusuf, in your experience, is there any sort of rule of thumb or reasonable benchmark when you’re requesting data as to what the initial request should be? 30% of what you think you need, or 50%? Or is it just a bit too nebulous to be able to have a sort of rule of thumb around that?
It will vary from audit to audit, just depends on exactly what it is that you’re doing and how well you know the subject matter and how much you need to delve into the details to be able to understand what more you need.
So that really underscores having those good communications at the outset, with the management team and the data providers to understand that you will be coming back to seek more.
Yes, that’s right. It’s really difficult to always determine upfront exactly what you need, and this leads into the next potential partial solution. If you’re able to request data really early on, up to three months before the audit is scheduled to commence, you then have a good buffer to be able to get that initial data and explore it and then put in a data request for any additional data that you might need. Because you don’t want to be in a situation where you have a short audit time frame and you’re requesting data at the point of the audit commencing or even after. Because by then it might be a little bit too late to be asking for data.
I agree. The only other problem that that can lead to is what’s the actual definition of audit commencement? Now, I’ve heard of people suggesting in the performance audit world in the past that, well, from when you make that initial data request, that’s the commencement of your audit. Others would suggest that no, it’s not until we actually start conducting field work within your entity that that’s the commencement of the audit, and any of the data we’ve requested prior to then is really just helping us understand your environment and your performance and how your entity operates. So I think there’s still a little bit of work to do around that consistent definition around the access to data to help plan for the upcoming audit.
Regardless of what your definition of the commencement of an audit is, if you have a particular timeframe for reporting, if you know that you want to report on a particular subject area by a particular date, for example, or in time for a particular audit committee or Parliament sitting, then what you want to do is start requesting your data for that audit as early as possible. Obviously you’re not going to do it, you know, a year in advance, because that’s way, way, way too early. But in some cases you may decide that as part of your audit planning you request data a little bit early and then decide whether that audit topic still remains on the agenda, because if you can see in the data that there isn’t anything much going on, if you’re able to get to that sort of answer, you might be able to reprioritise your activity.
Yeah that sounds like a really prudent approach, for I think both internal audit and performance audit.
Sure. And then the other thing, the last thing in terms of, you know, solutions for access to data is sharing lessons among the team. If you’re using data for a particular audit and you have another audit coming up that might benefit from the use of that same data, or is even in a similar sort of topic area, you want to share a few things among the team. One is you want to share the insights from the analysis that was conducted. You also want to share the understanding of the data that was received: any limitations, how that data was treated, what the data does have in terms of fields that are captured and the granularity of those fields, the history that is captured within the data. And then also you want to share the actual data so that, you know, the next team can use that. We spoke about why you should be sharing data and how you should be sharing data in the previous episode. And going along with that also is if you determine that there’s another use for the data. So, you know, maybe the data isn’t exactly the same subject matter as another audit, but you might be able to use that data to shed light on particular matters that may relate to the next audit subject. Then it would be useful to start with that data, to supplement the audit work that will happen later. The challenge is that you don’t want to be using all the data that you have for everything that you’re going to be doing. I mean, that’s not gonna make much sense. But the reuse of data, both for similar subject matter and for the same subject matter, would definitely help in terms of getting access to that data, for a number of reasons. One is, if you’ve already got the data and you can use it again, then you don’t need to go back and ask. And the other is, if you’ve got that data and you’ve done something useful with it, and you were able to identify exactly what it does have and what it doesn’t have, you’re then able to tailor your new data request to suit what the data actually is.
You know, reflecting that you are learning from what you’re doing in your previous audits, and that information is not lost, that knowledge is not lost for future audits.
I’d agree from the performance audit side of the house, with that certainly being a positive, whereby you can say to management: we’ve already got this data from your people. We’ve taken that on board, and we’ve tailored this new request specifically, so we’re not retreading old ground. And the other thing is, I would say that an observation from some of the more progressive performance audit practices around is that now, Yusuf, they’re trying to actually formulate specific data deliverables as a part of their performance audits, and it’s actually a step within their process whereby they try to say OK, these are the data we’re trying to obtain. And what are some of the possible deliverables, even beyond the audit report, that we can create using this data? So going back to the original problem, which was the inability to access data: if, from a previous audit, you have a data deliverable that we can reflect on, then again, as you suggested, that can help us tailor any future requests to try and minimise the impact on entity resources.
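One way a team might capture that shared understanding is a short structured note per data set, which the next audit team can search by topic. This is only a sketch under stated assumptions: the `DatasetNote` fields mirror the items listed above (fields, granularity, history, limitations), and every name and value in it is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetNote:
    """Hypothetical record of what a past audit learned about a data set."""
    name: str
    source_audit: str
    topics: set[str]
    fields: list[str]
    granularity: str          # e.g. "one row per payment"
    history: str              # e.g. "July 2019 onwards"
    limitations: list[str] = field(default_factory=list)


def reusable_for(notes: list[DatasetNote], topic: str) -> list[str]:
    # Names of data sets already held that relate to an upcoming audit topic.
    return [n.name for n in notes if topic in n.topics]


# Illustrative entries only.
notes = [
    DatasetNote(
        name="payments_extract",
        source_audit="Accounts payable audit",
        topics={"payments", "procurement"},
        fields=["vendor_id", "amount", "paid_date"],
        granularity="one row per payment",
        history="July 2019 onwards",
        limitations=["cancelled payments excluded"],
    ),
    DatasetNote(
        name="leave_balances",
        source_audit="Payroll audit",
        topics={"payroll"},
        fields=["employee_id", "leave_type", "balance"],
        granularity="one row per employee per leave type",
        history="current snapshot only",
    ),
]

print(reusable_for(notes, "procurement"))
```

A note like this also gives the next team the limitations up front, so a new request can be tailored to what the existing data does not cover.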
So, a few ideas to help improve your access to data for assurance purposes. This is one of the larger areas, like we said, one of the most commonly cited challenges that we come across. Four ideas there. The first is accessing data directly via the data warehouse. The second is trying to use a more iterative approach, as opposed to asking for all of the data up front. The third is to think about timing. So, ask for data or start getting hold of data quite early on. And the last one is share insights among the team, share lessons among the team, share data among the team, so that you actually start off in a better position the next time you need to get data.
All good common sense advice that can only improve the efficiency and effectiveness of how we get access to data.
All right, so tomorrow we’ll talk about low value.
Sounds good Yusuf, talk to you then.