Episode 38 | Why We Should Spend Time Preparing Our Data and Checking Quality

The Assurance Show

 

Show Notes

In this episode, we discuss 5 major benefits of investing our time in preparing our data and checking its quality.

These include, in reverse order of importance:

1. Exploration during data prep to understand the data

2. Ensure the data is in the right format for analysis

3. Save time by avoiding incorrect conclusions

4. Confidence in the analysis and result

5. Identify important data related improvement opportunities

 

Transcript

Narrator: 

You’re listening to The Assurance Show. The podcast for performance auditors and internal auditors that focuses on data and risk. Your hosts are Conor McGarrity and Yusuf Moolla.

Yusuf: 

Today, we are going to be talking about why we should spend time preparing our data and checking the quality of that data before we jump into any analysis. And there’s a couple of things that we need to talk about beforehand, and then we’ll go into five reasons for needing to do that. But when we think about preparation of data and quality checking, Conor, what’s the first thing that comes to mind for you?

Conor: 

The first thing that comes to mind for me is that it sounds difficult. And the second thing is, I know we don’t spend enough time focused on making sure that we’ve got clean data of sufficient quality. That’s a generalization, but we need to spend more time focused on getting data clean enough to do the analysis that we want to do, and making sure that it’s of good enough quality to be relied upon.

Yusuf: 

One thing is that we don’t need to clean everything. We only really need to clean what we’re going to be using, and that is largely driven by the objective that we’ve identified and the hypotheses that we’ve selected. You can easily spend half of your time cleaning data on an audit, so if we can reduce that by focusing on those high-level items, that’ll be excellent. But we still need to have a reasonable level of focus on that. And if you do spend half of your time doing it, that doesn’t mean that you’re necessarily doing something wrong, because in many audit projects, we can spend that sort of time preparing our data and checking its quality.

Conor: 

Yes, and it is really rare, in our experience, that you actually get data that’s of sufficient quality at the outset, without cleaning it, to be used for your objective.

Yusuf: 

That’s right. In our experience, to be honest, I don’t think I’ve ever had a data project where I’ve not had to do some level of cleansing in the first place. Even if you’re not cleaning data, even if you have clean data, you may have data that’s in different formats, different date formats or particular fields where you need to split them up, et cetera. And so you still need to do some level of preparation. It may be that your data is clean but in order to be able to use it, you still need to do some level of preparation.

Conor: 

Yes. So you’d have an appropriate level of skepticism if somebody came to you with data and said: It’s all ready to go, here you go, it’s wrapped up in a nice present. Away you auditors go and do your analysis over it.

Yusuf: 

That’s right. The data that you see that is like that is usually the data that you’d get in a training exercise, where the data’s been specifically designed for that training exercise. The anomalies have been preselected, the specific fields have been preselected, etc. And so you can’t really do anything with that, because it’s normally dummy data anyway. In the real world, in any practical matter, when you move out of the formal classroom-style training realm, you’re going to get data that isn’t clean.

So there are five reasons that we are going to go through today for why you should spend time preparing your data and checking the quality. How important these are will vary from audit team to audit team, depending on what it is that you are involved in. But they all will have some level of relevance for everyone. We’re going to go through these five in reverse order of importance. So you have to wait until the end to get the most important one, but please bear with us because they’re all quite interesting and relevant.

So the first one is that the data prep process often involves exploration. Often when we’re getting into our data for the first time, we want to be able to explore that data. And so if we combine some of that exploration at a technical level with the data prep, that helps us understand the data before we analyze it. We have to do that anyway; we have to actually go in and understand what the data is. I know that there’s a small percentage of situations where you have a small data set or the data is really clean, and you can just drop it into your favorite data visualization tool and it’ll give you the answers that you want. But that is extremely rare. There’s this promise that’s made by many visualization companies: all you need to do is drop your data in and we will give you all the answers. And that doesn’t really work in most cases that we’ve seen. So you have to do something with it.

You’ve got to, at the minimum, understand what the data is that you’re looking at, and then make sure that you’ve done something to prepare the data so that when you actually do that analysis, you don’t end up short. So that technical exploration that you do upfront, understanding what the data looks like, can form part of your data preparation, and it definitely helps. Like I said, it helps you understand that data before you analyze it. So that’s reason number one.
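
A minimal sketch of the kind of upfront technical exploration described here, assuming the data has already been loaded as a list of records. The `profile` function and the sample purchase order fields are hypothetical, invented for illustration, and are not taken from the episode.

```python
from collections import Counter


def profile(rows):
    """Summarise each field: missing count, distinct count, most common value."""
    summary = {}
    # Collect every field name that appears in any record.
    fields = sorted({key for row in rows for key in row})
    for field in fields:
        values = [row.get(field) for row in rows]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        summary[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1),
        }
    return summary


# Hypothetical purchase order extract, invented for illustration only.
orders = [
    {"po_number": "PO-001", "supplier": "Acme", "amount": "1200.00"},
    {"po_number": "PO-002", "supplier": "", "amount": "450.50"},
    {"po_number": "PO-003", "supplier": "Acme", "amount": None},
]

report = profile(orders)
for field, stats in report.items():
    print(field, stats)
```

A summary like this will not clean anything by itself, but it shows quickly which fields have gaps or unexpected values before any analysis starts.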

Conor: 

Okay. So that’s to help us explore what we actually have and understand the data before we analyze it. What’s the second reason?

Yusuf: 

So the next thing is that setting aside time to actually prepare our data helps us ensure that the data has the right, consistent format. So for example, if you have multiple data sets, you want to ensure that the joins that you’re going to be putting together are of a reasonable level of technical quality. What that means is you understand your data, and you’ve been able to prepare it, so that you don’t lose any records when you do any joining. Quite often, when you try to join two datasets together, you’re either losing records or you’re creating a data set that isn’t representative of the initial set. So what are the fields that you need to be joining on? What is it that’s common between your various datasets, and how many of them might be missing from one that are in the other, et cetera? What that does is it helps you ensure that you’re not coming to a conclusion that is based on an invalid combination. And a lot of what we do nowadays is joining data up, so trying to bring data together from different domains.

A more practical example may be where we try to compare dates. A simple example is something like procurement, where we’re looking at when a purchase order was created, when the purchase order was approved, and then when invoices were drawn down based on that. If you have some of that data coming from different systems or from different tables, you may have, depending on the way in which that is extracted, dates that are in different formats. So you may have month, day, year and day, month, year used interchangeably. And if you don’t sort that out early on, you are then not easily able to use those sorts of comparisons when you’re doing the analysis, and that creates extra work.

Conor: 

So spend time upfront getting the right consistency in the formats of your data, and it’ll save you effort towards the back end in the long run.

Yusuf: 

That’s right, yes. And you can actually trust the result that’s coming out.
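
The two checks discussed under this second reason, reconciling join keys and standardising date formats, could be sketched roughly as follows. This assumes the date format of each extract is already known from its source system, because a value such as 03/04/2021 cannot be told apart by inspection alone. The function names, field names and sample records are all hypothetical, invented for illustration.

```python
from datetime import datetime


def parse_date(text, fmt):
    """Parse a date string using the format known for its source system."""
    return datetime.strptime(text, fmt).date()


def reconcile_keys(left_rows, right_rows, key):
    """Report which join keys are shared and which appear in only one dataset."""
    left_keys = {row[key] for row in left_rows}
    right_keys = {row[key] for row in right_rows}
    return {
        "matched": left_keys & right_keys,
        "left_only": left_keys - right_keys,
        "right_only": right_keys - left_keys,
    }


# Hypothetical extracts from two systems, invented for illustration only.
# Assumption: the ordering system exports day/month/year and the
# approvals system exports month/day/year.
created = [
    {"po_number": "PO-001", "created": "03/04/2021"},
    {"po_number": "PO-002", "created": "15/04/2021"},
    {"po_number": "PO-003", "created": "20/04/2021"},
]
approved = [
    {"po_number": "PO-001", "approved": "04/05/2021"},
    {"po_number": "PO-002", "approved": "04/16/2021"},
    {"po_number": "PO-004", "approved": "04/18/2021"},
]

# Check the join keys first, so dropped records are known before joining.
keys = reconcile_keys(created, approved, "po_number")

# Convert both extracts to real dates using each system's own format.
created_on = {r["po_number"]: parse_date(r["created"], "%d/%m/%Y") for r in created}
approved_on = {r["po_number"]: parse_date(r["approved"], "%m/%d/%Y") for r in approved}

# Days from creation to approval, only for purchase orders found in both extracts.
days_to_approve = {
    po: (approved_on[po] - created_on[po]).days for po in sorted(keys["matched"])
}
print(keys["left_only"], keys["right_only"], days_to_approve)
```

Listing the unmatched keys on each side makes any record loss visible before the join, rather than leaving it to be discovered when the business questions the result.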

Conor: 

Okay. Third reason why we need to spend time preparing data and ensuring quality.

Yusuf: 

So again, some of this we said in number two: some data preparation upfront can definitely save us time in our analysis. So while the use of data is quite iterative, if we know that there are certain fields that we need to use later on, then we want to fix them up before we start. That helps us save time in the analysis, but importantly, fixing up that data upfront helps us save time trying to chase down explanations about why things appear the way they do later on. So when we conduct some analysis and we get to what we think is a potential result, we then want to take that back to the business for them to validate it. If we haven’t actually done the legwork upfront preparing that data, the response may be: you haven’t looked at this, or you haven’t looked at that, or you haven’t cleansed this field properly. That then wastes both our time and the time of the individuals in the business that need to respond to these requests that are based on invalid data to begin with. One of the main problems with that is that, because of that wasted time, sometimes you’re not able to then complete that data analysis within the audit timeframe. So you’ve used the wrong data, and you’re chasing down an explanation that could have been found through properly preparing the data. And then you don’t have time to fix that, so you just have to abandon that thought, which is not ideal. And more and more, we’re getting to audits where we can’t abandon that, because that data work is a critical part of the audit that we do.

Conor: 

So spend sufficient time upfront preparing your data to save time overall through the various phases of the audit, and also to save the time of you as auditors and the business as well.

Yusuf: 

Yes, and just not looking silly in front of some business person.

Conor: 

Okay, number four.

Yusuf: 

The second most important, which is why we put it towards the end. This is confidence in the analysis that we’ve done and the results that we are producing. So when we put our report together, if we have explored our data, prepared our data, and checked the quality of the data that we’re using, we then have a much better level of confidence in the analysis that we’ve done. It’s based on data that we know to be of an adequate level of quality to support the result. When we’re considering exceptions or anomalies that have been identified, or potential areas where things need to improve beyond compliance, then that level of confidence is invaluable, because we can stand in front of management, or in front of an audit committee, and say we’re confident that the work that we’ve done is based on a sound set of well-cleansed data.

Conor: 

And you’ve worked hard to establish your reputation. So where you’re relying on data, you want to have that confidence, so that your reputation isn’t diminished, because you’ve spent the time making sure the data’s right.

Yusuf: 

And then equally important is number five, the last one here, which is the ability to provide improvement opportunities related to the cleanliness of the data. So we said before that if we are choosing our objective for our use of data within an audit to align with the audit objective, and we’ve not broken the link back to the overall organizational strategy, then any work that we do would be relevant to that strategy. So it’s important work. We’ve identified good hypotheses based on an objective that is linked to the strategy of the organization. Where we’ve identified issues with the data that we use to determine whether a hypothesis is true or false, whether that’s the quality of the data in terms of completeness or accuracy, or the availability of data, those issues matter, because that data is important for our hypothesis, which is linked to the objective, which is linked to the strategy. We can draw a very direct line between the quality of the data, the issues that we found and the ability to achieve overall organizational strategy. And because we also understand how important the individual fields are for that strategy, we are then able to more clearly explain what those improvement opportunities in relation to the data might be. Data’s becoming more and more important, and getting good quality data is becoming more and more important. So this is a very direct line between the work that we do preparing the data and our ability to report on opportunities for improvement that can help with the achievement of strategy. So it sounds like a really simple exercise: I’m going to do some data prep and make sure that I’m using the right quality. That’s important for confidence. But what we’re talking about here is that it is important because it helps us to directly affect the ability to achieve strategy.

Conor: 

Regardless if you’re an internal auditor in the private sector or in the public sector, or indeed a performance auditor, every organization will have a genuine interest in making sure that it has access to quality data. So any observations you can make beyond your individual audit to contribute to that overall quality will be well-received.

Yusuf: 

That’s right. So there you go: five reasons for making sure that you take the time and spend the time to improve the quality of your data and prepare your data upfront.

Conor: 

To recap, why you should spend time preparing your data and checking quality. Number five, it’ll give you the ability to provide improvement opportunities, not just as a result of your audit, but for the organization more broadly.

Yusuf: 

Number four, it helps you ensure that you have a level of confidence in the work that you’re doing, the analysis that you’re producing and the results that you’re producing as well. So you can stand in front of your stakeholders with hand on heart and say, I’ve done everything that I needed to do, and this data and these results are correct.

Conor: 

Number three, doing the prep upfront will save you time in the analysis as you work through your audit, and will also save time for the business in your interactions.

Yusuf: 

Number two, preparation helps ensure that the data has the right, consistent format.

Conor: 

Number one, the prep helps you explore the data, and that can be really helpful in getting a good understanding of it before you actually begin the analysis.

Yusuf: 

Excellent. So let’s all go and spend time preparing our data and making sure that we have quality in it to get both confidence in our results, but also provide good improvement opportunities for achievement of our strategies.

Conor: 

Great stuff. Thanks Yusuf.

Yusuf: 

Thanks Conor.

Narrator: 

If you enjoyed this podcast, please share it with a friend and rate us in your podcast app. For immediate notification of new episodes, you can subscribe at assuranceshow.com. The link is in the show notes.