In this episode we discuss the use of the word "analysis" in the context of using data for audits.
This is an attempt to define the "analysis phase" - or stop using it as the name of a phase - so that we can better structure our work.
You're listening to the Assurance Show. The podcast for performance auditors and internal auditors that focuses on data and risk. Your hosts are Conor McGarrity and Yusuf Moolla.
Okay, Conor. So this is the first episode for 2022. I need to keep reminding myself, I keep thinking it's 2021. And today we want to talk about the word analysis in the phrase data analysis. Why this came up is that, just in my own head, and I know in various conversations we've been having, there's often a difference of opinion, or a difference in understanding, of what that phase of the data work that we do as part of audits actually is. So I thought it would be good to have a conversation about it and define it, either loosely or not, but understand where it comes from and what it means. When we're having conversations with individuals that we report to, or individuals that report to us, and we're talking about analysis, what does it actually mean? So if somebody says to you, go away and do some analysis, or if you're telling somebody to go and do some analysis, what does it actually mean?
Sounds like one of these topics that you've been thinking about over the holiday break, Yusuf.
Ah, maybe a little bit.
At the back of your head.
Yeah, just a little bit. Through various engagements over the years, it's one of those things that never really got properly defined. And like I said, everybody's got a different understanding. And so yes, I did think about it a little bit over the break.
So as with most things, we probably should start with definitions or the history of the word or its etymology and where that comes from. So, where do we start with the word analysis?
Etymologically, it comes from the combination of two root Greek words. The first one is "ana", which is "up", and the second one is "luein". I hope I'm pronouncing it correctly, so to any Greek listeners, apologies in advance for not pronouncing that correctly. But "ana" for "up" and "luein", which means "loosen". So when you combine those, in reverse order, "loosen up" is where it started.
And so when we're thinking about that definition, even in modern times, in terms of data analysis, what does that sort of speak to for you in terms of loosening up with respect to the data?
So there are various definitions, and there's no sort of standardized definition for what analysis is or what data analysis is. Obviously various people have put their thinking hats on and tried to come up with something. But broadly speaking, it's breaking datasets down into their individual components to understand and explore and evaluate. That makes sense in terms of where the word comes from. So the word comes from loosen up, and so breaking things up into their individual components to be able to understand those individual components. Because obviously, datasets can be large, either broad or deep. And so you need to first understand what the individual components that make the data up are. And then breaking that up into those individual components helps us to then analyze those individually.
And obviously as Assurance and Audit professionals, quite a lot of the time the drivers for our work, when we're looking at data, are either a risk has been identified or an opportunity has been identified or there has been some sort of problem. And that then becomes the driver for somebody saying, let's go and get some data and do some analysis.
And then within that broad phrase, data analysis or analyzing the data, there's the various steps that we'll undertake. Firstly, understanding the business, then understanding the data, then cleansing the data, preparing it, joining it, matching it, modeling it, analyzing it, exploring it, whatever. So we'll talk about those in a sec. But, yes, data analysis is the broad sort of umbrella term, but then within a data analysis project or data analysis phase, there's the analysis sub-phase. Often when we're breaking up data analysis work as part of an audit, we'll say we've got the preparation phase, which is the initial phase, the planning phase, then there's the analysis phase, and then there's a reporting phase. That isn't necessarily very distinct, because often you're doing analysis in your preparation and you're doing analysis in your reporting. So when we say analysis phase, what exactly do we mean? And that's really what we want to explore a little bit today.
Where do we start? So we understand that we've got one of those three things: a problem, a risk or an opportunity. We want to use data to help us to understand, or to work out, what's going on. And like you've just explained, loosening up the data initially, or decomposing it. Where to from here?
There's several words that we use, and maybe we can explore each of those words to understand what they mean to us. It's difficult to get to an exact definition of what analysis is going to be in an audit. But if we look at the different types of words that we use, we might be able to get fairly close. So I've got a list of about a dozen words here. Explore. Examine. Understand. Profile. Find patterns. Match. Check the system. Run rules. Evaluate hypotheses. Check what happened. Find what does this mean. Ask the so what. Find exceptions and anomalies. Not in any particular order. All of those things contribute to our analysis work. So maybe we can explore each of those in a little bit of detail.
Okay. So I didn't get to jot those all down there as you were talking. I just can't write that quickly. I think the first one was explore. So where to from here?
So exploration happens at all phases of data analysis work. When we're trying to get our business understanding, we will explore. When we're trying to get our understanding of the data, we will explore. When we get the data, you know, collect the data and bring it in, we will explore. So exploration is one of those things that happens across the data analysis phase or set of phases. And this is about really understanding what is, and isn't, in the data. How much data we have, what the scope is. What some initial patterns we can see are. What sort of cleansing we need to do. And then once we've done all of that, exploring what the relationships between different datasets might be. Once we've evaluated rules and tested some hypotheses, it would be exploring the results as well. So explore happens across a range of different aspects of our analysis.
And obviously one of the most important things we need to determine during our exploration is do we have the right data to help us get to the audit objective that we're actually looking at?
That's right. Yeah, that's exactly right. So part of that would be a subset of explore and that is understand. We need to understand whether the data that we have will enable us to answer the question and a subset of that then would be profiling our data. And that means getting a feel for what the range is of the data that we're looking at, range in terms of the highest and lowest values, earliest and latest dates. How many different types of text we have, what categories we have. So that profiling is usually sort of summary type information that we pull together from the data to understand and detail what it is that we have and what we don't have.
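As a rough illustration of the profiling described above, here is a minimal Python sketch. The field names (`amount`, `posted`, `category`) and the sample records are made up for illustration and are not from any real audit; it only shows the kind of summary information (value range, date range, categories, gaps) that profiling might pull together.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
transactions = [
    {"amount": 120.50, "posted": date(2021, 7, 1), "category": "travel"},
    {"amount": 15.00, "posted": date(2021, 3, 14), "category": "supplies"},
    {"amount": 980.00, "posted": date(2021, 11, 30), "category": "travel"},
    {"amount": None, "posted": date(2021, 8, 9), "category": "supplies"},
]


def profile(records):
    """Summarize value range, date range, categories and missing values."""
    amounts = [r["amount"] for r in records if r["amount"] is not None]
    dates = [r["posted"] for r in records]
    return {
        "row_count": len(records),
        "missing_amounts": len(records) - len(amounts),
        "lowest_amount": min(amounts),
        "highest_amount": max(amounts),
        "earliest_date": min(dates),
        "latest_date": max(dates),
        "category_counts": dict(Counter(r["category"] for r in records)),
    }


summary = profile(transactions)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A summary like this is usually enough to spot gaps, such as the missing amount here, that might need more context from the auditee.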
Maybe get some early impressions about what might be going on in the data in terms of the population. But also to go back perhaps and determine if there's any gaps or any sort of things that looked odd, even at the outset, that we perhaps need to get a bit more context from the client or the auditee on.
We also look at patterns. So we look to understand, and this is again all through the phases. So during the initial phase, the pre-analysis phase, and we're slowly getting into why this is such a difficult phrase to explain, we're looking for patterns in our exploration. So in our preparation phase, we're looking for the different patterns that we're seeing technically within the data. But then really, the patterns we're looking for when we're doing our analysis are about trying to get a feel for what the different types of transactions look like broadly. So trying to identify what the flow might be, or identify what the relationships between certain types of transactions would be, that enable us to see those patterns.
And so when you're looking at those patterns, even from an early stage, would you have any expectations going in before you commence your analysis, about what patterns you may see in the data?
That initial business understanding that we obtained would help us to predetermine some of the patterns that we might want to see. And then when we're looking for those patterns initially through our profiling, we then want to determine whether we're seeing exactly what we expected to see or not, because that helps us to determine whether we actually have the right data. And then when we're actually doing the analysis, we're then, again, looking for those sorts of patterns to understand whether the majority of what we're expecting is in the majority of the data. We usually don't expect to see more exceptions than rules. And so the original rules around how certain things work is what we want to see in the patterns. And then understanding that pattern helps us identify where those anomalies are. But if there are more anomalies than real expected transactions, then we may need to just tweak our thinking a little bit.
Or indeed there may have been a significant change in business processes or something we're not aware of that may have led to those large volumes of anomalies. Okay. What's the next word on your hit list there for analysis?
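One way to make the "more exceptions than rules" check concrete is to look at the share of records that fall outside the expected pattern. This is only a sketch: the function names are hypothetical, and the 50% cut-off is an assumption standing in for "more exceptions than expected transactions", not a standard threshold.

```python
def exception_share(total_records, exception_count):
    """Return the fraction of records that did not follow the expected pattern."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return exception_count / total_records


def needs_rethink(total_records, exception_count, threshold=0.5):
    """Flag when exceptions outnumber expected records (assumed 50% cut-off)."""
    return exception_share(total_records, exception_count) > threshold


# Hypothetical counts: 100 exceptions in 1,000 records is a plausible minority.
print(needs_rethink(1000, 100))
# 600 exceptions in 1,000 records suggests the expectation itself may be off.
print(needs_rethink(1000, 600))
```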
Matching. So, matching is where we look to join two datasets together. Matching has two connotations. One is a reconciliation, so matching the overall transactions to a summary. So the detail that we have to a summary. But more importantly, there's the joining data up in order to be able to properly understand what's going on. So data normally resides in multiple datasets and we want to bring that data together. So either open data with proprietary data or proprietary data with proprietary data. To be able to more broadly analyze the exact transaction. So the typical sort of use case for that would be where we have master data of some form. So that might be a list of customers or list of vendors or list of employees or a list of system administrators. And then we have the transactions that go below that and maybe a transaction log. And so matching would be bringing that data together. And then we also want to then match that to potentially external datasets. So where we're bringing open data in and trying to match to be able to extend the range of the data that we have beyond the specific initial dataset.
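The reconciliation flavor of matching, detail to summary, can be sketched in a few lines of Python. The amounts and the control total below are hypothetical, and `reconcile` is an illustrative helper rather than anything from a particular tool. `Decimal` is used so that currency amounts add up exactly.

```python
from decimal import Decimal

# Hypothetical detail lines and a control total taken from a summary report.
detail_amounts = [Decimal("100.00"), Decimal("250.25"), Decimal("49.75")]
summary_total = Decimal("400.00")


def reconcile(details, control_total):
    """Compare the sum of the detail lines to the summary control total."""
    detail_total = sum(details, Decimal("0"))
    difference = detail_total - control_total
    return {
        "detail_total": detail_total,
        "difference": difference,
        "reconciled": difference == 0,
    }


result = reconcile(detail_amounts, summary_total)
print(result)
```

A non-zero difference would usually be followed up before any further analysis, because it suggests the extract may be incomplete.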
So a word we're seeing more and more now in terms of data analysis in the audit sphere is blending of data. Is blending the same as matching, or is that slightly different?
Matching is a broad concept. How we match data from different datasets together. The technique that we would use would be a blending technique. So joining and blending would be the sort of underlying techniques that we would use. Now obviously everybody uses this terminology differently. So, you may be listening to this and think, oh, no, I don't think about it that way. And that's fine. They all have different meanings depending on how you use it or how you've been taught. But the way that I think about it anyway is that matching is the broader bringing data together, and blending is a way in which we do it. One of the ways in which we blend is we join datasets together through some sort of inner join or outer join or whatever.
Okay. So what's the next concept we need to be thinking about then when we're looking at this word analysis?
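To show the difference between an inner join and an outer join on master and transactional data, here is a small Python sketch. The vendor master, the payments and the helper names are all hypothetical. The left outer join keeps every payment, while the inner join keeps only payments that match a master record.

```python
# Hypothetical vendor master data and payment transactions.
vendors = {"V001": "Acme Pty Ltd", "V002": "Bolt Supplies"}
payments = [
    {"payment_id": 1, "vendor_id": "V001", "amount": 500},
    {"payment_id": 2, "vendor_id": "V003", "amount": 75},
    {"payment_id": 3, "vendor_id": "V002", "amount": 120},
]


def left_join(transactions, master):
    """Keep every transaction; vendor_name is None when no master record matches."""
    return [{**t, "vendor_name": master.get(t["vendor_id"])} for t in transactions]


def inner_join(transactions, master):
    """Keep only the transactions that have a matching master record."""
    return [
        {**t, "vendor_name": master[t["vendor_id"]]}
        for t in transactions
        if t["vendor_id"] in master
    ]


joined = left_join(payments, vendors)
matched = inner_join(payments, vendors)
# Payments whose vendor is not in the master data.
unmatched = [row["payment_id"] for row in joined if row["vendor_name"] is None]
print(unmatched)
```

The outer join is often the more useful one in an audit, because the rows that fail to match are themselves worth a look.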
Okay. So this is where most of our minds go when we think analysis. And that's answering hypotheses or executing on rules. Depending on which way you go in terms of your approach to doing analysis, it would either be I have a hypothesis and I want to prove it or disprove it initially, or I just go directly to rules. Now whether you have a hypothesis or not, you are going to devise some rules. Because the way to answer the hypothesis is to break that down into a rule. The difference between starting with rules and starting with hypotheses is where you want to end up and whether you're going with an objective basis or whether you're going with just a basic rule-based analysis. We typically go hypothesis-based first. And then once we've identified exactly what the hypotheses are that we're going to prove or disprove, we then break that down into a set of rules that are specific to those hypotheses. So that's where we then go and say, does A plus B plus C equal D? Or if-this-then-that. Or are there any situations where this particular master data element doesn't link to this transactional data element, or this transactional data element happens after another transactional data element? So there's all sorts of different types of rules that we have - master data rules, transactional rules, blending rules. So we think about analysis as rules, but that isn't necessarily the largest part of what we're going to do, which is why it's important to understand more broadly what analysis entails.
We've covered there developing your hypothesis, so you can prove or disprove it. And to do that, quite often, you then need to, again, break that further down into rules so that you can actually run that testing. What do we need to think about next then in terms of the analysis umbrella?
Yeah. So what then falls out of those rules would be exceptions. So exceptions to the rule, if you like, which we then need to evaluate to understand whether we've disproved that hypothesis for a particular set or we've proved that hypothesis for a particular set. And so exceptions - before exceptions become anomalies, we need to do some system checks, so I'll talk about those together - system checks and other types of checks. So an exception, in the way in which we've been talking about it over the years, is an exception that we see technically within the data output. We're expecting a certain result from a rule and 90% of the data aligns with that. But 10% of the data results in an exception. Before we translate the exception to an anomaly, we then need to understand, firstly, whether that exception is a data problem, so we look at that, but then secondly, whether that exception can be explained. One of the ways in which we do that is we talk to the business to understand what those exceptions mean, but often we don't want to go and trouble them. So, in an ideal world, we have access to the system from which the data was extracted. So we can go in and have a more detailed check as to what the transaction entailed and why this exception might have occurred. So we take that exception and then we go and have a look at the system. We then talk to the business. And then, you know, without getting into too much detail here, we then come up with a set of anomalies that will result from that. And those anomalies then need to be analyzed further between us and the business to understand why exactly did this happen. So the exception checking is why did this happen technically within the data? And then we whittle that down to the anomalies, which are the real exceptions, if you want to call it that, or business exceptions, which we then go and evaluate. Those are the types of things that we would do broadly across an analysis.
And then there's three questions that we answer as part of our analysis. That is what happened - and so we do a whole bunch of steps to understand what happened. We then say, what does this mean? And then, so what. So the what happened is the earliest, looking backwards. What does this mean is often helped by bringing different datasets together. And then the so what is, what do we need to do about this? Or how do we need to report this, and what does management need to do about this? All of these terms are part of our examination, which is another term we can use as part of our analysis. And loosening up our data and bringing it back together is what defines our overall analysis effort.
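To make the idea of breaking a hypothesis down into rules a bit more concrete, here is a minimal Python sketch of two of the rule types mentioned: a "does A plus B plus C equal D" check and a "this happens after that" check. The invoice records, field names and rule names are hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical invoice records; field names and values are illustrative only.
invoices = [
    {"id": "INV-1", "net": 100, "tax": 10, "freight": 5, "total": 115,
     "approved": date(2021, 5, 1), "paid": date(2021, 5, 10)},
    {"id": "INV-2", "net": 200, "tax": 20, "freight": 0, "total": 225,
     "approved": date(2021, 6, 1), "paid": date(2021, 6, 3)},
    {"id": "INV-3", "net": 50, "tax": 5, "freight": 0, "total": 55,
     "approved": date(2021, 7, 15), "paid": date(2021, 7, 1)},
]


def components_sum_to_total(inv):
    """Does A plus B plus C equal D?"""
    return inv["net"] + inv["tax"] + inv["freight"] == inv["total"]


def paid_on_or_after_approval(inv):
    """Payment should not happen before approval."""
    return inv["paid"] >= inv["approved"]


RULES = {
    "components_sum_to_total": components_sum_to_total,
    "paid_on_or_after_approval": paid_on_or_after_approval,
}


def run_rules(records, rules):
    """Return (record id, rule name) pairs for every rule a record fails."""
    return [
        (record["id"], name)
        for record in records
        for name, rule in rules.items()
        if not rule(record)
    ]


exceptions = run_rules(invoices, RULES)
print(exceptions)
```

Keeping each rule as a small named function makes it easier to trace an exception back to the hypothesis it was meant to test.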
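The whittling down from exceptions to anomalies can be sketched as a simple triage step. The two flags below (`data_problem`, `explained`) are hypothetical stand-ins for the outcome of the data check and of the system check or conversation with the business. In practice those are judgments made by people, not fields that already sit in the data.

```python
# Hypothetical exceptions, each with the outcome of the follow-up checks.
exceptions = [
    {"id": "INV-2", "data_problem": True, "explained": False},
    {"id": "INV-3", "data_problem": False, "explained": True},
    {"id": "INV-7", "data_problem": False, "explained": False},
]


def triage(exception):
    """Classify an exception: data problem first, then explained, else anomaly."""
    if exception["data_problem"]:
        return "data problem"
    if exception["explained"]:
        return "explained"
    return "anomaly"


# Only exceptions that are neither data problems nor explained remain as anomalies.
anomalies = [e["id"] for e in exceptions if triage(e) == "anomaly"]
print(anomalies)
```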
Okay. So we've covered a lot of key concepts and words there that pertain to the word analysis, including, where it came from in terms of its Greek etymology and so forth. And you stepped us through the various things we need to think about under the umbrella term analysis, but no data analysis happens in isolation. We don't just do one audit per year or one bit of data analysis per year. We need to learn lessons from that. How do we take the word analysis forward to our work more broadly as auditors?
I think we need to think of data analysis as the overarching term that we use for evaluating that data, as opposed to a particular phase. And so one of the things that I think I need to do a lot more of, and we need to do a lot more of, is not use analysis as a term for a phase in the data analysis itself. Because I've been caught in that trap so many times. Often you go there because you don't really know exactly what you're going to be doing as part of a particular phase. So you know you're going to be understanding and collecting data, you're going to be reporting and visualizing it, but there's this sort of gray in-between where you're not exactly sure what it's going to be. So you just sort of slap on the terminology and it's kind of like, I've got an analysis phase in my data analysis. What does that really mean? So being a bit more specific about what we're going to do, and thinking more deliberately about what we're going to do upfront, can help us avoid that and reduce the ambiguity that goes with that particular phase in the data analysis.
And obviously we reflect always on the work that we've done in the report that we've just completed. If on that reflection of the analysis work that we've done, we say, oh, this was actually a profiling stage, or we were looking for patterns here, then that can obviously inform how we do our data analysis in the future.
That's right. Yeah. That sort of approach is really good because it helps us define in future more closely what we're going to be doing, and that's important. So that there's a shared understanding amongst everybody in the audit team as to what the specific process we're undertaking right now is. Because that ambiguity, we found, can create difficulty in that understanding. And it may result in us taking longer than we thought we would have, or shorter than we thought we would have, for certain phases. So yes, looking backwards, we won't always get it right the first time, but every time we look backwards, from our retrospectives or whatever else it is that we do as part of our end of audit work, we try to define that a little bit better so that the next time we have more definition.
Yeah, and more definition is good for managing all stakeholder expectations around what exactly is required.
Including your own.
So, our discussion today was about the word analysis. And we loosened up that word and broke it down into some of the sub-components that sit under it. Some of the key ones we looked at were exploration, understanding the data, looking for patterns within it, profiling, what that means and what you get from that, matching versus blending, having a hypothesis-based approach and using rules to actually test those hypotheses. And then obviously all the validation of your exceptions and moving towards true anomalies.
Good stuff. Thanks Conor.
If you enjoyed this podcast, please share with a friend and rate us in your podcast app. For immediate notification of new episodes, you can subscribe at assuranceshow.com. The link is in the show notes.