Interview: Shea Brown, BABL AI
This is the third guest interview episode of Algorithm Integrity Matters.
Dr. Shea Brown is the Founder and CEO of BABL AI.
BABL specializes in auditing and certifying AI systems, consulting on responsible AI practices, and offering online education.
Shea shares his journey from astrophysics to AI auditing, the core services provided by BABL AI including compliance audits, technical testing, and risk assessments, and the importance of governance in AI.
He also addresses the challenges posed by generative AI, the need for continuous upskilling in AI literacy, and the role of organizations like the IAAA and For Humanity in building consensus and standards in AI auditing.
Finally, Shea provides insights on third-party risks, in-house AI developments, and key skills needed for effective AI governance.
00:14 Introduction to Dr. Shea Brown and BABL AI
00:50 The Journey from Astrophysics to AI Auditing
02:36 Core Services and Compliance Audits at BABL
04:11 Educational Initiatives and AI Literacy
06:02 Collaborations and Professional Organizations
09:11 Approach to AI Audits and Readiness
17:44 Challenges with Generative AI in Audits
29:35 Trends in AI Deployment and Risk Assessment
35:08 Skills and Training for AI Governance
40:30 Conclusion and Contact Information
Yusuf:
Today we have a special guest, Dr. Shea Brown, the founder and CEO of BABL AI. BABL AI is a specialized firm that audits and certifies AI systems, consults on responsible AI best practices, and offers online education on related topics. BABL's mission is to ensure that all algorithms are developed, deployed, and governed in ways that prioritize human flourishing. Shea is a founding member of the IAAA, the International Association of Algorithmic Auditors, and he also serves on the board of directors of For Humanity, amongst various other initiatives. Shea, welcome to the show.
Shea Brown:
Thank you very much for having me.
Yusuf:
Can you tell us a bit about how BABL came about?
Shea Brown:
Yeah. So I was a professor of astrophysics at the University of Iowa for, I guess, 11 years, which is a kind of weird place to start for AI auditing. But in my research I studied methods for detecting objects in the sky autonomously, because there's so much data that if we had every human on earth looking at data from the sky their entire lives, we still would not get through it. So we had to come up with automated methods, which led naturally to machine learning and AI, and it became a pretty integral part of my research. Around 2018 I decided I wanted to do something a little bit different. I wanted to start a company around AI, and my original intention was to have it be, you know, building space AI for NASA or something adjacent to my astrophysics research. But around that time was also when a lot of the harms that AI could cause were being exposed: bias in algorithms, the ProPublica article came out, the Cambridge Analytica scandal around social media and the algorithmic driving of behavior there. And so I ended up focusing more on the ethics and the governance of these systems, and auditing felt like a good fit for my kind of scientific brain: how do we interrogate these systems to identify how they could go wrong and mitigate that? That was the beginning, and we've kind of stuck to our roots, luckily, since 2018.
Yusuf:
And what are the key types of work that you now do as BABL?
Shea Brown:
Yeah. So our core is our compliance audits for AI systems, and compliance or regulatory risk is of course just one of the many risks that we worry about, but in terms of what companies will pay for at the moment, it's compliance. And so we do a lot of work around hiring systems, HR systems that would, like, screen resumes and things. There are laws in the United States and elsewhere, anti-discrimination laws, that require that those systems are not discriminatory, and in particular New York City has a really strict law that requires third-party auditing, so a lot of our clients come from there. We also do more technical testing of these systems. So in addition to auditing other people's testing of a system, which is where those compliance audits come from, we also get hired to do direct testing of AI systems for various properties like robustness or accuracy or bias. We've also done a lot of foundational work on risk and impact assessments. A lot of our early work, and a lot of people in our company, came from law, philosophy, and risk backgrounds, and we've published some work on how to assess these systems for risk. So sometimes we get hired to really focus on what the risks associated with a system are, both for external stakeholders and also internal compliance or operational risk. And finally, we have, sort of by accident, started an educational arm of our business. It really came from us wanting to make sure that our own auditors had some uniform training, and to demonstrate to regulators or external stakeholders that we had some standard for our own auditors, but we've opened it up to other people. And so we now have an algorithmic auditing certification program, which started out as just training and has now become a much more formal certification process.
Yusuf:
And you've been embarking on some AI literacy courses as well, for the broader base of employees beyond those that are focusing on auditing. Is that right?
Shea Brown:
Yeah, and that also was a little bit by accident. I mean, in our position as auditors, we just get exposed to the common risks of these systems, and so we have a really intimate knowledge, I guess, of where these systems could go wrong. And it translates so well to: what do you as an HR professional, or you as a person who's doing marketing or managing customer service or something that uses AI, need to think about in terms of mitigating risk, both compliance and reputational risk? And so people asked us, you know, do you have training in these areas? We see your more advanced training; do you have something that's a little more boiled down for the general workforce? So we put it together. And the AI literacy requirement in the EU, Article 4 of the EU AI Act, comes into force very soon, in a couple of weeks, and I think there's just a lot of pressure on organizations to have something on paper to demonstrate that they've trained their workforce. So again, a little bit by accident, but we're happy we did it.
Yusuf:
I want to switch to some of the work you've been doing with organizations. I mean, we met through the IAAA, and I know you're involved in For Humanity. So you're one of the founding members of the IAAA, the International Association of Algorithmic Auditors, and then there's For Humanity, the not-for-profit. Can you tell us about those organizations that you're involved in and what that looks like for you? And I guess, what got you involved in those in the first place?
Shea Brown:
Yeah. So I think, as an organization, BABL has always been grounded in research. A lot of us were academics, or are academics still, and so we approach problems from a very first-principles point of view. We're always trying to do research, but we recognize that we can't come up with these things on our own, and just because we say that this is the way things should be, that's not the way consensus works. So early on, I think even in 2018, I was looking for areas that BABL could get involved in to try to build some consensus around this. With For Humanity, I met Ryan Carrier, the executive director, in an IEEE working group on bias. He told me about For Humanity, the goal of which was to build audit criteria for doing these sorts of AI audits or algorithmic audits. That was super compelling, and so I became one of the first members and fellows trying to grow that organization so that we could build consensus around this. I'm now a board member, and they've been doing some amazing work. I know you've been involved and working with them, and it is a very collaborative environment. It's really like democracy at its best; it feels like a lot of people building crowdsourced guidelines.

But as it evolved, and regulation started coming online where people had to do audits, I felt like there was a need for some sort of professionalization of the field, because a lot of people in For Humanity are not necessarily auditors. They're either concerned citizens or professionals from other areas with expertise they can weigh in with. And so some of the companies that were doing early work, Eticas is another one, a Spanish company that's now in the U.S., and BABL, and other academics who were writing about AI auditing, we got together, had a conference in Barcelona, and went to talk to regulators in Brussels; this was a couple of years ago. That initial group got together afterwards and realized we needed a professional organization, and that was where the IAAA came from. Luckily it's been growing, and we work together there as well. I think it serves a complementary role to For Humanity, where For Humanity does do work and training around auditors, but it's much more about their particular certification scheme. It's not about auditors broadly as a profession and making sure there's a group advocating for them. That's where the IAAA comes in.
Yusuf:
In terms of the actual audits that you either conduct, or that you help with in terms of readiness: how do you approach those audits, or help with readiness? If somebody came to you and said, I need to get ready, or I need an audit, where would you start and what are the key things you focus on?
Shea Brown:
Yeah. So I'll start from the foundation: what are we actually doing in these audits? It was an evolution for us. Early on we focused a lot on the subject matter, so we did a lot of consulting-type engagements or assessments where we were going in and really doing impact assessments of systems. We were interviewing stakeholders, testing the system, gathering data, getting evidence of how these systems behave. But as soon as regulation started actually getting signed, we realized that that's not exactly how a lot of these audits are going to proceed. It's going to be much more that the internal teams do those things, because that's what the regulations require, and we will be verifying what they did and providing assurance on top of that. So we made a very deliberate decision to follow assurance standards. For most of the work that we do, we follow ISAE 3000, or the equivalent U.S. attestation standards; there's significant overlap between them. And Australia has its own assurance standards as well.
Yusuf:
Yeah, and we just took the ISAE and made it an ASAE. So
Shea Brown:
Yeah, yeah, exactly.
Yusuf:
the same thing.
Shea Brown:
So those standards, I think, were very apropos for what we're doing, because there's a lot of flexibility. It's more about: I have a piece of information, and I want to increase the veracity of that information, I want to increase somebody else's confidence in that information. So we basically use that. The process for us, when someone comes to us, is that the very first question is: what is your environment? What are you worried about? What is important to your stakeholders? What's important to your company? Where do you operate? From that, we identify a normative standard; there has to be something we're checking against. It could be something internal, though preferably it's not; it's better to have something that's well recognized and external, like a law or a standard. We identify that normative statement, and the question to the company is: okay, have you done these things? If you've done the things that are in the standard and you want to demonstrate to other people that you've done them, then we can conduct an audit. That audit really is an assurance engagement, and it could be reasonable or limited assurance; your audience is probably familiar with these sorts of things. Essentially we say: you need to demonstrate to us that you have done these things.

Now, there are lots of different ways to do that. One is to say, here's the standard and we've done everything in that standard. That's much more like an ISO-type audit, where the statements are written by some external body, so you literally just use that normative standard, or, say, the EU AI Act: I comply with the EU AI Act. That's one case where you would give us documentation to say that you adhere to it, and line by line we have to go check and make determinations that we have reasonable assurance that you have done all of those things. There's obviously a lot of interpretation involved, and a lot of professional judgment and risk associated with that. There are other kinds of engagements that are much more granular. Someone could say, for instance, and this has happened before: I have an AI system and I have tested it in these ways. They will produce a document that says, here's how we have tested the system, this is the way it behaves, these are the numbers we got, these are the definitions, and maybe they reference external definitions or have a separate document defining what those metrics mean. Then we come in and do an assurance engagement over those statements. That can be more complicated, because we have to control what's being said, and not everything can be verified; we can't provide reasonable assurance against just anything. So there we've had to develop processes internally for how we check the auditability of those statements. We might come back and say, well, we can't provide assurance about these statements. You can put them in the document, but we're going to have to be very specific about what we're verifying: maybe we're just verifying the numbers, maybe we're verifying that you've done a risk assessment, or that you have a governance committee, those kinds of things, but certain things we can't verify. Anyway, that's probably more detail than you're looking for, but that's roughly the way most of our engagements go.
Yusuf:
And what are the key things that you're looking for? Would it be broad governance? Would it be ensuring risk assessments have been done? Would it be ensuring impact assessments have been done? What are the key building blocks of what that audit process might include?
Shea Brown:
So it depends a little bit on the normative standard. We have one that we had to construct ourselves for New York City, which has a hard law that says you have to have done a bias audit of your hiring system. In that case, we didn't feel that auditing somebody's testing was enough, because we knew that risk and governance were important. So we identified three pillars that we thought you had to have to substantively mitigate the risk of that AI system. The first is governance, which is really about there being people who are ultimately responsible and accountable for managing the risks of that system. That's an easy one to satisfy superficially and a hard one to do substantively, like really having somebody who's responsible. It could be a committee, it could be a person, it could be a group of people with processes associated with that.

The second pillar is risk, and for us the focus is really on risk identification, the risk assessment, not so much the mitigation bit, just because we didn't want to impose too many things that were supererogatory to the law. But we thought that if you could only do one thing with a system to mitigate its risk, it would be to simply know where the risks are. So that pillar is really focused on: have you thought through the stakeholders, thought through the risks associated with the system, identified them, and prioritized them? We have criteria around simply doing that. Whether you do something about it or not is not part of that standard or framework, but we do think that just having that visibility within the organization is important.

The third pillar is the technical testing. These are technical systems, they have their algorithms, and you need metrics associated with them and some way to monitor the behavior of those systems. So metrics have to be formed. Bias is a natural one; for New York City it's bias, specifically disparate impact. But it could be others, and we've done engagements where the metrics were different, maybe something to do with the robustness or reliability of the system, or the accuracy. But some sort of metric, and some way of connecting those metrics to things that matter, which happens in the risk assessment. That triangle, we feel, based on our research and experience so far, is the most germane to risk. I would say if a company were to focus on what it wants to do to mitigate risk, those three pillars kind of have to be there, and that's what we tend to focus on in the audit: have you assessed risks in a way that is coherent, prioritized them in a way that is understandable and justified, have you done some technical testing, are those numbers accurate (we have to check those numbers, of course) and the processes, and then do you have some governance around it?
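To make that technical-testing pillar concrete, here is a minimal sketch of the kind of selection-rate and impact-ratio calculation a New York City-style bias audit revolves around. The column names, toy data, and the four-fifths reference line are illustrative assumptions, not BABL's actual procedure.

```python
# Minimal sketch of a selection-rate / impact-ratio calculation of the kind
# used in NYC Local Law 144-style bias audits. Column names ("group",
# "selected") and the 0.8 reference line are illustrative assumptions only.
import pandas as pd

def impact_ratios(df: pd.DataFrame, group_col: str = "group",
                  outcome_col: str = "selected") -> pd.DataFrame:
    """Selection rate per group and ratio against the most-selected group."""
    rates = df.groupby(group_col)[outcome_col].mean().rename("selection_rate")
    ratios = (rates / rates.max()).rename("impact_ratio")
    return pd.concat([rates, ratios], axis=1)

if __name__ == "__main__":
    # Toy screening outcomes: 1 = advanced by the tool, 0 = not advanced.
    data = pd.DataFrame({
        "group":    ["A"] * 100 + ["B"] * 100,
        "selected": [1] * 60 + [0] * 40 + [1] * 42 + [0] * 58,
    })
    report = impact_ratios(data)
    print(report)
    # Conventional "four-fifths" reference: flag groups whose ratio falls below 0.8.
    print(report[report["impact_ratio"] < 0.8])
```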
Yusuf:
So, talking about broad risk assessments and identifying those risk scenarios, and being able to determine where that sits in terms of the nature of the algorithms or AI systems that you're involved in auditing: I imagine there have been lots of traditional models over the years, but what are you seeing in terms of the split in demand for audit or consulting work between generative AI, which is probably more stochastic, and traditional models, which were a bit more deterministic, broadly speaking?
Shea Brown:
So in terms of trends, we've seen a huge shift over the past two years, where companies we audited several years ago that were using traditional discriminative or regression-type models to predict things are switching to generative AI as a base. Almost everything we deal with now has generative AI components to it, at least. Sometimes it's used to extract data that then gets fed into a discriminative model; sometimes it's fully end to end, like agentic systems where everything's run by large language models. So we're seeing a lot of it now. In terms of how that affects the audit process and the way we conduct ourselves, it does change a lot, so things like materiality. We understand that these systems are stochastic. For instance, when we're checking some numbers: a company has done a bunch of evaluations, and we recommend that they put those evaluations in buckets that correspond to risk, so it's easier for us to understand and audit what the evals are for. They'll run evals on their system, they might monitor that, and they'll get numbers, and those numbers are things that we assure. So when we test, we have to do much more direct testing. I hesitate to call it red teaming, because it's not red teaming per se, but we do have to probe the models to make sure that, on average, the results of the evaluations are the numbers they're giving. That introduces a lot of uncertainty.

And in some cases we have to be very careful about how the audit report is written too, because the testing will be done and the numbers will be given, like a 95% pass rate on this particular eval set. But given the assurance standard we're following, we have to think about what the intended user is going to take from that number. If the way the statements are made by the company could mislead them into thinking that 95 percent of the time when I use it, I won't see this type of behavior, let's say toxic language or something, well, that's not actually what that number means. What that number means is: they have a curated evaluation test set, it has this many rows in it, and 95 percent of those passed. That doesn't mean that's how it will behave in the wild, because there's a mismatch between those two domains. So we often have to come back to the clients and say: you need to soften this language, you need to be more explicit about what this actually means, because the intended user could misunderstand, and the differences between those numbers, between the testing and the real world, could be material for the risk. So it changes the way reports have to be written, and it changes the way we have to substantively test those systems.
Yusuf:
So you've got the difference between the evaluation data set that is used, you know, those prompts or whatever else is being used to test, and then you've got the fact that each time you try, it might give you a different answer. Deterministic would be: it's always, or pretty much always, going to be the same; if you've got the same model and the same data, you're going to get the same result, largely. Now we're getting different results. How do you deal with that variability, where even if you did have the exact same data set for evaluation and you tried to replicate the results that were produced, you might end up with something different?
Shea Brown:
The general philosophy is that we have to come up with some materiality threshold for ourselves. Let's just say 5%; we pick a threshold. There has to be some justification for it, and it's probably going to be risk-based, so we have to justify for ourselves why we're picking it. But let's say we pick it. There are several ways in which that 5 percent could show up. Say here's this number that gets reported. I might take a random sample of, like, 100 evals from that whole data set, run it through, and get more than a 10 percent difference. Well, is that really the materiality I worry about? Is that violating my threshold? Probably not, or maybe it does; we have to justify what it means, and when that 5 percent, or 10 percent, whatever we've chosen, is relevant. The way we typically approach it is to say: let's run this eval set 50 times and take the average of those 50 runs. If that average is more than our threshold away from what they reported (and when you're averaging like this you want a smaller threshold, say 5 percent), then we think there's a material difference. But we have to justify that, and it will change, and it depends on whether they've set the temperature.

For those who know large language models, you can set what's called the temperature. If I set the temperature down to zero, I can make it close to deterministic, meaning if I put the same prompt in twice, I get the same result out, or fairly close, because there might be some white space or some uncertainty in the system, and in some cases, depending on whether you use Azure or whatever API you're using, you get some weird results. But even with the temperature turned down, there's the question of: here's this prompt, what if I reword it in a way that is substantively the same, it has the same meaning but is slightly worded differently, and I put it in, do I get the same result then? Does it pass or fail according to their threshold? That's another sense of materiality we have to worry about: do we just take it at face value, or do we want to look at whether, if I perturb it substantively, I get the same results? All of these are complications that make this kind of work totally fascinating, but also really difficult. So we have to articulate very clearly, when we're doing it, what we're setting for materiality and how we manage that uncertainty, and then it has to be expressed in our audit report very clearly, exactly what we did. The way the procedures are described has to be a little more verbose than if I were just counting the number of cars in a parking lot or some other kind of assurance engagement.
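As a rough illustration of that repeated-runs idea, here is a minimal sketch. The `run_eval_set` callable is a hypothetical stand-in for whatever eval harness a team actually uses, and the 50 runs and 5% threshold are illustrative, risk-based choices rather than a prescribed procedure.

```python
# Minimal sketch of the repeated-runs materiality check described above.
# `run_eval_set` is a hypothetical stand-in for a real eval harness; the
# 50 runs and 5% threshold are illustrative, risk-based assumptions.
import statistics
from typing import Callable

def check_materiality(run_eval_set: Callable[[], float],
                      reported_pass_rate: float,
                      n_runs: int = 50,
                      threshold: float = 0.05) -> dict:
    """Re-run the eval set n_runs times and compare the mean pass rate
    to the reported figure against a materiality threshold."""
    observed = [run_eval_set() for _ in range(n_runs)]
    mean_rate = statistics.mean(observed)
    gap = abs(mean_rate - reported_pass_rate)
    return {
        "mean_pass_rate": mean_rate,
        "stdev": statistics.stdev(observed),
        "gap_vs_reported": gap,
        "material_difference": gap > threshold,
    }

if __name__ == "__main__":
    import random
    # Toy eval harness: each run returns a pass rate with some run-to-run noise.
    toy_harness = lambda: min(1.0, max(0.0, random.gauss(0.93, 0.02)))
    print(check_materiality(toy_harness, reported_pass_rate=0.95))
```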
Yusuf:
Okay, that makes sense. And that would then extend to an expectation, or maybe good guidance, for the organization that's actually doing the implementation: when they're doing the evaluation, do it multiple times, with the various different scenarios and settings and parameters. So it's not just throw some testing at it and see what comes out, but do it a few times, see whether there are significant differences, and try to average it out, so you have a better feel going into the audit as to what might come out.
Shea Brown:
Yeah, I think it does change. One thing we're seeing is that there's a gap in the knowledge of the practitioners who are working within the organizations, and there's also a gap among auditors, which is part of what our training is trying to fill. But inside the companies, what we found, a lot of times, not to single anyone out, because there are wonderful developers and people who are very, very smart, is that the scientific approach of wanting to characterize the uncertainty of my results, which is fundamental to science, is often missing. My PhD is in astrophysics; we sometimes get, like, five photons from a star, but we get a thousand or a hundred thousand photons just from the atmosphere, so we have to interrogate everything to really figure out whether the information is actually what we think it is. It's in our nature to interrogate information. Data science has not, in general, had that kind of culture, especially because of the rush to get talent; a lot of people are going through boot camps or coming from a different field, cyber security or IT, into data science, and they haven't really cultivated that kind of thinking. That's a real problem for us, because they think if I run a test on the evaluation set I'm going to get a number and that's good. They might notice that there are differences, but they don't necessarily know how to approach it; the central limit theorem is not necessarily in their core DNA. That's been a struggle, and it's something I've really wanted to figure out how to tackle in terms of getting education out there about testing these systems in a way that's robust, so you can demonstrate to an auditor that you've characterized the uncertainty well enough.
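One simple way to act on that "characterize the uncertainty" point is to report a confidence interval alongside any eval pass rate, rather than the point estimate alone. A minimal sketch, assuming independent pass/fail outcomes on a fixed evaluation set (the numbers are illustrative):

```python
# Minimal sketch: a normal-approximation confidence interval on an eval pass
# rate, so the number handed to an auditor carries its uncertainty with it.
# Assumes independent pass/fail outcomes on a fixed evaluation set.
import math

def pass_rate_interval(passes: int, total: int, z: float = 1.96) -> tuple:
    """Point estimate and ~95% normal-approximation interval for a pass rate."""
    p = passes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

if __name__ == "__main__":
    # 950 of 1000 eval rows passed: report 95.0% [93.6%, 96.4%], not just "95%".
    p, lo, hi = pass_rate_interval(950, 1000)
    print(f"pass rate {p:.1%}, approx 95% CI [{lo:.1%}, {hi:.1%}]")
```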
Yusuf:
Okay, so you've trained the auditors, you've trained the general employee base, and now you're turning attention to developers and data scientists, so that, I guess, not to make your life easier, but to reduce the risk overall, really.
Shea Brown:
Yeah, I think it's necessary. The overall level of literacy needs to increase, I think, across the board. And we know that we have to keep up, right, because things are changing so fast. We've audited systems, and companies that have systems, with some agentic features, but more and more they're really highly autonomous agents that need to get audited, need to get certified in some way. And that's a research question for us: could we apply the same framework we would for general LLMs to a system that now has access to tools that can execute things autonomously? I think the answer is we can't. We can use some of those lessons, but even those lessons are still kind of cutting edge, or in development. And now we have these agents, so there has to be continuous upskilling, a continuous push to raise the literacy level for all of the practitioners in risk, compliance, auditing, all of these fields.
Yusuf:
So, talking about the fast pace: a lot of what we're seeing nowadays is really coming out at a faster pace than we can, forget consume, even understand and think about and be across. What are you seeing in terms of deployment of AI systems? Is it more in-house build, or is it more buying, or is there a bit of a combination of that going on?
Shea Brown:
Yeah, that's a good question. I think the majority of what's happening right now is in-house configurations of third-party generative AI environments. What I mean by that is: I've chosen Copilot as my platform of choice, which of course references OpenAI models, and within that platform I have tools to develop agents or chatbots or different kinds of applications for my own internal use on my own internal data. There's a lot of that happening. Or you're a Google company and you've got Gemini, and you're trying to organize and develop internal processes using that platform, or Dataiku, or Databricks, or whatever; there are many of these kinds of platforms. We're seeing a lot of that. That's at the deployer level, so the deployers are now becoming providers, to use the EU AI Act terminology: a lot of what we traditionally think of as deployers are now developing these tools themselves, based on some underlying models. But we also have a huge proliferation of new vendors with vertical AI offerings, so AI for HR, AI for education, AI for financial services and risk assessment or anti-money-laundering or whatever it is. We're seeing a lot of those, there's a huge explosion of these companies getting into that, and they're having some success selling into enterprise, or at least into some bigger companies. So that's a not insignificant part of the market as well.
Yusuf:
Right. What does that mean for the way that deploying organizations evaluate or assess risk? Because risk will be quite different if you're developing a traditional model in house and deploying it, versus buying a whole bunch of different systems or components, putting them together, and then relying on third parties to have done some of the other testing that needs to be done, bias evaluations and things. So how does that change? Have you seen a significant change in the thinking around assessing those risks?
Shea Brown:
Yeah, so third-party risk has been around for a while, in the sense that, because of cybersecurity or information security and privacy, people have been conscious of the fact that I'm exposed when I bring on a third-party vendor to process my data in some way. I think the shift, or the struggle rather, has been: what are the new processes in procurement that we need to put in place to get some mitigation of these new AI risks? You can find checklists here and there, and some of those are implemented okay. Others are really poorly implemented, in the sense that you give the vendor a questionnaire, the answers that come back are very unsatisfactory, and then you don't know what to do, because what is the recourse for that? So what we've seen is that they fall back on the core information security principles. The focus is primarily on: are these new APIs you're integrating going to expose my data externally, all of the regular encryption stuff, are you going to use my data to train your system? Those are questions people can wrap their brain around. But some of the other risks, around bias or, frankly, validity, like, is this tool a valid tool for the thing you say it's doing? There are missing pieces there that need to get filled in, and you need an expert. Those are things that people like us can ask the right questions about and understand what the answers mean, in a way that separates "I just opened a Google doc that has the policy in place but I haven't done anything substantive" from "I've done some substantive risk mitigation for you."

The other area where there's a lot of uncertainty is in-house development, because a lot of deployers don't have experience with what to do when they're developing this thing internally. They typically have an IT function that handles information security, but now all of a sudden you've procured something like Copilot, or you have a subscription to a private GPT or something, and now I have people who are not data scientists developing applications to do stuff in my company. What do I ask of them to make sure I'm mitigating those risks? That's uncharted territory for a lot of these companies, and that's where they also need a little bit of help to identify the core things. In fact, those three pillars that I mentioned are a great place to start. Is there somebody who's just looking out for the risks there? Have we categorized what risks there could be? Just get a meeting together and do it, and have a little risk register for it. And then, have we done any monitoring at all, or testing, even minimally? Just pick a metric and go with it. But it's still early stages for most companies there.
Yusuf:
So in terms of that, and particularly within regulated industries like financial services, what sort of skills would leaders within financial services expect of their teams? What is it that they need to develop, in terms of skills within those teams, to stay ahead of AI governance expectations? And I guess maybe just, you know, catch up and then stay ahead.
Shea Brown:
Yeah, that's a tough thing. I mean, we're still constantly trying to figure that out. At the core of it is an understanding of how the AI systems work. Because it's so easy now to develop things with large language models, it's not hard to get away with not understanding how they work, because it just does everything for you. Getting some training on how these systems actually work matters: where does the large model get access to your information when you're querying documents? What does that process actually look like? Having some basic understanding of it is important, because that's going to inform the next thing, which is how you assess the risks of that process. So there's a risk assessment piece. BABL as a company has spent so much time doing those and trying to develop methodologies for them that we feel there is a practice, a professional practice, of assessing the risk of algorithmic systems. It's not just the standard enterprise risk management that people do, and it's not the standard risk assessments you would do under information security. They're related, but they're substantively different. Understanding what that process looks like is important: how you would identify stakeholders and really think through and identify the unique risks that come from those systems. It doesn't have to be super deep, and we teach it, we have a class that teaches it, and we've seen that people can get those skills to a pretty high degree in the course of a few weeks and some dedicated practice.

And then there's the technical bit: some of the technical metrics, just understanding what metrics are out there and having some basic understanding of how you would interpret them. This goes for product owners and subject matter experts; somebody in that group needs to understand that we can measure things like how relevant the retrieved information is, for instance, for LLMs, or how complete that information is. There are methodologies for extracting that, and you should understand what those numbers mean, or there are some out-of-the-box evaluations we might be able to use. I think that's the core, and if you have those, most people can bring other skills. If I'm a risk professional, I understand what it means to categorize that risk and maybe attach it to a dollar value or something like that in my company, or put it in my risk register, but I need to understand how to think it through first. And I might understand machine learning, but not necessarily how I'm going to extract a risk-relevant metric from my system that I can monitor over time and that will inform decisions about how I govern that system. That's the little ecosystem that is a tough nut to crack, but I think it's one of the most germane and important ones to crack.
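As one small example of the kind of retrieval metric being described, here is a minimal sketch of a lexical-overlap relevance score for retrieved passages. It's an illustrative stand-in rather than a BABL methodology; real evaluations typically use embedding-based similarity or LLM-judged scoring instead.

```python
# Minimal sketch of a retrieval-relevance metric for a RAG-style system:
# token overlap between the query and each retrieved passage. Illustrative
# only; production evals usually use embeddings or LLM-as-judge scoring.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevance(query: str, passage: str) -> float:
    """Fraction of query tokens that appear in the retrieved passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def mean_retrieval_relevance(query: str, passages: list[str]) -> float:
    """Average relevance across everything the retriever returned."""
    return sum(relevance(query, p) for p in passages) / len(passages)

if __name__ == "__main__":
    query = "What does the vendor contract say about data retention?"
    retrieved = [
        "The contract requires vendor data retention of no more than 90 days.",
        "Our office relocation is scheduled for the second quarter.",
    ]
    # Scores roughly 0.56 for the on-topic passage and 0.11 for the off-topic one.
    print(mean_retrieval_relevance(query, retrieved))
```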
Yusuf:
And it starts with understanding the landscape, understanding what the different components are and how they fit together.
Shea Brown:
Yes, I think the understanding of what AI can and can't do. I see a lot of misconceptions about what AI is, what generative AI is and what it's doing, from simple things like assuming that it just remembers everything. It doesn't necessarily remember everything you've said to it; there are context windows in these systems. So unless the system's been set up in such a way that it can retrieve information it's gotten from you in the past, it doesn't necessarily remember what you asked it five months ago.
Yusuf:
the one I use doesn't remember what I asked it five minutes ago.
Shea Brown:
Yeah. Well, exactly. And that's a big part of understanding exactly how it's weighing the things that come into it. You don't need a comprehensive knowledge of what a transformer is, which is the basis of a lot of these, but you do have to have some understanding of how it's weighing the information that's coming in, and the recency bias and that kind of thing. If I start going off track and then try to get back to what I was doing, it's probably going to get confused and might start doing things I don't like. But it's an exciting time, and I think it's an exciting time for people who are in companies or in this field who have to interact with AI a lot and want to build a little bit of job security. Governing these systems is something where, eventually, more pieces of the governance will get replaced by automation, by AI, as well. But at least in the near term, one of the few human jobs that's going to be impenetrable is ultimately having that oversight and accountability to connect what these systems are doing with the real world. It's hard to find people who are really good at that, and if you can get good at it, it's something people will recognize, and it will be valuable for your career.
Yusuf:
Excellent. Shea, we're coming up to the end of our time together, and I do appreciate the time we've spent. Just a final question: where can listeners find out more about what BABL does and potentially connect with you?
Shea Brown:
I'm pretty active on LinkedIn, so Shea Brown on LinkedIn. We have a YouTube channel, BABL AI, where we post a lot of podcasts and interviews. And babl.ai is the company page, where we have our services and things like that. If you go to courses.babl.ai, that's where we have some of the training programs, some of which are free, that talk about some of the things we discussed today.
Yusuf:
Fantastic. And we'll put links to all of those in the show notes. Shea, thank you so much for taking the time to talk to us today.
Shea Brown:
Thank you so much. I appreciate it.