FAFO: The Evolution of AI Security Priorities
tl;dr: shift focus from models to systems.
Security is the practice of risk mitigation by understanding and/or controlling the likelihood and impact of an event.
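To make that framing concrete, here is a minimal sketch assuming the common (and simplified) risk = likelihood × impact formulation; the numbers and scenario names are invented for illustration, not measurements from this post:

```python
# Toy sketch of the common "risk = likelihood x impact" simplification.
# The scores and scenario names are made up purely for illustration.

def risk(likelihood: float, impact: float) -> float:
    """Expected-loss style risk score: chance of the event times its cost."""
    return likelihood * impact

# A raw model ("binary blob") that most consumers can't easily misuse directly:
raw_model_risk = risk(likelihood=0.01, impact=9.0)

# The same model wrapped in a polished, widely deployed product:
deployed_system_risk = risk(likelihood=0.40, impact=9.0)

print(f"raw model:       {raw_model_risk:.2f}")        # 0.09
print(f"deployed system: {deployed_system_risk:.2f}")  # 3.60
```

Under this simplification, the model's potential impact barely changes between the two scenarios; what moves is the likelihood term once a system puts the capability within easy reach, which is the shift the rest of this post is about.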
As an industry, we’ve largely been guilty of Foundationing Around. Over the past two years, the world became obsessed with building bigger and better models. What did it ask of security professionals? Evaluate the capabilities of these binary blobs – tell us what they can do. “Can it write malware? Is the malware any good? You’re a security team, what is the impact of this new technology?”
This “impact” question turned out to be a difficult problem, requiring the development of new skills and frameworks. Academia and industry have invested significantly in these efforts over the past two years and we’re still building the ability to quantify and assess those model capabilities and characteristics.
But while security researchers have been working on those model capability evaluations, developers are shipping – that’s what they do. They’re desperate to “find ways to use” AI. LLMs may prove as transformative to their tech stack as the database once was. Every developer in every domain is finding a place to shoehorn in AI and building skills to future-proof their career.
… and developers ship. So those binary blobs and expensive API calls are being deployed into production – into existing web services and onto our PCs and phones. The LLM was never going to generate the user-facing content on its own; it is one component of a larger system. Every other component can mitigate, exacerbate, or otherwise confound the risk calculus, rendering our blob metrics insufficient. In fact, the processes and tools for measuring those model characteristics are intellectually interesting, but they provide no assurance about the characteristics or capabilities of the final user-facing system.
While we were Foundationing Around, we largely fell behind on preparing security policies, tools, and requirements for the coming generations of systems. Foundationing Around was defined by evaluating potential model impact, but the likelihood of “catastrophe” was always low – it turns out these binary blobs are hard for consumers to use directly (especially the beefy ones that scare us). Now that developers are shipping these into polished user experiences, likelihood skyrockets and we get to Finding Out.
It turns out that engineers are great at engineering. Even if a component has gaps and limitations, engineers compensate for them by building a system around it. Today, we’re seeing AI deployed in everything. This deluge of products and applications has strained the existing security apparatus’s ability to triage and evaluate their risk. For this, it’s the system context that matters, not the “capability” of any one component. Those model capability evaluations we’ve spent the last two years refining? Not terribly helpful at this step. The rest of the system can either mitigate or magnify the risk of the AI component in ways that we can’t foresee by simply measuring model characteristics. Furthermore, once these systems have users, we should expect the underlying models to be swapped and updated frequently, like any other software component. The final system properties are what remain relevant to risk and acceptability.
This applies to non-security domains too. Evaluating a binary blob’s capability may be important for the trajectory of R&D, but it is insufficient for determining risk because it ignores the engineering ingenuity that goes into deploying a system. Developers build great products on top of imperfect technologies all the time (see: cryptography, timekeeping). Understanding the characteristics and shortcomings of those technologies is necessary to help make them more robust, but neither that understanding nor marginal improvements to the technology can sufficiently mitigate the risk on their own – that’s a system engineering problem.
What does this mean for you as a:
foundation model builder: rock on. We’ve learned that we can’t simultaneously ask your model to be all of the components of a capable and safe system. Your model should benefit from system engineering and evaluation that can help provide guardrails and controls to appropriately shape the system behavior.
system builder: threat model early. You may be breaking longstanding, hard-won security controls. AI is a new software component that has a new and different attack surface.
consumer: caveat emptor. Most “AI Red Teams” focus on content safety, not traditional information security. Many table-stakes security assumptions can’t be relied on. If you value security, make it (and the transparency required to assess it) a differentiator when evaluating products.
evaluator: don’t get distracted by shiny model releases. Without a system view, you cannot reasonably calibrate and mitigate risk. Systems are what we should be building tools and processes for. The underlying model will change. You need to be prepared to model, measure, and control properties and behaviors in the integrated system.
Now that we’re Finding Out, does that mean Foundationing Around will stop? Of course not; we’re going to keep pushing the frontier. That means the AI system evaluation workload is as small today as it will ever be.