Why AI Security Testing Has a Context Problem

Most AI security testing starts with the same question: can we make the model fail?

Run a list of prompt injections. Check if the model leaks its system prompt. Try some jailbreaks. Record which ones worked. Ship the spreadsheet. Move on.

That approach finds real things. But it misses the question that actually matters for deployed enterprise systems: what does that failure enable in this specific environment?

The payload is not the risk

A prompt injection against a public chatbot with no backend access is a curiosity. The same prompt injection against an internal assistant connected to HR documents, security runbooks, executive strategy notes, and a retrieval pipeline with weak authorization is a business-critical exposure.

The technique is the same. The payload is the same. The risk is completely different.

That difference comes from context — the system architecture, the data it can reach, the authorization model it uses, the tools it can invoke, and the business workflows it sits inside. Without understanding that context, a security assessment can only tell you what the model did. It cannot tell you what that behavior means.

Model testing vs. system testing

There is an important distinction between testing a model and testing a system.

Model testing asks whether the model can be manipulated. Can it be jailbroken? Will it comply with a malicious instruction? Does it leak its system prompt? These are valid questions. They produce technical observations about model behavior.

System testing asks whether manipulation creates real-world risk. If the model complies with a malicious instruction, what data does it return? If retrieval pulls in documents the user should not see, what authorization control failed? If the model summarizes restricted content, is that because the prompt was clever or because the architecture does not enforce document-level access?

The first produces a list of behaviors. The second produces a finding with root cause, business impact, and remediation.

What a finding looks like without context

Here is what a generic AI security assessment might report:

Finding: Prompt Injection Successful

The model complied with a crafted prompt that attempted to extract internal information. The model returned content that appeared to include internal documentation.

Remediation: Improve prompt filtering. Add output sanitization.

A generic AI security finding. Technically accurate, but no root cause, no architecture context, and remediation that may not address the actual issue.

That finding is technically accurate. But it does not explain what happened at the architecture level. It does not say whether the issue is in the model, the retrieval pipeline, the authorization layer, or the prompt design. And the remediation — "improve prompt filtering" — may not address the actual root cause at all.

What a finding looks like with context

Here is what the same issue looks like when the assessment understands the system:

Finding: Retrieval Authorization Failure — Restricted Documents Exposed

A low-privileged user triggered retrieval of restricted documents through the RAG pipeline using normal business language. The root cause is authorization applied after document retrieval (post-retrieval filtering) rather than before retrieval (pre-retrieval access control).

No prompt injection or jailbreak was required. The vulnerability is architectural.

Remediation: Implement pre-retrieval access control. Filter the document corpus by user permissions before relevance scoring. Enforce document-level authorization at the retrieval layer.

The same observation as a context-aware finding. Root cause identified, remediation mapped to the correct architectural layer.

Same system. Same observation. Completely different finding. The second version identifies the failed control, explains why prompt-level fixes would not work, and maps remediation to the actual architectural breakpoint.

Why context requires methodology

Context does not come from running more payloads. It comes from understanding the target before testing it.

What kind of system is this? What data sources does it connect to? What authorization model does it use? What roles and permissions exist? What is the sensitivity hierarchy of the content it can access? What tools can it invoke? What business workflows depend on it?

Those questions need structured answers before the first payload is sent. The assessment needs a methodology that profiles the target, maps the architecture, and then generates tests that are relevant to the system's actual risk surface — not just the model's behavioral surface.

That is the problem SPECTRA was designed to solve.

What comes next

In the next post, I will walk through the first lab I built to validate this approach — a synthetic internal enterprise knowledge base assistant with a seeded retrieval authorization vulnerability. The lab has a vulnerable mode that exposes the flaw and a hardened mode that fixes it.

The question I set out to answer: can a context-aware testing methodology identify the architectural control failure, explain the root cause, map the correct remediation, and recognize when the fix works?

The answer turned out to be more interesting than a simple yes or no. It took several iterations to get there, and the process revealed something important about how AI security evaluators work — and fail.

Continue to Part 2: Building the Lab →