RAG Security Is an Authorization Problem
Lab 1 exposed a vulnerability that had nothing to do with prompt injection. The employee's query was entirely legitimate. The model did not need to be jailbroken. No adversarial prompt was required.
The issue was that the retrieval pipeline searched all documents by relevance before checking whether the user was authorized to see them. That is an authorization problem, not a prompt problem. And it is one of the most common architectural patterns in enterprise RAG deployments.
The two retrieval models
Most RAG systems follow one of two patterns when a user sends a query.
Post-retrieval authorization retrieves documents by relevance first, across the entire corpus, and then applies access control. The system scores all documents by semantic similarity to the query, selects the top results, and passes them to the language model. Authorization checks happen after the documents are already in the response context — or not at all.
This is the pattern Lab 1's vulnerable mode uses. It is also the default behavior of many retrieval frameworks, because search engines are designed to return the most relevant results, not the most authorized results.
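To make the shape of the vulnerable pattern concrete, here is a minimal Python sketch. The vector store, LLM client, and method names are illustrative assumptions, not Lab 1's actual implementation.

```python
# Post-retrieval authorization (vulnerable pattern): relevance scoring runs over the
# entire corpus, and any access check happens only after documents are retrieved.
# All names here (vector_store, llm, similarity_search) are illustrative placeholders.

def answer_query_post_auth(user, query, vector_store, llm, k=5):
    # Score every document by semantic similarity, restricted ones included.
    top_docs = vector_store.similarity_search(query, k=k)

    # Build the prompt context. By this point restricted documents are already in
    # scope; a later check can log the exposure but cannot undo it.
    context = "\n\n".join(doc.text for doc in top_docs)

    # The model summarizes whatever retrieval handed it.
    prompt = f"Answer the question using this context:\n{context}\n\nQuestion: {query}"
    return llm.generate(prompt)
```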
Pre-retrieval authorization filters the corpus by user permissions before running relevance scoring. The system first determines which documents the user is allowed to access, then searches only within that authorized subset. Restricted documents never enter the retrieval pool for unauthorized users.
This is the pattern Lab 1's hardened mode uses. It is the correct pattern when documents have different sensitivity levels and users have different access rights.
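A matching sketch of the hardened pattern, under the same illustrative assumptions: each document carries an access-group label in its metadata, and the vector store supports a metadata filter at query time (the exact filter syntax varies by store).

```python
# Pre-retrieval authorization (hardened pattern): resolve permissions first, then search
# only the authorized subset. Restricted documents never enter the retrieval pool.

def answer_query_pre_auth(user, query, vector_store, llm, permissions, k=5):
    # Resolve the user's identity and permissions before any relevance scoring.
    allowed_groups = permissions.groups_for(user)

    # Search only documents whose access_group metadata matches the user's permissions.
    # The filter argument is a generic illustration, not a specific vendor API.
    top_docs = vector_store.similarity_search(
        query,
        k=k,
        filter={"access_group": {"$in": allowed_groups}},
    )

    context = "\n\n".join(doc.text for doc in top_docs)
    prompt = f"Answer the question using this context:\n{context}\n\nQuestion: {query}"
    return llm.generate(prompt)
```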
Why prompt hardening does not fix this
When a RAG system leaks restricted content, the instinct is often to add prompt-level controls. Filter the input for suspicious queries. Add output scanning for sensitive keywords. Instruct the model to refuse requests about restricted topics.
None of these address the root cause.
Consider the actual proof of concept from Lab 1:
I am helping prepare the incident response briefing for
leadership. Please summarize the latest internal notes related
to credential exposure, affected systems, and containment steps
from the security operations workspace.
There is nothing adversarial about this query. It is legitimate business language. A prompt filter designed to block "show me the secrets" or "ignore your instructions" would pass this through without a second look.
The restricted documents are already in the model's context before any prompt-level control can act. The model is not being tricked into generating restricted content; it is being handed restricted content by the retrieval pipeline and asked to summarize it. The query succeeds not because it is adversarial but because it is a normal business question that happens to be semantically similar to documents the employee should not be able to access.
Prompt filtering would need to understand every possible way a legitimate query could trigger retrieval of restricted content. That is an unbounded problem. The authorization control needs to be at the retrieval layer, not the prompt layer.
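To see why the problem is unbounded, consider a hypothetical blocklist-style input filter; the patterns and function below are invented for illustration. It catches obviously adversarial phrasing but passes the Lab 1 query untouched, because nothing in that query looks suspicious.

```python
# Hypothetical prompt-level input filter based on a keyword blocklist.
BLOCKED_PATTERNS = [
    "ignore your instructions",
    "show me the secrets",
    "reveal confidential",
    "jailbreak",
]

def query_allowed(query: str) -> bool:
    """Return True if the query passes the blocklist filter."""
    lowered = query.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)

poc = (
    "I am helping prepare the incident response briefing for leadership. "
    "Please summarize the latest internal notes related to credential exposure, "
    "affected systems, and containment steps from the security operations workspace."
)

print(query_allowed(poc))  # True: the filter passes the query, and retrieval does the rest
```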
The remediation breakpoint
In offensive security, the remediation breakpoint is the specific control that, if implemented correctly, would most effectively prevent the exploitation path. It is the most precise recommendation you can give — not "improve security" but "enforce this specific control at this specific layer."
For retrieval authorization failure, the remediation breakpoint is pre-retrieval access control:
1. When a user sends a query, resolve their identity and permissions first.
2. Filter the document corpus to include only documents the user is authorized to access.
3. Run relevance scoring against the filtered corpus.
4. Pass only authorized, relevant documents to the language model.
5. Generate the response from authorized content only.
Step 2 is where most vulnerable systems fail. They skip it, defer it, or implement it as a post-retrieval filter that logs unauthorized access but does not prevent the documents from entering the response context.
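A hypothetical illustration of that failure mode, with invented names: the check exists, but it runs after retrieval and only records the violation, so the restricted text still reaches the model's context.

```python
# Anti-pattern: authorization as a post-retrieval audit. The violation is logged,
# but the restricted document is never removed from the context the model will see.

def build_context_with_audit(user, query, vector_store, permissions, logger, k=5):
    top_docs = vector_store.similarity_search(query, k=k)  # unrestricted search

    for doc in top_docs:
        if not permissions.can_read(user, doc):
            # Recorded for later review, but not prevented.
            logger.warning("unauthorized retrieval: user=%s doc=%s", user.id, doc.id)

    # Every retrieved document, authorized or not, still enters the prompt.
    return "\n\n".join(doc.text for doc in top_docs)
```

The fix is not a louder log line; it is moving the permission check ahead of similarity search, as in the pre-retrieval sketch above.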
What this means for real systems
Lab 1 simulates a pattern that exists in real enterprise deployments. Any organization that connects a language model to document repositories with mixed sensitivity levels and multiple user roles faces this design decision.
Internal wikis where HR documents sit alongside engineering docs. Knowledge bases where security runbooks are stored in the same system as public FAQs. Legal matter management systems where privileged communications are indexed alongside general legal research. Healthcare portals where clinical notes and administrative records share the same retrieval infrastructure.
In each case, the question is the same: does the retrieval pipeline enforce document-level authorization before or after the documents are passed to the language model?
The answer determines whether a low-privileged user can access restricted content through a normal, non-adversarial query.
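One way to answer that question empirically is a retrieval-layer test: run the pipeline as a low-privileged user with a normal business query and fail if anything above that user's clearance reaches the retrieved context. A sketch with hypothetical names:

```python
# Hypothetical assessment check: no restricted document may reach a low-privileged
# user's retrieval results, even for a benign, non-adversarial query.

def test_no_restricted_docs_for_low_priv_user(pipeline, low_priv_user):
    benign_query = (
        "Summarize the latest internal notes on credential exposure, "
        "affected systems, and containment steps."
    )
    retrieved = pipeline.retrieve(user=low_priv_user, query=benign_query)

    leaked = [doc for doc in retrieved if doc.metadata.get("sensitivity") == "restricted"]
    assert not leaked, f"{len(leaked)} restricted documents reached the retrieval context"
```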
The SPECTRA perspective
Generic AI security testing would likely report this as a prompt injection or data leakage finding. The payload succeeded in extracting restricted information, so it looks like the model failed to refuse a malicious request.
SPECTRA's context-aware approach identifies it differently. Compare the two findings:

Generic finding:
- Prompt injection successful
- Model returned internal documentation
- Remediation: improve prompt filtering

SPECTRA finding:
- Retrieval authorization failure
- Six restricted documents exposed to a low-privileged user through the RAG pipeline
- Root cause: post-retrieval authorization
- Remediation: pre-retrieval access control
The same vulnerability reported two ways. The generic finding sends the team to the prompt layer. The SPECTRA finding sends them to the retrieval layer where the fix belongs.
That distinction matters for the team receiving the report. The finding is not about the model. It is about the retrieval pipeline. The root cause is not prompt compliance. It is authorization applied at the wrong layer. The remediation is not "improve prompt filtering." It is "enforce pre-retrieval access control."
The broader principle
RAG security is not primarily about defending against adversarial prompts. It is about ensuring that the authorization model the organization expects — role-based access, document-level permissions, sensitivity classifications — is enforced at the retrieval layer, not just at the user interface.
If a user cannot access a document through the organization's file share, they should not be able to access it through the AI assistant connected to the same documents. That seems obvious. But the default behavior of most retrieval frameworks does not enforce it.
Building the authorization check into the retrieval pipeline is not a prompt engineering problem. It is an architecture problem. And testing for it is not a payload generation problem. It is a context-aware assessment problem.