
RAG Tenant Isolation Review

The model answer looks helpful, but one citation belongs to another tenant. The final API route checked the user identity, yet unauthorized text still reached prompt assembly.

This is the security boundary that sets RAG on Kubernetes apart from ordinary retrieval: the retrieved text becomes model input, so authorization must happen before retrieved context enters the prompt.

Scenario

A shared RAG service supports multiple customer workspaces. Ingestion jobs write embeddings into one vector database with tenant metadata. A query route receives user identity from the gateway, but retrieval only filters by workspace after reranking. During a metadata migration, stale chunks from another tenant are retrieved and summarized.

Symptoms

| Symptom | What it suggests |
| --- | --- |
| Citation points to the wrong workspace | Tenant metadata is missing, stale, or not enforced before generation. |
| Retrieval recall looks strong | Quality metrics may hide authorization failures. |
| Audit logs show a valid user | Authentication succeeded, but context authorization failed. |
| The prompt contains unexpected document IDs | Prompt assembly accepted context that should have been rejected earlier. |
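The last symptom can be checked mechanically when prompt assembly logs the context it accepted. The following is a minimal sketch, assuming a hypothetical log record that carries the requesting tenant and the tenant ID of each context chunk; `PromptLog`, `ContextRef`, and their field names are illustrative and not taken from any particular platform.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ContextRef:
    """One chunk that was placed into a prompt (hypothetical log shape)."""
    chunk_id: str
    tenant_id: Optional[str]  # None models missing tenant metadata


@dataclass(frozen=True)
class PromptLog:
    """One prompt-assembly event (hypothetical log shape)."""
    request_id: str
    request_tenant: str
    context: Tuple[ContextRef, ...]


def find_cross_tenant_context(logs):
    """Return (request_id, chunk_id) pairs where the chunk's tenant
    does not match the tenant that made the request.

    Chunks with missing tenant metadata are reported as violations too.
    """
    violations = []
    for log in logs:
        for ref in log.context:
            if ref.tenant_id != log.request_tenant:
                violations.append((log.request_id, ref.chunk_id))
    return violations
```

A scan like this only detects leaks after the fact. It complements enforcement at retrieval time and does not replace it.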

Common wrong instinct

"Filter the final answer."

That is too late. Once unauthorized text enters the prompt, the model can summarize, paraphrase, or cite it. The retrieval layer must enforce tenant and access policy before reranking and prompt assembly.
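The ordering can be sketched as follows. This is an illustrative example under stated assumptions, not a reference implementation: the `Chunk` shape, the `authorize` and `retrieve_for_prompt` helpers, and the `rerank` callable are all hypothetical. In a real deployment the same filter would usually also be pushed into the vector store query where the store supports metadata filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """A retrieved candidate with its isolation metadata (hypothetical shape)."""
    chunk_id: str
    tenant_id: Optional[str]
    access_scope: Optional[str]
    text: str


def authorize(chunks, tenant_id, allowed_scopes):
    """Keep only chunks owned by the caller's tenant and within an allowed scope.

    Fails closed: a chunk with missing tenant or scope metadata is dropped.
    """
    return [
        c for c in chunks
        if c.tenant_id == tenant_id and c.access_scope in allowed_scopes
    ]


def retrieve_for_prompt(candidates, tenant_id, allowed_scopes, rerank, top_k=3):
    """Filter first, then rerank, so the reranker and the prompt
    only ever see authorized text."""
    authorized = authorize(candidates, tenant_id, allowed_scopes)
    return rerank(authorized)[:top_k]
```

Because `rerank` receives only the authorized list, a stale or mislabeled chunk cannot win a ranking slot and reach prompt assembly.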

Production reasoning

Separate the RAG platform into distinct lifecycle stages, each with its own isolation requirement:

| Lifecycle | Isolation requirement |
| --- | --- |
| Ingestion | Attach tenant, source owner, access scope, source version, and index version to every chunk. |
| Index publish | Validate metadata completeness before a new index becomes active. |
| Retrieval | Apply tenant and access filters before ranking results are exposed downstream. |
| Prompt assembly | Log context IDs and policy decisions, not only generated answers. |
| Evaluation | Include unauthorized-document tests alongside recall and groundedness tests. |
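The index publish requirement lends itself to a simple gate. Below is a minimal sketch, assuming chunks are plain dictionaries and that the five metadata fields listed for ingestion are stored under the hypothetical key names in `REQUIRED_FIELDS`; a real pipeline would read these from its own schema.

```python
# Hypothetical key names for the metadata that ingestion should attach.
REQUIRED_FIELDS = (
    "tenant_id",
    "source_owner",
    "access_scope",
    "source_version",
    "index_version",
)


def incomplete_chunks(chunks):
    """Map each chunk ID to the required fields it is missing.

    A field counts as missing when the key is absent or its value is empty.
    """
    problems = {}
    for chunk in chunks:
        missing = [field for field in REQUIRED_FIELDS if not chunk.get(field)]
        if missing:
            problems[chunk.get("chunk_id", "<unknown>")] = missing
    return problems


def can_publish(chunks):
    """Allow a new index to become active only if every chunk is complete."""
    return not incomplete_chunks(chunks)
```

Running this gate during a metadata migration would block the publish step whenever any chunk has lost its tenant metadata, instead of letting the gap surface at query time.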

Decision checklist

  • Does retrieval receive tenant and authorization context from the gateway?
  • Are tenant filters applied before reranking and prompt assembly?
  • Can every citation be traced to source ID, tenant ID, index version, and chunking config?
  • Do evaluation jobs include negative tests for unauthorized documents?
  • Can the platform roll back an index independently from the model runtime?
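The negative-test item in the checklist can be sketched as a small evaluation helper. This example assumes a hypothetical `retrieve(query, tenant_id)` callable that returns chunk IDs, and a set of planted document IDs that belong to a different tenant; both are illustrative stand-ins for whatever the evaluation harness actually exposes.

```python
def unauthorized_hits(retrieve, queries, tenant_id, forbidden_ids):
    """Run each query as tenant_id and report any forbidden chunk IDs returned.

    retrieve: hypothetical callable (query, tenant_id) -> iterable of chunk IDs.
    forbidden_ids: IDs of documents the tenant must never see.
    Returns a dict mapping each leaking query to its sorted leaked IDs.
    """
    forbidden = set(forbidden_ids)
    leaks = {}
    for query in queries:
        leaked = set(retrieve(query, tenant_id)) & forbidden
        if leaked:
            leaks[query] = sorted(leaked)
    return leaks
```

An evaluation job can fail the run whenever the returned dictionary is non-empty, regardless of how good recall and groundedness look on the same queries.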

Run the RAG retrieval challenge to practice checking tenant filters, citations, retrieval recall, and failure drills.