RAG Retrieval Challenge

Interactive version

Run the guided challenge with paste-output checks, hints, solution reveal, and private device progress at labs.k8sllm.online/challenges/rag-retrieval.

Challenge outcome

Finish with a RAG path that can explain why a document was retrieved, which metadata policy was applied, and whether the generated answer used the retrieved context correctly.

Objective

Validate the platform mechanics behind RAG: ingestion, chunking, embedding, vector persistence, metadata filters, retrieval quality, prompt assembly, and answer evaluation.

Scenario

Your platform hosts internal engineering docs. A team reports that answers are sometimes stale and sometimes cite documents from the wrong product area. You need to prove the ingestion and online serving planes are testable before tuning the model.

Prerequisites

Item	Requirement
Cluster	Kubernetes namespace for RAG services and jobs.
Vector database	Any reachable vector store or a mock service with deterministic records.
Test documents	At least three small documents with metadata.
Evaluation cases	Queries with expected source documents and expected refusal behavior.
LLM endpoint	A model endpoint or mock generation endpoint.
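If no vector store is available yet, a deterministic mock is enough to exercise the rest of the challenge. The Python sketch below is one way to build it. It assumes a hashed bag-of-words vector as a stand-in for a real embedding model. The `embed` and `search` names, the 64-dimension width, and the sample documents are all hypothetical. Because the vectors are derived only from the text, the same query always returns the same ranking.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector width for the mock, not a real model's dimension


def embed(text: str) -> list[float]:
    """Deterministic hashed bag-of-words vector; a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int.from_bytes(hashlib.sha256(token.encode("utf-8")).digest()[:4], "big") % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Hypothetical test documents carrying the metadata the challenge asks for.
DOCS = [
    {"id": "serving-latency",
     "text": "Debug inference latency with GPU metrics and request queue depth.",
     "metadata": {"product_area": "llm-serving", "classification": "public"}},
    {"id": "ingest-runbook",
     "text": "Rerun the ingestion job when the index version is stale.",
     "metadata": {"product_area": "rag-system", "classification": "public"}},
    {"id": "incident-notes",
     "text": "Internal incident notes about inference latency regressions.",
     "metadata": {"product_area": "llm-serving", "classification": "restricted"}},
]

# "Ingest" once: store each document next to its precomputed vector.
INDEX = [(doc, embed(doc["text"])) for doc in DOCS]


def search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Return (doc_id, cosine score) pairs, highest score first."""
    q = embed(query)
    # Both vectors are unit length, so the dot product is the cosine similarity.
    scored = [(doc["id"], sum(a * b for a, b in zip(q, vec))) for doc, vec in INDEX]
    scored.sort(key=lambda item: (-item[1], item[0]))  # ties broken by ID for stable output
    return scored[:top_k]
```

Replacing `embed` with a call to a real embedding endpoint keeps the same interface, so evaluation code written against the mock can be reused.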

Tasks

Define separate namespaces for rag-system and llm-serving.
Run ingestion as an explicit Kubernetes Job.
Define retrieval quality cases before changing chunk size, embedding model, or top-k.
Validate metadata filters for product area, tenant, and classification.
Record answer quality and citation behavior for known queries.

kubectl create namespace rag-system
kubectl create namespace llm-serving
kubectl label namespace rag-system data-class=knowledge

apiVersion: batch/v1
kind: Job
metadata:
  name: docs-ingestion
  namespace: rag-system
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: ingest
          image: ghcr.io/example/rag-ingest:0.1.0
          env:
            - name: SOURCE_PATH
              value: /data/docs
            - name: VECTOR_COLLECTION
              value: platform-guides
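          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              memory: 4Gi
          # SOURCE_PATH must exist inside the container: mount the documents
          # there unless the image already bundles them.
          volumeMounts:
            - name: docs
              mountPath: /data/docs
              readOnly: true
      volumes:
        - name: docs
          persistentVolumeClaim:
            claimName: docs-source  # hypothetical PVC holding the test documents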
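What the ingestion container does internally is up to your implementation. As one hedged illustration, the Python sketch below splits text into overlapping character windows. It gives every chunk an ID built from the document ID, an index version, the chunk ordinal, and a content hash, so answers can log exactly which chunk they used and an outdated index version is visible in the ID itself. `chunk_document`, the ID format, and the default window sizes are assumptions; many pipelines chunk by tokens rather than characters.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_id: str
    index_version: str
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(doc_id: str, text: str, metadata: dict, index_version: str,
                   size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    chunks: list[Chunk] = []
    start, ordinal = 0, 0
    while True:
        piece = text[start:start + size]
        digest = hashlib.sha256(piece.encode("utf-8")).hexdigest()[:12]
        # Hypothetical ID format: <doc>@<index version>#<ordinal>-<content hash>
        chunk_id = f"{doc_id}@{index_version}#{ordinal}-{digest}"
        # Copy metadata so every chunk carries its own filterable tags.
        chunks.append(Chunk(chunk_id, doc_id, index_version, piece, dict(metadata)))
        if start + size >= len(text):
            break
        start += size - overlap
        ordinal += 1
    return chunks
```

Keeping the index version inside each chunk ID is a design choice, not a requirement; storing it as chunk metadata works equally well as long as it is logged with the answer.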

Validation commands

kubectl -n rag-system get job,pod
kubectl -n rag-system wait --for=condition=complete job/docs-ingestion --timeout=10m
kubectl -n rag-system logs job/docs-ingestion
kubectl -n rag-system describe job docs-ingestion

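Test retrieval directly, without generation, using a request shaped like the one below. The field names are illustrative and your API may differ; include every filter you intend to validate, such as product area, tenant, and classification: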

{
  "query": "How do I debug inference latency?",
  "filters": {
    "audience": "platform-engineering",
    "classification": "public",
    "product_area": "llm-serving"
  },
  "top_k": 5
}
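The request above asks the retriever to apply metadata filters before ranking. A minimal Python sketch of that ordering follows. It assumes candidate records already carry a similarity score for the example query; the scores, chunk IDs, and the `matches` and `retrieve` helpers are hypothetical. Filtering before taking `top_k` keeps disallowed chunks from consuming result slots. Treating a missing metadata key as a non-match makes the filter fail closed, which matters when filters act as a security boundary.

```python
def matches(metadata: dict, filters: dict) -> bool:
    """Every filter key must be present and equal; a missing key is a non-match (fail closed)."""
    return all(key in metadata and metadata[key] == value for key, value in filters.items())


def retrieve(records: list[dict], filters: dict, top_k: int = 5) -> list[str]:
    """Apply metadata filters first, then rank the survivors and cut to top_k."""
    allowed = [r for r in records if matches(r["metadata"], filters)]
    allowed.sort(key=lambda r: (-r["score"], r["chunk_id"]))
    return [r["chunk_id"] for r in allowed[:top_k]]


# Hypothetical candidates with precomputed similarity scores for the example query.
RECORDS = [
    {"chunk_id": "serving-latency#0", "score": 0.91,
     "metadata": {"audience": "platform-engineering", "classification": "public",
                  "product_area": "llm-serving"}},
    {"chunk_id": "incident-notes#0", "score": 0.95,
     "metadata": {"audience": "platform-engineering", "classification": "restricted",
                  "product_area": "llm-serving"}},
    {"chunk_id": "legacy-serving#0", "score": 0.80,  # no audience tag at all
     "metadata": {"classification": "public", "product_area": "llm-serving"}},
    {"chunk_id": "ingest-runbook#0", "score": 0.40,
     "metadata": {"audience": "platform-engineering", "classification": "public",
                  "product_area": "rag-system"}},
]
FILTERS = {"audience": "platform-engineering", "classification": "public",
           "product_area": "llm-serving"}

print(retrieve(RECORDS, FILTERS))  # ['serving-latency#0']
```

The restricted chunk has the highest score, yet it never reaches the result list because the filter runs first.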

Self-check checklist

Ingestion creates a versioned index or records the equivalent version metadata.
Retrieved chunks match the requested product area.
Restricted documents do not appear in public queries.
Answers cite or log source chunk IDs.
A stale index can be detected from ingestion lag or index version.
Known evaluation cases can be rerun after retrieval changes.
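For the stale-index item, one possible check compares each source document's last-modified time with the time the index was built and with the index version the serving path expects. The function name, inputs, sample timestamps, and thresholds in this Python sketch are assumptions. Where the timestamps come from (object storage, Git, a CMS) depends on your platform.

```python
from datetime import datetime, timedelta, timezone


def index_health(source_updated: dict[str, datetime], index_built_at: datetime,
                 index_version: str, expected_version: str,
                 max_lag: timedelta, now: datetime) -> dict:
    """Report documents changed since the last build and whether the index is stale."""
    changed = sorted(doc for doc, ts in source_updated.items() if ts > index_built_at)
    oldest_pending = min((source_updated[doc] for doc in changed), default=None)
    # Lag: how long the oldest change not yet in the index has been waiting.
    lag = now - oldest_pending if oldest_pending is not None else timedelta(0)
    return {
        "changed_since_build": changed,
        "ingestion_lag": lag,
        "stale": lag > max_lag or index_version != expected_version,
    }


utc = timezone.utc
report = index_health(
    source_updated={
        "serving-latency": datetime(2024, 4, 30, 12, tzinfo=utc),
        "ingest-runbook": datetime(2024, 5, 1, 8, tzinfo=utc),
    },
    index_built_at=datetime(2024, 5, 1, tzinfo=utc),
    index_version="v7",
    expected_version="v7",
    max_lag=timedelta(hours=1),
    now=datetime(2024, 5, 1, 10, tzinfo=utc),
)
print(report["changed_since_build"], report["ingestion_lag"], report["stale"])
# ['ingest-runbook'] 2:00:00 True
```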

Hints

Treat metadata filters as a security boundary, not only a relevance feature.
Tune retrieval before prompt wording. Bad context usually produces bad answers.
Keep a small fixed evaluation set so chunking and embedding changes can be compared.
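A fixed evaluation set can be as small as a list of queries with expected source documents. The Python sketch below scores a retrieval function by whether any expected source appears in the top `k` results. As an assumption about how to encode refusal behavior at the retrieval layer, it treats a case with no expected sources as passing only when nothing is retrieved. The `evaluate` helper, the case format, and the two canned configurations are hypothetical; in practice `retrieve_fn` would call your retrieval service configured with a given chunk size, embedding model, or top-k.

```python
def evaluate(cases, retrieve_fn, k=5):
    """Return (pass rate, per-case results) for a fixed evaluation set."""
    results = []
    for case in cases:
        retrieved = list(retrieve_fn(case["query"]))[:k]
        if case["expected"]:
            # Hit@k: at least one expected source is in the top k.
            passed = any(doc_id in retrieved for doc_id in case["expected"])
        else:
            # Assumed refusal case: nothing should be retrieved for this caller.
            passed = not retrieved
        results.append({"query": case["query"], "passed": passed, "retrieved": retrieved})
    return sum(r["passed"] for r in results) / len(results), results


CASES = [
    {"query": "debug inference latency", "expected": {"serving-latency"}},
    {"query": "rerun ingestion", "expected": {"ingest-runbook"}},
    {"query": "show restricted incident notes", "expected": set()},
]

# Canned outputs standing in for two retrieval configurations.
BASELINE = {"debug inference latency": ["serving-latency", "ingest-runbook"],
            "rerun ingestion": ["serving-latency"],
            "show restricted incident notes": []}
CANDIDATE = {"debug inference latency": ["serving-latency"],
             "rerun ingestion": ["ingest-runbook"],
             "show restricted incident notes": []}

for name, table in (("baseline", BASELINE), ("candidate", CANDIDATE)):
    rate, _ = evaluate(CASES, lambda q, t=table: t.get(q, []))
    print(f"{name}: {rate:.2f}")
# baseline: 0.67
# candidate: 1.00
```

Storing each run's per-case results, not just the pass rate, makes it possible to see which queries regressed after a chunking or embedding change.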

Expected signals

Signal	Healthy result
Ingestion lag	The platform knows when content is stale.
Retrieval precision	Expected source appears in the top results for known queries.
Context use	The answer reflects retrieved context rather than general model memory.
Policy enforcement	Metadata filters prevent unauthorized retrieval.
Evaluation history	Changes to chunking or embeddings can be compared over time.
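One cheap, partial signal for context use is checking that every citation in an answer points at a chunk that was actually retrieved for that request. This Python sketch assumes answers cite chunks inline as `[source: <chunk_id>]`, a hypothetical convention, and `check_citations` is likewise a made-up helper. It verifies citation integrity only and cannot tell whether the cited text really supports the answer, so pair it with the known evaluation cases.

```python
import re

# Hypothetical inline citation convention: [source: <chunk_id>]
CITATION = re.compile(r"\[source:\s*([^\]\s]+)\]")


def check_citations(answer: str, retrieved_ids: list[str]) -> dict:
    """Compare the chunk IDs an answer cites with the chunk IDs actually retrieved."""
    cited = set(CITATION.findall(answer))
    return {
        "cited": sorted(cited),
        "has_citation": bool(cited),
        # Cited but never retrieved: possibly model memory or a fabricated source.
        "unsupported": sorted(cited - set(retrieved_ids)),
    }


report = check_citations(
    "Check queue depth first [source: serving-latency#0]. Old advice: [source: old-doc#3].",
    ["serving-latency#0", "ingest-runbook#0"],
)
print(report["unsupported"])  # ['old-doc#3']
```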

Failure drill

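Change the metadata on a restricted document so it looks public, then rerun the known query. The point of the drill is to learn whether access policy is enforced anywhere other than fragile document tags: if the restricted document now appears in public results, the tag was the only control.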
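One way to pass this drill is to enforce authorization from a policy that lives outside the document's own tags, for example an access list keyed by document ID and checked after retrieval. In the Python sketch below, `ACL`, the group names, and `authorize` are hypothetical. The point is that flipping `classification` to `public` on a document does not change who can read it, and documents missing from the ACL are denied by default.

```python
# Hypothetical access list kept outside the vector store, keyed by document ID.
ACL: dict[str, set[str]] = {
    "serving-latency": {"everyone"},
    "ingest-runbook": {"everyone"},
    "incident-notes": {"sre-oncall"},
}


def authorize(results: list[dict], caller_groups: set[str]) -> list[dict]:
    """Keep only results the caller may read according to the ACL; document tags are ignored."""
    allowed = []
    for result in results:
        readers = ACL.get(result["doc_id"], set())  # unknown documents are denied
        if readers & caller_groups:
            allowed.append(result)
    return allowed


# The drill: a restricted document was re-tagged as public and is now retrieved.
tampered_results = [
    {"doc_id": "incident-notes", "metadata": {"classification": "public"}},
    {"doc_id": "serving-latency", "metadata": {"classification": "public"}},
    {"doc_id": "unreviewed-doc", "metadata": {"classification": "public"}},
]
visible = authorize(tampered_results, {"everyone"})
print([r["doc_id"] for r in visible])  # ['serving-latency']
```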
