
LLM Production Readiness Checklist

This free checklist is the launch gate for Kubernetes LLM workloads. It focuses on evidence: what the platform can prove before live traffic, not what the team hopes will work.

Last reviewed: June 8, 2026. Use this checklist as the lead review document before exposing a model, RAG system, or inference route to users.

Figure: Production Kubernetes cluster architecture

Readiness checklist

| Area | Required evidence |
| --- | --- |
| Ownership | Namespace, route, runtime, model artifact, and on-call owner are documented. |
| GPU placement | Workload requests GPU resources and lands on a compatible, tainted accelerator pool. |
| Serving health | Readiness proves model load and endpoint response, not only container process startup. |
| Latency | TTFT, queue wait, tokens/sec, and p95/p99 latency are visible by route and model. |
| Security | RBAC, NetworkPolicy, secret handling, prompt logging controls, and tenant routing are reviewed. |
| RAG access | Retrieval applies tenant and authorization filters before prompt assembly. |
| Rollout | Traffic shift, readiness gates, model revision, and runtime image are reviewable. |
| Rollback | Rollback has been tested for route, runtime image, model artifact, and index where relevant. |
| Cost | Cost/request includes input tokens, output tokens, GPU profile, utilization, and route class. |
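The Cost row can be made concrete with a small calculation. The following is a minimal sketch, not a pricing model. It assumes GPU time per request can be approximated from token counts and measured prefill and decode throughput, and that idle capacity is charged back by dividing by utilization. The function name `cost_per_request` and every number in the example are hypothetical.

```shell
# Hypothetical helper: rough GPU cost for one request.
# Arguments (all assumptions; substitute measured values):
#   $1 input tokens         $2 output tokens
#   $3 prefill tokens/sec   $4 decode tokens/sec
#   $5 GPU hourly rate      $6 utilization as a fraction (0 < u <= 1)
cost_per_request() {
  awk -v in_tok="$1" -v out_tok="$2" -v prefill="$3" -v decode="$4" \
      -v rate="$5" -v util="$6" 'BEGIN {
    if (prefill <= 0 || decode <= 0 || util <= 0 || util > 1) {
      print "invalid input" > "/dev/stderr"; exit 1
    }
    # Assumed model: GPU seconds = prefill time + decode time.
    gpu_seconds = in_tok / prefill + out_tok / decode
    # Convert the hourly rate to a per-second rate, then charge idle time back.
    printf "%.4f\n", gpu_seconds * (rate / 3600) / util
  }'
}

# Example: 1000 input tokens, 200 output tokens, GPU billed at 3.60 per hour,
# 50% utilization.
cost_per_request 1000 200 5000 100 3.60 0.5   # prints 0.0044
```

The hourly rate stands in for the GPU profile here. To report by route class, one option is to run the same calculation per route with that route's typical token counts and throughput.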

Commands and checks

kubectl auth can-i list pods --namespace llm-serving
kubectl -n llm-serving get deploy,svc,endpoints,quota,networkpolicy
kubectl -n llm-serving rollout history deploy/<model-deployment>
curl -sS "$METRICS" | grep -Ei "ttft|queue|tokens|gpu|latency"
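
The commands above confirm that the objects exist. A few more read-only queries can show whether a deployment requests a GPU, tolerates the accelerator taint, and defines a readiness probe. This is a sketch under stated assumptions: the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin, and the function name `show_placement` is hypothetical.

```shell
# Hypothetical helper: print placement and readiness fields of one deployment.
# Usage: show_placement <namespace> <deployment>
show_placement() {
  local ns="$1" deploy="$2"
  # GPU limit per container; the resource name assumes the NVIDIA device plugin.
  echo "gpu_limits: $(kubectl -n "$ns" get deploy "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[*].resources.limits.nvidia\.com/gpu}')"
  # Toleration keys, to compare against the taint on the accelerator pool.
  echo "toleration_keys: $(kubectl -n "$ns" get deploy "$deploy" \
    -o jsonpath='{.spec.template.spec.tolerations[*].key}')"
  # Readiness probe definition for each container.
  echo "readiness_probe: $(kubectl -n "$ns" get deploy "$deploy" \
    -o jsonpath='{.spec.template.spec.containers[*].readinessProbe}')"
}
```

Call it as `show_placement llm-serving <model-deployment>`. An empty value after a label means the field is not set. A probe being present does not by itself prove model load: a reviewer still needs to confirm that the probed endpoint succeeds only after the model is loaded and able to respond.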

| Check | Pass signal |
| --- | --- |
| Launch evidence exists | RBAC, quota, NetworkPolicy, deployment, endpoint, and rollout history can be inspected. |
| Runtime evidence exists | Model loaded, endpoint responds, and inference latency is visible. |
| Security evidence exists | Tenant, secret, prompt logging, egress, and retrieval boundaries are reviewed. |
| Business evidence exists | Cost and rollback impact are understood before traffic shift. |
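
The `grep -Ei` command in the checks above succeeds as soon as any one pattern matches, so it cannot show that every latency signal is present. A stricter variant checks each keyword separately. This is a minimal sketch: the function name `missing_signals` and the sample metric names are made up for illustration, and real metric names depend on the serving runtime.

```shell
# Hypothetical helper: read Prometheus-style metrics text on stdin and print
# each required keyword that matches no sample line.
# Returns 0 only when every keyword matched.
missing_signals() {
  local metrics keyword missing=0
  # Drop comment lines (# HELP, # TYPE) so only sample lines count as evidence.
  metrics="$(grep -v '^#' || true)"
  for keyword in "$@"; do
    if ! printf '%s\n' "$metrics" | grep -Eiq -- "$keyword"; then
      echo "missing: $keyword"
      missing=1
    fi
  done
  return "$missing"
}

# Example with inline sample data (metric names are invented):
printf 'llm_ttft_seconds_sum 1.2\nllm_queue_wait_seconds_sum 0.4\n' |
  missing_signals ttft queue tokens || echo "evidence incomplete"
# prints "missing: tokens" followed by "evidence incomplete"
```

Against a live endpoint the same check could be run as `curl -sS "$METRICS" | missing_signals ttft queue tokens gpu latency`, which reuses the keywords from the original command.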

Run the Production readiness challenge to turn this checklist into guided validation.