LLM Production Readiness Checklist
This free checklist is the launch gate for Kubernetes LLM workloads. It focuses on evidence: what the platform can prove before it receives live traffic, not what the team hopes will work.
Last reviewed: June 8, 2026. Use this checklist as the primary review document before exposing a model, RAG system, or inference route to users.
Readiness checklist
| Area | Required evidence |
|---|---|
| Ownership | Namespace, route, runtime, model artifact, and on-call owner are documented. |
| GPU placement | Workload requests GPU resources and, through tolerations and node affinity, lands only on a compatible, tainted accelerator pool. |
| Serving health | Readiness proves that the model has loaded and the endpoint answers requests, not merely that the container process started. |
| Latency | Time to first token (TTFT), queue wait, tokens/sec, and p95/p99 latency are visible per route and model. |
| Security | RBAC, NetworkPolicy, secret handling, prompt logging controls, and tenant routing are reviewed. |
| RAG access | Retrieval applies tenant and authorization filters before prompt assembly. |
| Rollout | Traffic shift, readiness gates, model revision, and runtime image are reviewable. |
| Rollback | Rollback has been tested for route, runtime image, model artifact, and index where relevant. |
| Cost | Cost/request includes input tokens, output tokens, GPU profile, utilization, and route class. |
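The GPU placement row can be spot-checked for a running pod. This is a minimal sketch, assuming the NVIDIA device plugin's extended resource name `nvidia.com/gpu` and taking the accelerator pool's taint key as a parameter; `gpu_placement` is a hypothetical helper name, and your pool's taint key may differ.

```shell
# Hypothetical helper: gpu_placement NAMESPACE POD TAINT_KEY
# Passes when the pod sets an nvidia.com/gpu limit (assumed resource name)
# and the node it was scheduled on carries TAINT_KEY.
gpu_placement() {
  local ns="$1" pod="$2" taint="$3" gpus node taints
  # GPU limit on any container in the pod (empty if none is set)
  gpus=$(kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.spec.containers[*].resources.limits.nvidia\.com/gpu}') || return 1
  if [ -z "$gpus" ]; then
    echo "FAIL $pod requests no nvidia.com/gpu"
    return 1
  fi
  # Node the scheduler chose, then that node's taint keys
  node=$(kubectl -n "$ns" get pod "$pod" -o jsonpath='{.spec.nodeName}') || return 1
  taints=$(kubectl get node "$node" -o jsonpath='{.spec.taints[*].key}') || return 1
  case " $taints " in
    *" $taint "*) echo "PASS $pod on $node ($gpus GPU)" ;;
    *) echo "FAIL $node lacks taint $taint"; return 1 ;;
  esac
}
```

Usage: `gpu_placement llm-serving <model-pod> <gpu-taint-key>`. A pass shows the pod both asked for a GPU and was admitted to the tainted pool, which only happens when its tolerations match.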
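For the Serving health row, a check that proves model load has to ask the model for output, not just open a socket. This sketch assumes an OpenAI-compatible `/v1/completions` endpoint (which some runtimes, such as vLLM, expose) and a served model name supplied by the caller; `model_ready` is a hypothetical helper, and the response test only looks for a `choices` field.

```shell
# Hypothetical helper: model_ready URL MODEL
# Succeeds only if the (assumed OpenAI-compatible) completions endpoint returns
# generated choices, which a process without a loaded model cannot do.
model_ready() {
  local url="$1" model="$2" body
  # One-token request keeps the probe cheap; --fail turns HTTP errors into a nonzero exit
  body=$(curl -sS --fail --max-time 5 \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$model\",\"prompt\":\"ping\",\"max_tokens\":1}" \
    "$url") || return 1
  case "$body" in
    *'"choices"'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Used as an exec readiness probe, this needs `curl` in the container image, and every probe consumes a little GPU time, so the probe period should reflect that cost.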
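To sanity-check the p95/p99 figures in the Latency row against raw per-request samples (for example, exported from a load test), a nearest-rank percentile is enough. This sketch reads one latency value per line and assumes an integer percentile; in production the same numbers usually come from histogram metrics, for example via PromQL `histogram_quantile`.

```shell
# Nearest-rank percentile: percentile P < samples
# Input: one number per line. P: integer from 1 to 100.
percentile() {
  LC_ALL=C sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      # rank = ceil(p * NR / 100), computed with integer arithmetic
      rank = int((p * NR + 99) / 100)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}
```

Usage: `p95=$(percentile 95 < latencies.txt)`, where `latencies.txt` is a hypothetical export of request latencies.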
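The Cost row lists the inputs but not how they combine. One simple attribution model, assumed here rather than taken from any standard, charges each request its share of replica time: cost = (replica price per hour / 3600) × (input tokens / prefill tokens/sec + output tokens/sec decode share) / utilization, where throughputs are the replica's measured aggregate rates for its GPU profile and utilization spreads idle, paid GPU time across served requests. Route class is handled by running it with each class's own measured inputs.

```shell
# Hypothetical helper (assumed attribution model, not a standard):
# cost_per_request USD_PER_HOUR IN_TOKENS OUT_TOKENS PREFILL_TPS DECODE_TPS UTILIZATION
#   busy_seconds = IN_TOKENS / PREFILL_TPS + OUT_TOKENS / DECODE_TPS
#   cost         = (USD_PER_HOUR / 3600) * busy_seconds / UTILIZATION
cost_per_request() {
  awk -v c="$1" -v nin="$2" -v nout="$3" -v tin="$4" -v tout="$5" -v u="$6" 'BEGIN {
    # Reject inputs that would divide by zero or exceed full utilization
    if (tin <= 0 || tout <= 0 || u <= 0 || u > 1) exit 1
    busy = nin / tin + nout / tout
    printf "%.6f\n", (c / 3600) * busy / u
  }'
}
```

Worked example: a $3.60/hour replica costs $0.001 per second. A request with 1,000 input tokens at 10,000 tokens/sec prefill and 500 output tokens at 1,000 tokens/sec decode uses 0.1 + 0.5 = 0.6 replica-seconds; at 50% utilization, `cost_per_request 3.6 1000 500 10000 1000 0.5` prints `0.001200`.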
Commands and checks
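```shell
# Can the reviewer read workloads in the serving namespace?
kubectl auth can-i list pods --namespace llm-serving
# Deployment, Service, Endpoints, ResourceQuota, and NetworkPolicy evidence
kubectl -n llm-serving get deploy,svc,endpoints,quota,networkpolicy
kubectl -n llm-serving rollout history deploy/<model-deployment>
# METRICS is the model server's metrics URL; set it before running this line
curl -sS "$METRICS" | grep -Ei "ttft|queue|tokens|gpu|latency"
```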
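`rollout history` lists the revisions a Deployment can return to. A tested rollback of the runtime image (and of the model artifact, if its reference lives in the pod spec) can be scripted with `kubectl rollout undo` followed by `kubectl rollout status`; route and index rollback depend on your gateway and vector store and are not covered here. `rollback_deploy` is a hypothetical helper name, and the 10-minute timeout is an assumption to tune for model load time.

```shell
# Hypothetical helper: rollback_deploy NAMESPACE DEPLOYMENT [REVISION]
# Reverts the Deployment's pod template, then waits for the rollback to finish.
rollback_deploy() {
  local ns="$1" deploy="$2" rev="${3:-}"
  if [ -n "$rev" ]; then
    kubectl -n "$ns" rollout undo "deploy/$deploy" --to-revision="$rev" || return 1
  else
    # Without a revision, undo returns to the previous one
    kubectl -n "$ns" rollout undo "deploy/$deploy" || return 1
  fi
  # Fails if the reverted pods do not become ready within the timeout
  kubectl -n "$ns" rollout status "deploy/$deploy" --timeout=10m
}
```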
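The `grep` over `$METRICS` shows which latency metrics exist but does not fail when one is missing. A stricter sketch takes the metric names your runtime actually exports (names vary by runtime, so none are assumed here) and reports each as PASS or FAIL; `require_metrics` is a hypothetical helper name.

```shell
# Hypothetical helper: require_metrics URL NAME...
# Fetches one Prometheus text scrape and checks each metric name as a line
# prefix, so histogram _bucket/_sum/_count series count as present.
require_metrics() {
  local url="$1" scrape name missing=0
  scrape=$(curl -sS --fail "$url") || return 1
  shift
  for name in "$@"; do
    # Anchored at line start, so "# HELP" and "# TYPE" lines do not match
    if printf '%s\n' "$scrape" | grep "^$name" >/dev/null; then
      echo "PASS $name"
    else
      echo "FAIL $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_metrics "$METRICS" <ttft-metric> <queue-wait-metric> <tokens-metric>`.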
| Check | Pass signal |
|---|---|
| Launch evidence exists | RBAC, quota, NetworkPolicy, deployment, endpoint, and rollout history can be inspected. |
| Runtime evidence exists | Model loaded, endpoint responds, and inference latency is visible. |
| Security evidence exists | Tenant, secret, prompt logging, egress, and retrieval boundaries are reviewed. |
| Business evidence exists | Cost and rollback impact are understood before traffic shift. |
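The "Launch evidence exists" row can be turned into a single pass/fail gate. This sketch only checks that each object kind is readable and non-empty in the namespace; it does not judge whether a quota or NetworkPolicy is correct, which still needs human review. `launch_evidence` is a hypothetical helper name.

```shell
# Hypothetical helper: launch_evidence NAMESPACE
# Prints PASS/FAIL per kind and returns nonzero if anything is missing.
launch_evidence() {
  local ns="$1" kind failed=0
  for kind in resourcequota networkpolicy deployment service endpoints; do
    # -o name prints one "kind/name" line per object; empty output means none
    if [ -n "$(kubectl -n "$ns" get "$kind" -o name 2>/dev/null)" ]; then
      echo "PASS $kind"
    else
      echo "FAIL $kind"
      failed=1
    fi
  done
  # kubectl auth can-i exits 0 only when the action is allowed
  if kubectl auth can-i list pods --namespace "$ns" >/dev/null 2>&1; then
    echo "PASS rbac"
  else
    echo "FAIL rbac"
    failed=1
  fi
  return "$failed"
}
```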
Related lab
Run the Production readiness challenge to turn this checklist into guided validation.