Field Notes
Field Notes are production-style conversations for Kubernetes LLM platform teams. Each note starts from a failure mode that looks simple on a dashboard but turns into a platform decision once traffic, GPU capacity, tenant boundaries, and rollout controls are involved.
Use these notes before design reviews, launch readiness checks, and incident retrospectives.
| Field note | Production question | Matching lab |
|---|---|---|
| LLM Latency War Room | Why are pods healthy while users wait for the first token? | vLLM inference challenge |
| GPU Capacity Incident | Why are expensive accelerators idle while inference pods are pending? | GPU node pool scheduling |
| RAG Tenant Isolation Review | Where must authorization happen before documents enter a prompt? | RAG retrieval challenge |
| KServe vs Ray Serve Ownership | Is the serving layer a platform API or an application-owned graph? | KServe vs Ray Serve decision |
How to read a field note
Each note uses the same review shape:
- Start with the visible symptoms.
- Name the common wrong instinct.
- Split the system into platform layers.
- Pick the signals that prove or disprove the theory.
- Decide what should be automated, documented, or moved into a lab check.
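The five steps above can be sketched as a small record that a team fills in during a review. This is only an illustration: the `FieldNote` class, its field names, and the `missing_sections` helper are hypothetical and not part of any field note or lab. The sample title comes from the table above; the symptoms are paraphrased from its production question.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class FieldNote:
    """Hypothetical container mirroring the five review steps."""
    title: str
    symptoms: List[str] = field(default_factory=list)         # visible symptoms
    wrong_instinct: str = ""                                   # common wrong instinct
    platform_layers: List[str] = field(default_factory=list)  # layers the system splits into
    signals: List[str] = field(default_factory=list)          # evidence for or against the theory
    # Maps an action ("automate", "document", "lab_check") to what should happen.
    follow_ups: Dict[str, str] = field(default_factory=dict)


def missing_sections(note: FieldNote) -> List[str]:
    """Return the names of review steps that are still empty."""
    return [f.name for f in fields(note) if not getattr(note, f.name)]


note = FieldNote(
    title="LLM Latency War Room",
    symptoms=["Pods report healthy", "Users wait for the first token"],
)
print(missing_sections(note))
# ['wrong_instinct', 'platform_layers', 'signals', 'follow_ups']
```

A check like this could run before a design review to flag notes that stop at the symptoms without naming signals or follow-up actions.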
Why this section exists
Most Kubernetes LLM failures do not start as tool-selection problems. They start as ownership problems: who owns latency, GPU placement, retrieval quality, runtime flags, rollback boundaries, and cost evidence. Field Notes make those conversations explicit before a platform team scales the model.