Field Notes

Field Notes are production-style conversations for Kubernetes LLM platform teams. Each note starts from a failure mode that looks simple from a dashboard but becomes a platform decision once traffic, GPU capacity, tenant boundaries, and rollout controls are involved.

Use these notes before design reviews, launch readiness checks, and incident retrospectives.

| Field note | Production question | Matching lab |
| --- | --- | --- |
| LLM Latency War Room | Why are pods healthy while users wait for the first token? | vLLM inference challenge |
| GPU Capacity Incident | Why are expensive accelerators idle while inference pods are pending? | GPU node pool scheduling |
| RAG Tenant Isolation Review | Where must authorization happen before documents enter a prompt? | RAG retrieval challenge |
| KServe vs Ray Serve Ownership | Is the serving layer a platform API or an application-owned graph? | KServe vs Ray Serve decision |

How to read a field note

Each note uses the same review shape:

  1. Start with the visible symptoms.
  2. Name the common wrong instinct.
  3. Split the system into platform layers.
  4. Pick the signals that prove or disprove the theory.
  5. Decide what should be automated, documented, or moved into a lab check.

Why this section exists

Most Kubernetes LLM failures do not start as tool-selection problems. They start as ownership problems: who owns latency, GPU placement, retrieval quality, runtime flags, rollback boundaries, and cost evidence. Field Notes make those conversations explicit before a platform team scales the model.