GPU Node Pool Scheduling for LLM Inference

GPU node pool scheduling for LLM inference is a capacity contract. Before optimizing throughput, the platform must make accelerator type, taints, labels, quotas, priority, and workload lane (batch or interactive) visible.

Last reviewed: June 8, 2026. Use this page when GPU nodes exist but inference pods are pending, misplaced, or too expensive for the traffic served.

Scenario

An interactive model needs A100-class GPUs. The cluster has GPU nodes online, but pods are pending. A batch job consumed the compatible profile, one node is blocked by a taint mismatch, and autoscaling does not start because the pod selector does not match a scalable node group.
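The blockers in this scenario can be reproduced in a toy model. The sketch below is a simplified illustration, not the Kubernetes scheduler: the `Node` and `Pod` classes, the taint keys `dedicated` and `maintenance`, the label values, and the node names are all hypothetical. Only the `accelerator` label key is taken from the commands on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict
    taints: set = field(default_factory=set)  # taint keys, all assumed NoSchedule
    free_gpus: int = 0


@dataclass
class Pod:
    node_selector: dict
    tolerations: set  # taint keys the pod tolerates
    gpus: int


def blockers(pod, node):
    """Return the reasons this node cannot host the pod (empty list means it fits)."""
    reasons = []
    if any(node.labels.get(k) != v for k, v in pod.node_selector.items()):
        reasons.append("selector mismatch")
    if node.taints - pod.tolerations:
        reasons.append("untolerated taint")
    if node.free_gpus < pod.gpus:
        reasons.append("insufficient GPUs")
    return reasons


# Hypothetical interactive pod and three hypothetical GPU nodes.
pod = Pod(node_selector={"accelerator": "a100"}, tolerations={"dedicated"}, gpus=1)
nodes = [
    # A batch job already holds every GPU on this node.
    Node("gpu-a", {"accelerator": "a100"}, {"dedicated"}, free_gpus=0),
    # An extra taint the pod does not tolerate.
    Node("gpu-b", {"accelerator": "a100"}, {"dedicated", "maintenance"}, free_gpus=8),
    # Free GPUs, but the wrong accelerator profile.
    Node("gpu-c", {"accelerator": "t4"}, {"dedicated"}, free_gpus=4),
]

report = {n.name: blockers(pod, n) for n in nodes}
for name, reasons in report.items():
    print(name, reasons or ["schedulable"])
```

The same selector check is a reasonable mental model for the autoscaling failure: if no scalable node group would produce a node that passes `blockers`, adding nodes cannot help the pending pod.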

Decision table

Scheduling control	Production use
Node labels	Encode accelerator type, memory, topology, zone, lifecycle, and profile.
Taints	Keep unapproved and general workloads away from expensive GPU nodes.
Tolerations	Require approved model serving workloads to opt into GPU placement.
GPU requests	Request `nvidia.com/gpu` explicitly in container resource limits; the device plugin advertises this resource on GPU nodes.
Quotas	Stop experiments and batch jobs from starving interactive inference.
Warm buffer	Protect user-facing routes from node provisioning and model cold starts.
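As a minimal sketch of how the controls in this table combine, the script below builds a pod manifest and a batch-lane quota as plain Python dictionaries and prints them as JSON. Everything specific is an assumption for illustration: the pod name, image, `llm-batch` namespace, label value `a100`, taint `dedicated=gpu-inference`, and the quota size. Only the `llm-serving` namespace, the `accelerator` label key, and the `nvidia.com/gpu` resource name come from this page.

```python
import json

# Hypothetical interactive serving pod: opts into GPU nodes via selector,
# toleration, and an explicit GPU limit.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "model-server", "namespace": "llm-serving"},
    "spec": {
        # Node label: pick the compatible accelerator profile.
        "nodeSelector": {"accelerator": "a100"},
        # Toleration: opt into the (assumed) taint on the GPU pool.
        "tolerations": [
            {
                "key": "dedicated",
                "operator": "Equal",
                "value": "gpu-inference",
                "effect": "NoSchedule",
            }
        ],
        "containers": [
            {
                "name": "server",
                "image": "registry.example.com/model-server:latest",  # placeholder image
                # GPU request: extended resources are set under limits.
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        ],
    },
}

# Hypothetical quota that caps GPU requests in a separate batch namespace.
quota_manifest = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "gpu-cap", "namespace": "llm-batch"},
    "spec": {"hard": {"requests.nvidia.com/gpu": "4"}},
}

bundle = {"apiVersion": "v1", "kind": "List", "items": [pod_manifest, quota_manifest]}
print(json.dumps(bundle, indent=2))
```

Keeping the quota in the batch namespace rather than the serving namespace reflects the intent of the table: the limit constrains experiments and batch jobs, not the interactive route.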

Commands and checks

kubectl get nodes -L accelerator,nvidia.com/gpu.product
kubectl -n llm-serving describe pod <model-pod>
kubectl -n llm-serving get resourcequota
kubectl -n llm-serving get pod <model-pod> -o wide

Check	Pass signal
Compatible profile	Node labels match the model's GPU memory and accelerator requirement.
Taint contract	GPU nodes reject general workloads and accept approved inference workloads.
Pending reason	Scheduler events identify the exact missing label, taint, quota, or GPU resource.
Workload lane	Batch and interactive workloads do not fight for the same unbounded pool.
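The "Pending reason" check can be partly automated by sorting event text into the causes listed above. The sketch below is a rough illustration only: the substrings are assumptions based on commonly seen scheduler and quota messages, the exact wording varies across Kubernetes versions, and the sample message is invented rather than captured from a cluster.

```python
# Assumed message fragments mapped to the cause they usually indicate.
# Treat these as a starting point and adjust to the events your cluster emits.
PATTERNS = [
    ("Insufficient nvidia.com/gpu", "gpu capacity"),
    ("untolerated taint", "taint"),
    ("didn't tolerate", "taint"),
    ("node affinity/selector", "label"),
    ("exceeded quota", "quota"),
]


def classify(message):
    """Return the distinct causes whose fragment appears in the event message."""
    found = []
    for needle, cause in PATTERNS:
        if needle in message and cause not in found:
            found.append(cause)
    return found or ["unknown"]


# Invented example in the style of a FailedScheduling event.
example = (
    "0/3 nodes are available: 1 Insufficient nvidia.com/gpu, "
    "1 node(s) had untolerated taint {maintenance: true}, "
    "1 node(s) didn't match Pod's node affinity/selector."
)
print(classify(example))
```

A message that matches none of the fragments returns `["unknown"]`, which is a cue to read the raw output of `kubectl describe pod` rather than trust the classifier.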

Related lab

Run the GPU node pool scheduling lab to practice unschedulable-pod debugging.
