2 docs tagged with "vllm"

Model Serving Options

Compares vLLM, KServe, Ray Serve, and Triton for serving LLMs on Kubernetes, with links to in-depth guides on vLLM on Kubernetes and on KServe vs. Ray Serve.

vLLM On Kubernetes

Production guide to running vLLM on Kubernetes, covering GPU scheduling, model cache strategy, runtime flags, health probes, metrics, and failure modes.