KServe vs Ray Serve for LLM Platforms
KServe vs Ray Serve is not only a feature comparison. It is an ownership decision: should the primary serving contract be a Kubernetes-native platform API or a Python-native application serving graph?
Last reviewed: June 8, 2026. Use this page when a team is choosing a serving layer for more than one model endpoint.
Scenario
A model endpoint starts simple. Later it needs retrieval, reranking, model routing, generation, post-processing, custom metrics, and rollback. The platform team wants CRDs, revisions, policy, and standard lifecycle controls. The ML team wants to define the serving graph's behavior in Python code.
Decision table
| Dimension | KServe tends to fit | Ray Serve tends to fit |
|---|---|---|
| Primary contract | Kubernetes resources and platform policy. | Python deployment graph and application code. |
| Owner | Platform team standardizes endpoint lifecycle. | ML or application team owns graph behavior. |
| Serving graph | Repeatable endpoint patterns. | Custom multi-step pipelines. |
| What a rollout changes | Resource revision, route, runtime, artifact. | Ray Serve deployment, graph code, runtime env, artifact. |
| SRE operability | Operate from Kubernetes resources and controller signals. | Operate Ray cluster plus graph-level telemetry. |
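
To make the "Primary contract" row concrete, here is a minimal sketch of the two shapes side by side. The first function builds a KServe-style `InferenceService` manifest as a plain dictionary; the second composes pipeline steps as ordinary Python functions, which stands in for the kind of graph an application team would own. The endpoint name `reranker`, the `huggingface` model format value, the storage URI, and the step functions are all assumptions made for illustration. The graph uses plain functions so it runs without a cluster; with Ray Serve, each step would typically become its own deployment composed in application code.

```python
import json


def kserve_inference_service(name: str, model_format: str, storage_uri: str) -> dict:
    """Build a minimal InferenceService manifest as a plain dict.

    The contract here is declarative: the platform reviews and applies a resource.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,
                }
            }
        },
    }


# Hypothetical pipeline steps. The contract here is code: the application
# team changes behavior by editing and redeploying these functions.
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]


def rerank(query: str, docs: list[str]) -> list[str]:
    return sorted(docs, key=len)


def generate(query: str, docs: list[str]) -> str:
    return f"answer to {query!r} using {len(docs)} document(s)"


def serving_graph(query: str) -> str:
    """Chain the steps in order: retrieve, rerank, generate."""
    return generate(query, rerank(query, retrieve(query)))


# Placeholder values, not a real bucket or model.
manifest = kserve_inference_service(
    "reranker", "huggingface", "s3://example-bucket/models/reranker"
)
print(json.dumps(manifest, indent=2))
print(serving_graph("kv cache"))
```

The manifest is reviewed as data, so policy and lifecycle tooling can inspect it without running anything. The graph is reviewed as code, so its behavior is only as standard as the team that writes it.
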
Commands and checks
```text
# Write this inventory before choosing the serving layer.
route=<route-name>
owner=<platform-or-app-team>
graph_complexity=<single-endpoint-or-multi-step>
rollback_unit=<resource-runtime-model-graph>
autoscaling_owner=<gateway-serving-layer-runtime-cluster>
```
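
One way to keep the inventory honest is to reject it while any field is missing or still holds an angle-bracket placeholder. The sketch below assumes the inventory is stored as `key=value` text in the form shown above; the function names `parse_inventory` and `unresolved_fields` and the sample values are hypothetical.

```python
import re

# Field names taken from the inventory template above.
REQUIRED_KEYS = (
    "route",
    "owner",
    "graph_complexity",
    "rollback_unit",
    "autoscaling_owner",
)

# A value that is still wrapped in <...> has not been filled in.
PLACEHOLDER = re.compile(r"^<.*>$")


def parse_inventory(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        entries[key.strip()] = value.strip()
    return entries


def unresolved_fields(entries: dict[str, str]) -> list[str]:
    """Return required keys that are missing, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not entries.get(key) or PLACEHOLDER.match(entries[key])
    ]


# Sample values are illustrative only.
draft = parse_inventory(
    """
    # Write this inventory before choosing the serving layer.
    route=chat-v2
    owner=platform-team
    graph_complexity=multi-step
    rollback_unit=<resource-runtime-model-graph>
    """
)
print(unresolved_fields(draft))
```

A non-empty result means the serving-layer decision is being made before the team has answered the questions that drive it.
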
| Check | Pass signal |
|---|---|
| Owner is explicit | The team knows who owns the endpoint contract and production behavior. |
| Rollback unit is explicit | Runtime image, model artifact, prompt, and graph code are not confused. |
| SRE can operate it | On-call can debug routing, queueing, replica state, and runtime health. |
| Alternative rejected | The decision records why the other serving layer was not selected. |
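
The four checks above can also be recorded as a small decision record that reports which pass signals are not yet met. This is a sketch under stated assumptions: the `DecisionRecord` class and its field names are hypothetical, and "explicit" is approximated as "a non-empty entry exists", which is a weaker test than a human review.

```python
from dataclasses import dataclass

# Units and debug surfaces named in the checks table above.
ROLLBACK_UNITS = ("runtime image", "model artifact", "prompt", "graph code")
DEBUG_SURFACES = ("routing", "queueing", "replica state", "runtime health")


@dataclass
class DecisionRecord:
    chosen_layer: str
    contract_owner: str
    rollback_procedures: dict[str, str]  # rollback unit -> how to roll it back
    oncall_runbooks: set[str]            # debug surfaces with a runbook
    rejection_reason: str                # why the other layer was not selected

    def failed_checks(self) -> list[str]:
        """Return the names of checks whose pass signal is not met."""
        failures = []
        if not self.contract_owner.strip():
            failures.append("Owner is explicit")
        # Each unit needs its own procedure so the units are not confused.
        if any(not self.rollback_procedures.get(u, "").strip() for u in ROLLBACK_UNITS):
            failures.append("Rollback unit is explicit")
        if any(s not in self.oncall_runbooks for s in DEBUG_SURFACES):
            failures.append("SRE can operate it")
        if not self.rejection_reason.strip():
            failures.append("Alternative rejected")
        return failures


# Illustrative values only.
record = DecisionRecord(
    chosen_layer="KServe",
    contract_owner="platform-team",
    rollback_procedures={unit: f"revert {unit}" for unit in ROLLBACK_UNITS},
    oncall_runbooks=set(DEBUG_SURFACES),
    rejection_reason="",
)
print(record.failed_checks())
```

Here the record fails only the last check, because no reason for rejecting the other serving layer has been written down.
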
Related lab
Run the KServe vs Ray Serve decision lab to practice choosing by ownership and graph complexity.