Control Plane And API
The control plane determines whether Kubernetes behaves like a platform or only like a cluster running containers. The API server is the central gate, so the most important policy decisions should be enforced there or directly around it.
Components
| Component | Production concern |
|---|---|
| API server | Authn/authz, admission, rate limits, audit, API availability. |
| etcd | Encryption, backup, compaction, quorum, restore drill. |
| Scheduler | Placement, fairness, priority, topology awareness. |
| Controller manager | Reconcile correctness, stuck finalizers, leader election. |
| Cloud controller manager | Load balancer, node lifecycle, route and volume integration. |
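
The quorum concern in the etcd row comes down to simple arithmetic: a cluster stays writable only while a majority of its voting members is reachable. The sketch below illustrates that rule; the function names are hypothetical and it models nothing beyond the majority calculation.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)}")
```

Going from three members to four raises the quorum without raising the number of tolerated failures, which is why odd member counts are the usual choice.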
API design principles
- Treat Kubernetes manifests as public interfaces. A breaking change in labels, selectors, probes, or PVC names can be as serious as a code API change.
- Custom resources should have clear status conditions. If a custom resource cannot report progress, failure, and observed generation, the operator that manages it becomes hard to debug.
- Admission policy should reject unsafe intent early: privileged pods, missing resource requests, mutable image tags, and broad host access.
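
The admission checks in the last bullet can be sketched as a plain function over a pod manifest. This is an illustration only, not a real admission webhook or policy engine: `unsafe_reasons` and `has_mutable_tag` are hypothetical names, the manifest is assumed to be an already parsed dict, and "mutable tag" is assumed to mean a missing tag or `latest` (any tag not pinned by digest can in principle be moved).

```python
from typing import Any


def has_mutable_tag(image: str) -> bool:
    """Assumption: an image is mutable if it has no tag or uses 'latest'."""
    if "@" in image:
        # Pinned by digest, e.g. nginx@sha256:...
        return False
    # Look only at the last path segment so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True
    return last_segment.rsplit(":", 1)[1] == "latest"


def unsafe_reasons(pod: dict[str, Any]) -> list[str]:
    """Return the reasons a pod manifest should be rejected (empty if none)."""
    spec = pod.get("spec", {})
    reasons: list[str] = []

    # Broad host access: shared host namespaces or hostPath volumes.
    for field in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(field):
            reasons.append(f"{field} is enabled")
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            reasons.append(f"volume {volume.get('name')} uses hostPath")

    for container in spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("securityContext", {}).get("privileged"):
            reasons.append(f"container {name} is privileged")
        if not container.get("resources", {}).get("requests"):
            reasons.append(f"container {name} has no resource requests")
        if has_mutable_tag(container.get("image", "")):
            reasons.append(f"container {name} uses a mutable image tag")
    return reasons


if __name__ == "__main__":
    pod = {"spec": {"containers": [{"name": "web", "image": "nginx:latest"}]}}
    for reason in unsafe_reasons(pod):
        print("reject:", reason)
```

Returning every reason rather than stopping at the first lets a rejection message tell the author everything to fix in one pass.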
Failure modes
- API server overload: caused by excessive controllers, chatty clients, or large list calls. The watch cache and client-side rate limiting help contain it.
- etcd pressure: too many events, large objects, or overdue compaction can slow cluster-wide operations.
- Admission outage: when a webhook is unavailable and its failure policy is Fail, every request it matches is rejected, so a broadly scoped webhook can block all workload changes.
- Controller storm: bad reconciliation can create repeated writes and amplify API pressure.
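
One common way to damp a controller storm is per-item exponential backoff: each consecutive failure for the same object doubles the wait before the next retry, up to a cap. The sketch below is a minimal illustration of that idea; the class name `ItemBackoff`, its defaults, and the string keys are assumptions made for the example, not any controller framework's API.

```python
class ItemBackoff:
    """Per-key exponential backoff with an upper bound (illustrative sketch)."""

    # Cap the exponent so the delay computation stays within float range.
    _MAX_EXPONENT = 30

    def __init__(self, base_delay: float = 1.0, max_delay: float = 60.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures: dict[str, int] = {}

    def next_delay(self, key: str) -> float:
        """Record one more failure for `key` and return the seconds to wait."""
        failures = self._failures.get(key, 0)
        self._failures[key] = failures + 1
        exponent = min(failures, self._MAX_EXPONENT)
        return min(self.base_delay * (2 ** exponent), self.max_delay)

    def forget(self, key: str) -> None:
        """Reset the failure count after a successful reconcile."""
        self._failures.pop(key, None)


if __name__ == "__main__":
    backoff = ItemBackoff(base_delay=1.0, max_delay=60.0)
    for attempt in range(8):
        print(attempt, backoff.next_delay("default/my-deployment"))
    backoff.forget("default/my-deployment")
```

Tracking failures per key means one persistently failing object slows only its own retries instead of throttling the whole queue.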
Operating signals
- API server request latency by verb and resource.
- Admission webhook latency and rejection rate.
- etcd fsync duration, leader changes, database size.
- Controller work queue depth and retry rate.
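
Breaking latency down by verb and resource matters because a slow LIST on one resource can hide behind a healthy overall average. The sketch below groups raw latency samples and takes a nearest-rank 99th percentile per group. The function name and the tuple layout are hypothetical, and working from raw samples is an assumption for illustration; in practice these numbers usually come from histogram metrics scraped from the API server.

```python
from collections import defaultdict
from typing import Iterable


def p99_by_verb_and_resource(
    samples: Iterable[tuple[str, str, float]],
) -> dict[tuple[str, str], float]:
    """Group (verb, resource, latency_seconds) samples and return p99 per group."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for verb, resource, latency in samples:
        grouped[(verb, resource)].append(latency)

    result: dict[tuple[str, str], float] = {}
    for key, latencies in grouped.items():
        latencies.sort()
        # Nearest-rank percentile: rank = ceil(0.99 * n), in integer arithmetic.
        rank = (99 * len(latencies) + 99) // 100
        result[key] = latencies[rank - 1]
    return result


if __name__ == "__main__":
    samples = [("LIST", "pods", i / 100) for i in range(1, 101)]
    samples.append(("GET", "pods", 0.01))
    for (verb, resource), p99 in sorted(p99_by_verb_and_resource(samples).items()):
        print(f"{verb} {resource}: p99={p99:.2f}s")
```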