AI Rightsizing for Kubernetes: Start with the Boring Baseline
Before you trust ML to resize pods, fix your signals, budgets, and guardrails. Otherwise AI just automates bad guesses.
Stop guessing your CPU and RAM requests. A data-driven guide to right-sizing your pods without killing performance.
In my previous post about CPU limits, I argued that limits are often dangerous. But if you remove limits, your Requests become the most critical configuration in your cluster.
Requests are the contract you sign with the Kubernetes scheduler. Set them too high, and you waste money (the “Slack Tax”). Set them too low, and your nodes get oversubscribed, leading to CPU contention and OOM kills.
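For reference, this is the whole knob we are talking about. A minimal sketch of a container spec (names and numbers are illustrative, not a recommendation) with requests set and, in line with the previous post, no CPU limit:

containers:
- name: api
  image: registry.example.com/api:1.4.2
  resources:
    requests:
      cpu: "250m"        # scheduler reserves a quarter of a core on the node
      memory: "512Mi"    # counted against the node's allocatable memory
    # no CPU limit, per the previous post

The scheduler only looks at requests when it places the pod; whatever you write here is what gets subtracted from the node's allocatable capacity.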
Most teams guess. They look at a graph, squint, and pick a number.
There is a better way.
Kubernetes workloads are rarely flat lines. They are spiky.
If you size for the peak, you waste 80% of your money. If you size for the average, you crash during the peak.
The art of sizing is finding the percentile that balances risk and cost.
The Vertical Pod Autoscaler (VPA) has a “Recommendation” mode. It watches your pods and tells you what it would set the requests to.
Install it. Let it run for a week. It gives you a baseline grounded in reality, not guesswork.
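A minimal recommendation-only VPA object looks something like this (my-app is a placeholder; updateMode "Off" means it observes and recommends but never evicts or resizes anything):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only, never touch running pods

After a few days of observation, its status block fills in a lowerBound, target, and upperBound for each container.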
Run kubectl get vpa my-app -o yaml and look at the target recommendation for each container. That is your new baseline.
For customer-facing services, I recommend setting CPU requests to the 95th percentile of usage during your busiest hour.
Why not 100%? Because CPU is compressible. If you burst above your request for a few seconds, it's fine. You borrow idle cycles from your neighbors.
For Memory, however, you must size for the 100th percentile + buffer. RAM is not compressible. If you run out, you die (OOM).
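A worked example with made-up numbers: say your busiest hour shows p95 CPU usage around 230m and peak memory around 300Mi. You would land on requests roughly like this:

resources:
  requests:
    cpu: "250m"      # ~p95 of busiest-hour CPU usage, rounded up
    memory: "384Mi"  # peak observed memory plus ~25% headroom, because RAM is not compressible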
Don’t just trust production traffic. Run a load test.
While the test runs, watch your latency alongside throttling metrics (if you still have limits) or raw CPU usage. If latency stays flat, your requests are good. If latency spikes, your requests are too low (the node is too crowded).
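A quick, low-tech way to spot throttling during the test, assuming your nodes run cgroup v2 and a limit is still in place (my-app is a placeholder), is to read the cgroup stats from inside a running container:

kubectl exec deploy/my-app -- cat /sys/fs/cgroup/cpu.stat

If nr_throttled and throttled_usec climb while latency spikes, the container is hitting its CPU quota. If you have already removed limits, compare observed CPU usage against your request instead.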
Right-sizing is not a one-time task. It’s a loop.
In the next post, we’ll talk about how to automate this loop so you never have to edit a YAML file again.