AI Rightsizing for Kubernetes: Start with the Boring Baseline

Everyone wants AI to rightsize pods automatically. That only works if your inputs are clean and your teams trust the output. Here is how to make “AI rightsizing” boring and reliable.

Fix the inputs first

Golden signals: Capture p95 CPU/Memory usage, request/limit ratios, and OOM/restart counts per workload. If you cannot see the waste, no model helps.
Steady price sheet: Use a single source for node, storage, and egress pricing. ML tuned on stale prices is noise.
Labels everywhere: Owner, team, env, service. Rightsizing without ownership leads to ignored recommendations.
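
As a rough illustration of the golden signals above, here is a minimal sketch that turns raw CPU samples into a p95 and a request-to-p95 ratio per workload. The names `p95`, `WorkloadSignals`, and `summarize` are hypothetical, the samples are assumed to be CPU usage in millicores already pulled from your metrics store, and the nearest-rank percentile is just one reasonable definition.

```python
from dataclasses import dataclass


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


@dataclass
class WorkloadSignals:
    owner: str          # taken from the workload's owner label
    cpu_request_m: int  # requested CPU, in millicores
    cpu_p95_m: int      # observed p95 CPU usage, in millicores
    oom_kills: int
    restarts: int

    @property
    def request_to_p95(self):
        """How many times larger the request is than observed p95 usage."""
        if self.cpu_p95_m <= 0:
            return float("inf")
        return self.cpu_request_m / self.cpu_p95_m


def summarize(owner, cpu_request_m, cpu_samples_m, oom_kills=0, restarts=0):
    """Collapse raw usage samples into the per-workload signals."""
    return WorkloadSignals(owner, cpu_request_m, p95(cpu_samples_m),
                           oom_kills, restarts)
```

A workload that requests 500m but sits at 100m p95 shows a ratio of 5.0, which is the kind of waste the later rules look for.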

Start with deterministic rules

Flag pods whose CPU or memory requests exceed 2x their p95 usage for 7 consecutive days.
Block deployments with missing limits or with limits above node capacity.
Auto-open tickets for top 10 wasteful workloads weekly.

These guardrails build trust and clean data before adding ML.
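
The first two rules above could be sketched roughly as follows. This assumes you already have one p95 value per day (most recent last) and a pod spec parsed into a plain dict. The function names are hypothetical, and reading "7 days straight" as "the last 7 daily values" is one interpretation.

```python
def overprovisioned(request, daily_p95, factor=2.0, days=7):
    """True if the request exceeded factor * p95 on each of the last `days` days."""
    if len(daily_p95) < days:
        return False  # not enough history to call it a streak
    return all(request > factor * p for p in daily_p95[-days:])


def missing_limits(pod_spec):
    """Return names of containers that lack a CPU or memory limit.

    `pod_spec` is a dict shaped like the `spec` of a Kubernetes Pod:
    {"containers": [{"name": ..., "resources": {"limits": {...}}}]}
    """
    missing = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

A deployment gate would reject the manifest whenever `missing_limits` returns a non-empty list, and a weekly job could rank the workloads where `overprovisioned` is true to pick the top 10.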

Layer AI carefully

Train on workloads with stable traffic; exclude noisy batch jobs and experiments.
Optimize for cost + SLO: never propose settings that raise error rate or tail latency.
Suggest a range (min/target/max) instead of a single value so humans can choose safer defaults.
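
To make the min/target/max idea concrete, here is a minimal sketch that derives a range from observed percentiles. The formula is an assumption for illustration, not something a model has to follow: the minimum is the p95, the target adds a headroom percentage on top of the p95, and the maximum applies the same headroom to the p99. `suggest_range` and the 20% default are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    minimum: int  # millicores
    target: int
    maximum: int


def _ceil_div(a, b):
    """Ceiling division on non-negative integers."""
    return -(-a // b)


def suggest_range(p95_m, p99_m, headroom_pct=20):
    """Suggest a CPU request range (millicores) from p95 and p99 usage."""
    if p95_m <= 0 or p99_m < p95_m:
        raise ValueError("expected 0 < p95 <= p99")
    target = _ceil_div(p95_m * (100 + headroom_pct), 100)
    maximum = max(target, _ceil_div(p99_m * (100 + headroom_pct), 100))
    return Recommendation(minimum=p95_m, target=target, maximum=maximum)
```

With a p95 of 500m and a p99 of 800m this yields 500/600/960, leaving the reviewer free to pick the cautious end for latency-sensitive services.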

Close the loop in CI/CD

Post recommendations as PR comments with dollar impact.
Let developers accept via label (/apply-rightsize) that triggers a patch on the manifest.
Track acceptance rate and rollback rate; pause models that regress SLOs.
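
The dollar impact in a PR comment can be a simple calculation. The sketch below assumes a flat, hypothetical price per core-hour taken from your single price sheet and a 730-hour month. It only builds the comment text; posting it is left to whatever CI integration you already use.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def monthly_savings(current_m, proposed_m, replicas, price_per_core_hour):
    """Estimated monthly savings from lowering a CPU request (millicores)."""
    saved_cores = (current_m - proposed_m) / 1000 * replicas
    return saved_cores * price_per_core_hour * HOURS_PER_MONTH


def render_comment(workload, current_m, proposed_m, replicas, price_per_core_hour):
    """Build the PR comment body for one recommendation."""
    savings = monthly_savings(current_m, proposed_m, replicas, price_per_core_hour)
    return (
        f"Rightsizing suggestion for {workload}: "
        f"CPU request {current_m}m -> {proposed_m}m across {replicas} replicas. "
        f"Estimated impact: ${savings:.2f}/month. "
        "Add the /apply-rightsize label to accept."
    )
```

For example, dropping a request from 1000m to 600m on 3 replicas at an assumed $0.04 per core-hour comes to roughly $35.04 per month.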

When to trust it

You have 30–60 days of stable usage per service.
Teams already follow limits/requests conventions.
Acceptance-to-rollback ratio stays above 4:1.
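
The three conditions above can be folded into a single gate. This is a minimal sketch with hypothetical names. It uses the lower bound of the 30 to 60 day window as the default and treats a service with no accepted recommendations as not yet trustworthy.

```python
def ready_for_ai(days_of_stable_usage, follows_conventions,
                 accepted, rolled_back, min_days=30, min_ratio=4):
    """True when a service meets all three trust conditions."""
    if days_of_stable_usage < min_days:
        return False
    if not follows_conventions:
        return False
    # "Above 4:1" means strictly more than 4 accepts per rollback.
    # Multiplying avoids dividing by zero when there are no rollbacks.
    return accepted > min_ratio * rolled_back
```

Running this per service, rather than per cluster, lets stable services adopt the model while noisier ones stay on the deterministic rules.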

AI rightsizing is not magic. It is a thin layer on top of clean telemetry, sane policies, and fast feedback loops. Nail those first; the AI will look smart because the system is.


Jesus Paz

Founder & CEO
