AI Rightsizing for Kubernetes: Start with the Boring Baseline

Before you trust ML to resize pods, fix your signals, budgets, and guardrails. Otherwise AI just automates bad guesses.

Jesus Paz

Everyone wants AI to rightsize pods automatically. That only works if your inputs are clean and your teams trust the output. Here is how to make “AI rightsizing” boring and reliable.

Fix the inputs first

  • Golden signals: Capture p95 CPU and memory usage, request/limit ratios, and OOM kill and restart counts per workload. If you cannot see the waste, no model will help.
  • Steady price sheet: Use a single source for node, storage, and egress pricing. ML tuned on stale prices is noise.
  • Labels everywhere: Owner, team, env, service. Rightsizing without ownership leads to ignored recommendations.
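As a rough illustration of the first signal, here is a minimal sketch of computing a p95 and a request-to-p95 ratio per workload. The function names, the `WorkloadSignals` fields, and the use of millicores are assumptions made for the example; in practice the samples would come from whatever metrics store you already run.

```python
import math
from dataclasses import dataclass


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


@dataclass
class WorkloadSignals:
    # Hypothetical record: one row per workload, tagged with its owner label.
    owner: str
    p95_cpu_m: float       # p95 CPU usage in millicores
    cpu_request_m: float   # configured CPU request in millicores
    oom_kills: int

    @property
    def request_to_p95(self) -> float:
        """How many times larger the request is than observed p95 usage."""
        if self.p95_cpu_m == 0:
            return math.inf
        return self.cpu_request_m / self.p95_cpu_m


def summarize(owner: str, cpu_samples_m: list[float],
              cpu_request_m: float, oom_kills: int) -> WorkloadSignals:
    """Collapse raw usage samples into the signals a rule or model would read."""
    return WorkloadSignals(owner, percentile(cpu_samples_m, 95),
                           cpu_request_m, oom_kills)
```

A workload whose `request_to_p95` sits far above 1 is reserving capacity it rarely uses, and the `owner` field is what makes the finding actionable.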

Start with deterministic rules

  • Flag pods whose requests exceed 2x their p95 usage for 7 consecutive days.
  • Block deployments with missing limits or with limits above node capacity.
  • Auto-open tickets for the 10 most wasteful workloads each week.

These guardrails build trust and clean data before adding ML.
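The first two rules are simple enough to sketch in a few lines. This is an assumption-laden example, not a reference implementation: `daily_ratios` is a hypothetical list of daily request-to-p95 ratios (oldest first), "7 days straight" is read as the most recent 7 days, and `pod_spec` is a plain dict shaped like a Kubernetes pod spec.

```python
def flag_overprovisioned(daily_ratios: list[float],
                         threshold: float = 2.0, days: int = 7) -> bool:
    """True if each of the last `days` daily ratios exceeds `threshold`."""
    if len(daily_ratios) < days:
        return False  # not enough history to judge
    return all(ratio > threshold for ratio in daily_ratios[-days:])


def containers_missing_limits(pod_spec: dict) -> list[str]:
    """Names of containers that lack a CPU or memory limit."""
    missing = []
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        if "cpu" not in limits or "memory" not in limits:
            missing.append(container["name"])
    return missing
```

A CI step or admission check could reject the deployment whenever `containers_missing_limits` returns a non-empty list, and a weekly job could sort flagged workloads by wasted cost to pick the ones that get tickets.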

Layer AI carefully

  • Train on workloads with stable traffic; exclude noisy batch and experiments.
  • Optimize for cost + SLO: never propose settings that raise error rate or tail latency.
  • Suggest a range (min/target/max) instead of a single value so humans can choose safer defaults.
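To make the "range instead of a single value" idea concrete, here is a small sketch. It uses a fixed heuristic as a stand-in for a model's output, which is an assumption: the 20% headroom, the `RequestRange` type, and the `had_slo_breach` flag are all illustrative, not taken from any particular tool.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestRange:
    min_m: int     # lowest request worth proposing (millicores)
    target_m: int  # suggested default
    max_m: int     # conservative upper choice


def suggest_range(p95_m: int, peak_m: int, headroom_pct: int = 20,
                  had_slo_breach: bool = False) -> Optional[RequestRange]:
    """Propose min/target/max CPU requests, or nothing if SLOs are at risk."""
    if had_slo_breach:
        # Cost + SLO: do not propose changes for a workload already missing its SLO.
        return None
    target = math.ceil(p95_m * (100 + headroom_pct) / 100)
    return RequestRange(min_m=p95_m, target_m=target, max_m=max(peak_m, target))
```

For example, `suggest_range(400, 650)` yields a range of 400, 480, and 650 millicores, leaving the team to choose how much headroom it wants.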

Close the loop in CI/CD

  • Post recommendations as PR comments with dollar impact.
  • Let developers accept with a label (/apply-rightsize) that triggers a patch to the manifest.
  • Track acceptance rate and rollback rate; pause models that regress SLOs.
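The dollar impact in a PR comment can come from simple arithmetic. The sketch below assumes a flat per-core-hour price and 730 hours per month. Both are placeholders for your own price sheet, and the function names and comment wording are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_savings_usd(current_m: int, target_m: int, replicas: int,
                        usd_per_core_hour: float) -> float:
    """Estimated monthly savings from lowering a CPU request (millicores)."""
    cores_saved = (current_m - target_m) / 1000 * replicas
    return cores_saved * usd_per_core_hour * HOURS_PER_MONTH


def format_pr_comment(workload: str, current_m: int, target_m: int,
                      replicas: int, usd_per_core_hour: float) -> str:
    """Render the recommendation as text a bot could post on the PR."""
    savings = monthly_savings_usd(current_m, target_m, replicas, usd_per_core_hour)
    return (
        f"Rightsizing suggestion for {workload}: "
        f"CPU request {current_m}m -> {target_m}m across {replicas} replicas. "
        f"Estimated savings: ${savings:.2f}/month. "
        f"Add the /apply-rightsize label to accept."
    )
```

Putting the number next to the change gives developers a reason to act, and logging each accept or rollback feeds the acceptance and rollback rates you track.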

When to trust it

  • You have 30–60 days of stable usage per service.
  • Teams already follow limits/requests conventions.
  • Acceptance-to-rollback ratio stays above 4:1.
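These three conditions can be encoded as a gate that runs before any automated recommendation is enabled for a service. This is a sketch under assumptions: the parameter names are invented, the 30-day floor takes the lower end of the 30–60 day guidance, and how to treat a service with zero rollbacks is a judgment call.

```python
def ready_for_ai_rightsizing(stable_days: int, follows_conventions: bool,
                             accepted: int, rolled_back: int,
                             min_days: int = 30, min_ratio: float = 4.0) -> bool:
    """True only when history, conventions, and track record all check out."""
    if stable_days < min_days or not follows_conventions:
        return False
    if rolled_back == 0:
        # No rollbacks yet: require at least one accepted recommendation.
        return accepted > 0
    return accepted / rolled_back > min_ratio
```

With 40 accepted and 8 rolled-back recommendations the ratio is 5:1 and the gate opens. At 40 and 10 it sits exactly at 4:1 and stays closed, since the ratio must stay above that line.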

AI rightsizing is not magic. It is a thin layer on top of clean telemetry, sane policies, and fast feedback loops. Nail those first; the AI will look smart because the system is.

Jesus Paz

Founder & CEO
