Understanding Kubernetes Resource Requests vs Limits (and Why They Affect Your Bill)

Requests keep the cluster stable, limits prevent noisy neighbors, and both drive cost—here’s how to tune them.

Jesus Paz
1 min read

Requests and limits look simple, but they control scheduling, reliability, and ultimately how much you pay AWS. Let’s demystify them with a cost lens.

Requests = reserved capacity

  • Kubernetes guarantees the requested CPU/memory for each pod.
  • Cluster autoscaler scales nodes based on aggregate requests, not usage.
  • Translation: high requests → larger node footprint → higher cost (see the sketch below).
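
To make that concrete, here's a minimal pod spec; the name, image, and numbers are illustrative, not recommendations:

```yaml
# Hypothetical pod spec: the scheduler and cluster autoscaler budget
# against these requests, not against what the app actually uses.
apiVersion: v1
kind: Pod
metadata:
  name: api  # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # placeholder image
      resources:
        requests:
          cpu: "500m"      # half a vCPU reserved, whether or not it's used
          memory: "512Mi"  # reserved even if the app idles at 100Mi
# Bin-packing math: a node with 4 allocatable vCPUs fits at most
# 8 of these pods (8 × 500m = 4000m), even if each one only ever
# uses 50m. Inflated requests become extra nodes on your bill.
```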

Limits = safety rails

  • CPU limits throttle workloads when they exceed the boundary.
  • Memory limits cause OOM kills if breached.
  • Translation: overly tight limits crash apps; no limits create noisy neighbors (example below).
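
Here's the same container's resources stanza with limits added (again, the numbers are illustrative); the comments note what crossing each boundary does:

```yaml
# Drop-in for the container spec above: requests plus limits.
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"         # past this, the CFS scheduler throttles the container
    memory: "768Mi"  # past this, the kernel OOM-kills the container (exit 137)
```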

Cost implications

| Scenario | Result |
| --- | --- |
| Requests >> usage, limits = requests | Wasteful spend, thrashing autoscalers |
| Requests = usage, limits slightly higher | Balanced utilization |
| No limits | Potential runaway pods impacting other workloads |

ClusterCost highlights pods whose requests exceed 2× actual usage so you can trim safely.
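
If you want to approximate that check yourself with Prometheus, here's a rough sketch of a recording rule, assuming kube-state-metrics and cAdvisor metrics are already scraped; the record name is made up:

```yaml
# Sketch: ratio of requested CPU to actual CPU usage per container.
groups:
  - name: request-efficiency
    rules:
      - record: namespace_pod_container:cpu_request_to_usage:ratio
        expr: |
          max by (namespace, pod, container)
            (kube_pod_container_resource_requests{resource="cpu"})
          /
          max by (namespace, pod, container)
            (rate(container_cpu_usage_seconds_total{container!=""}[1h]))
# A result above 2 means the container requests at least twice the CPU it uses.
```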

Tuning workflow

  1. Gather P95 usage per pod over 7–14 days.
  2. Set requests = P95 (rounded up).
  3. Set limits = requests × 1.2 (or more for bursty workloads); a worked example follows this list.
  4. Automate PRs to apply the new values.
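
A worked example with made-up numbers: say the 14-day P95 for a container is 410m CPU and 480Mi memory.

```yaml
# Hypothetical output of the workflow above for one container.
resources:
  requests:
    cpu: "500m"      # P95 of 410m, rounded up
    memory: "512Mi"  # P95 of 480Mi, rounded up
  limits:
    cpu: "600m"      # 500m × 1.2
    memory: "640Mi"  # 512Mi × 1.2 ≈ 614Mi, rounded up; memory breaches OOM-kill
```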

For latency-sensitive services, add SLO data before cutting requests.

Monitor continuously

  • Track CPU throttling and OOM events (sample alert rules below).
  • Watch cluster utilization: aim for 70–80% to leave failover headroom.
  • Use ClusterCost to alert when request-to-usage ratio drifts above thresholds.
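
If you run the Prometheus Operator, here's a sketch of alert rules for the throttling/OOM bullet; the thresholds and names are assumptions to adapt, not drop-in values:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: resource-tuning-alerts  # illustrative name
spec:
  groups:
    - name: resource-tuning
      rules:
        - alert: CPUThrottlingHigh
          # Fraction of CFS periods in which the container was throttled.
          expr: |
            rate(container_cpu_cfs_throttled_periods_total[5m])
              / rate(container_cpu_cfs_periods_total[5m]) > 0.25
          for: 15m
          labels:
            severity: warning
        - alert: ContainerOOMKilled
          # Recent restarts whose last termination reason was an OOM kill.
          expr: |
            increase(kube_pod_container_status_restarts_total[10m]) > 0
            and on (namespace, pod, container)
            kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          labels:
            severity: warning
```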

Requests and limits are the knobs that turn Kubernetes from an expensive science project into a predictable platform. Tune them weekly, and your bill will reward you.
