Graceful Shutdowns: Surviving Spot Interruptions

Spot Instances are cheap until they break your app. Here is the boilerplate code you need to handle SIGTERM and preStop hooks correctly.

Jesus Paz

In my post on Spot Instances, I warned about the “Interruption Tax.” The only way to pay that tax without going bankrupt is Graceful Shutdowns.

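When AWS reclaims a Spot Instance, it gives you a 2-minute warning. Kubernetes does not act on that notice by itself: a termination handler (such as AWS Node Termination Handler or Karpenter) drains the node, and the kubelet then sends a SIGTERM signal to your pod.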

If your app ignores SIGTERM, the kubelet sends SIGKILL once the termination grace period expires (terminationGracePeriodSeconds, 30 seconds by default). In-flight requests fail. Database connections leak. Customers get 5xx errors.

Here is how to fix it.

The Lifecycle

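  1. Spot Reclaim: AWS issues the 2-minute interruption notice for the instance.
  2. Node Drain: A termination handler (e.g., AWS Node Termination Handler or Karpenter) cordons and drains the node.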
  3. Pod Termination: Kubernetes sends SIGTERM to your app.
  4. Service Removal: Simultaneously, Kubernetes removes the pod IP from the Service/Ingress endpoints.

The Race Condition

Steps 3 and 4 happen in parallel, and nothing makes one wait for the other. The endpoint removal can take a few seconds to reach kube-proxy and the load balancer.

Your app might receive SIGTERM and shut down before the load balancer stops sending it traffic. Result: dropped requests.

The Fix: preStop Sleep

You need to keep your app serving until the load balancer has updated. The simplest way is a preStop hook, which the kubelet runs before it sends SIGTERM.

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]

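Because SIGTERM is not sent until the hook exits, your app keeps serving for 10 seconds after termination starts, giving the load balancer time to propagate the change. Two caveats: the sleep counts against the 30-second grace period, and the container image must include /bin/sh.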

The Code: Handling SIGTERM

In your application code (Go example), you must catch the signal and drain connections.

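// Imports: context, log, os, os/signal, syscall, time
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
<-sigChan // Block until SIGTERM (or Ctrl+C) arrives
log.Println("Shutting down...")
// Bound the drain so it ends before the grace period's SIGKILL
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
// Stop accepting NEW connections and wait for in-flight requests to finish
if err := server.Shutdown(ctx); err != nil {
	log.Printf("forced shutdown: %v", err)
}
// Finish background work (your own helper; not part of net/http)
waitForJobsToFinish()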

Summary

Spot instances are only “production ready” if your app is “interruption ready.”

  1. Add a preStop sleep (10-15s).
  2. Catch SIGTERM.
  3. Drain connections gracefully.

Do this, and you can save up to 90% on compute without waking up at 3 AM.


Jesus Paz

Founder & CEO
