Graceful Shutdowns: Surviving Spot Interruptions
Spot Instances are cheap until they break your app. Here is the boilerplate code you need to handle SIGTERM and preStop hooks correctly.
In my post on Spot Instances, I warned about the “Interruption Tax.” The only way to pay that tax without going bankrupt is Graceful Shutdowns.
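When AWS reclaims a Spot node, it gives you a 2-minute warning. Once the node is drained in response (typically by a tool such as AWS Node Termination Handler or Karpenter), Kubernetes terminates your pods, starting with a SIGTERM signal sent to each container.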
If your app ignores SIGTERM, it gets SIGKILLed when the pod's grace period expires (terminationGracePeriodSeconds, 30 seconds by default). In-flight requests fail. Database connections leak. Customers get 500 errors.
Here is how to fix it.
The Lifecycle
- Spot Reclaim: AWS notifies the node.
- Node Drain: The node is cordoned and drained.
- Pod Termination: Kubernetes sends SIGTERM to your app.
- Service Removal: Simultaneously, Kubernetes removes the pod IP from the Service/Ingress endpoints.
The Race Condition
Step 3 and Step 4 happen at the same time. This is the problem.
Your app might receive the SIGTERM and shut down before the load balancer stops sending it traffic. Result: Dropped requests.
The Fix: preStop Sleep
You need to tell your app to wait for the load balancer to update. The simplest way is a preStop hook.

    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]

This forces the pod to stay alive for 10 seconds after termination starts, giving the load balancer time to propagate the change. Kubernetes sends SIGTERM to the container only once the hook has finished.
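One detail worth checking: the preStop sleep counts against the pod's terminationGracePeriodSeconds, so the sleep plus your app's own drain time has to fit inside that budget (and the whole thing inside the 2-minute Spot warning). A minimal Go sketch of that arithmetic follows. The checkBudget helper is hypothetical, and the 15s and 25s drain timeouts are example values, not recommendations.

```go
package main

import "fmt"

// checkBudget reports whether the preStop sleep plus the app's own drain
// timeout fit inside the pod's termination grace period (all in seconds).
func checkBudget(gracePeriod, preStopSleep, drainTimeout int) error {
	needed := preStopSleep + drainTimeout
	if needed > gracePeriod {
		return fmt.Errorf("need %ds but grace period is %ds", needed, gracePeriod)
	}
	return nil
}

func main() {
	// 30s is the Kubernetes default grace period; 10s matches the preStop sleep.
	fmt.Println(checkBudget(30, 10, 15)) // fits: prints "<nil>"
	fmt.Println(checkBudget(30, 10, 25)) // too long: prints the error
}
```

If the numbers do not fit, either shorten the drain timeout or raise terminationGracePeriodSeconds in the pod spec.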
The Code: Handling SIGTERM
In your application code (Go example), you must catch the signal and drain connections.
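
    sigChan := make(chan os.Signal, 1)
    signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

    <-sigChan // Wait for signal
    log.Println("Shutting down...")

    // Stop accepting NEW requests; the deadline (15s here, an example value)
    // should be shorter than what is left of the pod's grace period
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        log.Printf("Shutdown error: %v", err)
    }

    // Finish OLD requests
    waitForJobsToFinish()

Summary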
Spot instances are only “production ready” if your app is “interruption ready.”
- Add a preStop sleep (10-15s).
- Catch SIGTERM.
- Drain connections gracefully.
Do this, and you can save up to 90% on compute without waking up at 3 AM.
Jesus Paz
Founder & CEO