If you are running your server applications via supervisord on a Linux distro running systemd, you may find this post useful.
An example scenario to help us establish the utility for this post is as follows:
systemdstarts the shutdown process
supervisordstops your processes
- You see in-flight requests being dropped
What we want to do is prevent in-flight requests being dropped when a system is shutting down as part of a power off cycle (AWS instance termination, for example). We can do so in two ways:
- Our server application is intelligent enough to not exit (and hence halt instance shutdown) if a request is in progress
- We hook into the shutdown process above so that we stop new requests from coming in once the shutdown process has started and give our application server enough time to finish doing what it is doing.
The first approach has more theoretical “guarantee” around what we want, but can be hard to implement correctly. In fact, I couldn’t get it right even after trying all sorts of signal handling tricks. Your mileage may vary of course and if you have an example you have, please let me know.
So, I went ahead with the very unclean second approach:
- Register a shutdown “hook” which gets invoked when
systemdwants to stop
- This hook takes the service instance out of the healthy pool
- The proxy/load balancer detects the above event and stops sending traffic
- As part of the “hook”, after we have gotten ourself out of the healthy service pool, we sleep for an arbitary time so that existing requests can finish
When you are using a software like linkerd as your RPC proxy, even long-lived connections are not a problem since
linkerd will see that your service instance is unhealthy, so it will not proxy any more requests to it.
Proposed solution implementation
The proposed solution is a systemd unit - let’s call it
drain-connections which is defined as follows:
# cat /etc/systemd/system/drain-connections.service [Unit] Description=Shutdown hook to run before supervisord is stopped After=supervisord.service networking.service PartOf=supervisord.service Conflicts=shutdown.target reboot.target halt.target [Service] Type=oneshot RemainAfterExit=true ExecStart=/bin/true ExecStop=/usr/local/bin/consul maint -enable ExecStop=/bin/sleep 300 TimeoutSec=301 [Install] WantedBy=multi-user.target
Let’s go over the key systemd directives used above in the
drain-connectionsis started after
supervisord, but stopped before
drain-connectionsis stopped/restarted whenever
Service section has the following key directives:
Type=oneshot(learn more about it here)
- The first
ExecStopfirst takes the service instance out of the pool by enabling
- The second
ExecStopthen gives our application 300 seconds to stop finishing what it is currently doing
systemddefault timeout of 90 seconds to 301 seconds so that the earlier sleep of 300 seconds can finish
In addition, we setup
supervisord systemd unit override as follows:
# /etc/systemd/system/supervisord.service.d/supervisord.conf [Unit] Wants=drain-connections.service
This ensures that
drain-connections service gets started when
supervisord is started.
Let’s see how the above fits in to our scenario:
systemdstarts the shutdown process and tries to stop
- This triggerd
drain-connectionsto be stopped where we have the commands we want to be executed
- The above commands will take the instance out of the pool and sleep for an arbitrary period of time
- shutdown proceeds
drain-connections is stopped first? That is okay, because that will execute the necessary commands
we would want to be executed. Then,
supervisord can be stopped which will stop our application server, but
drain-connections unit has already done its job by that time.