Doing something before systemd shuts your supervisord down
If you are running your server applications via supervisord on a Linux distro running systemd, you may find this post useful.
Problem Scenario
An example scenario to help us establish the utility for this post is as follows:
- systemd starts the shutdown process
- systemd stops supervisord
- supervisord stops your processes
- You see in-flight requests being dropped
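To make the second step concrete: by default, systemd stops a service by sending SIGTERM to its control group and, after TimeoutStopSec (90 seconds unless overridden), SIGKILL. Assuming your distro ships supervisord with a fairly standard unit file, you can inspect what systemd will do when stopping it (these are generic systemd properties, nothing supervisord-specific):

# inspect how systemd will stop supervisord by default
systemctl show supervisord.service -p KillMode -p KillSignal -p TimeoutStopSec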
Solution
What we want to do is prevent in-flight requests from being dropped when a system is shutting down as part of a power-off cycle (AWS instance termination, for example). We can do so in two ways:
- Make the server application intelligent enough to not exit (and hence hold up the instance shutdown) while a request is in progress
- Hook into the shutdown process above so that we stop new requests from coming in once the shutdown process has started, and give our application server enough time to finish what it is doing.
The first approach has a stronger theoretical “guarantee” around what we want, but can be hard to implement correctly. In fact, I couldn’t get it right even after trying all sorts of signal handling tricks. Your mileage may vary, of course, and if you have a working example, please let me know.
So, I went ahead with the very unclean second approach:
- Register a shutdown “hook” which gets invoked when systemd wants to stop supervisord
- This hook takes the service instance out of the healthy pool
- The proxy/load balancer detects the above and stops sending traffic to the instance
- As part of the “hook”, after we have taken ourselves out of the healthy service pool, we sleep for an arbitrary amount of time so that existing requests can finish
When you are using software like linkerd as your RPC proxy, even long-lived connections are not a problem: linkerd will see that your service instance is unhealthy and will not proxy any more requests to it.
Proposed solution implementation
The proposed solution is a systemd unit - let’s call it drain-connections - which is defined as follows:
# cat /etc/systemd/system/drain-connections.service
[Unit]
Description=Shutdown hook to run before supervisord is stopped
After=supervisord.service networking.service
PartOf=supervisord.service
Conflicts=shutdown.target reboot.target halt.target
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/usr/local/bin/consul maint -enable
ExecStop=/bin/sleep 300
TimeoutSec=301
[Install]
WantedBy=multi-user.target
Let’s go over the key systemd directives used above in the Unit section:
- After ensures that drain-connections is started after supervisord, but stopped before supervisord
- PartOf ensures that drain-connections is stopped/restarted whenever supervisord is stopped/restarted (a quick way to verify both is shown below)
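Assuming the unit file lives at the path shown earlier, one way to sanity-check these relationships is to let systemd parse the unit and print back the ordering and grouping properties it derived:

# check the unit for obvious mistakes and confirm the ordering/grouping
systemd-analyze verify /etc/systemd/system/drain-connections.service
systemctl show drain-connections.service -p After -p PartOf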
The Service section has the following key directives:
- Type=oneshot (learn more about it here)
- The first ExecStop takes the service instance out of the pool by enabling consul maintenance mode (see the example after this list)
- The second ExecStop then gives our application 300 seconds to finish what it is currently doing
- The TimeoutSec parameter overrides systemd’s default timeout of 90 seconds, raising it to 301 seconds so that the earlier sleep of 300 seconds can finish
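For reference, this is roughly what the consul side looks like when run by hand. The -reason string is just an illustrative label, and the -disable call is not part of the unit above - you only need it if you want the node to rejoin the pool, for example after an aborted shutdown:

# put the local node into maintenance mode so health checks start failing
consul maint -enable -reason "draining before shutdown"
# later, take the node out of maintenance mode again
consul maint -disable

A variant of the unit could add an ExecStart that runs consul maint -disable so that maintenance mode is cleared whenever drain-connections starts, but the unit shown here leaves that to the operator.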
In addition, we set up a systemd unit override for supervisord as follows:
# /etc/systemd/system/supervisord.service.d/supervisord.conf
[Unit]
Wants=drain-connections.service
This ensures that the drain-connections service gets started when supervisord is started.
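With both files in place, something along these lines (run as root; the unit names assume the paths used above) should make systemd pick up the override and enable the new unit:

# reload unit definitions, then enable and start the shutdown hook
systemctl daemon-reload
systemctl enable drain-connections.service
systemctl start drain-connections.service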
Discussion
Let’s see how the above fits into our scenario:
- systemd starts the shutdown process and tries to stop supervisord
- This triggers drain-connections to be stopped, which runs the commands we want executed
- The above commands take the instance out of the pool and sleep for an arbitrary period of time
- drain-connections finishes “stopping”
- systemd stops supervisord
- Shutdown proceeds
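You can rehearse this sequence without actually powering off the machine: stopping supervisord by hand should pull drain-connections down first, and its ExecStop commands will show up in the journal. Note that the stop will block for the full 300-second sleep:

# in one terminal, watch what the hook does
journalctl -u drain-connections.service -f
# in another terminal, trigger the same stop ordering a shutdown would
systemctl stop supervisord.service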
What if drain-connections is stopped first? That is okay, because stopping it will execute the commands we want to be executed. Then, supervisord can be stopped, which will stop our application server, but the drain-connections unit has already done its job by that time.
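This also gives you a manual drain switch for free: stopping drain-connections on its own runs the same ExecStop commands, which can be handy before planned maintenance. Keep in mind that, with the unit as written, nothing ever runs consul maint -disable, so you have to clear maintenance mode yourself when you want the instance back in the pool:

# drain the instance by hand: enables consul maintenance mode, then sleeps 300s
systemctl stop drain-connections.service
# when maintenance is done, put the instance back into service
consul maint -disable
systemctl start drain-connections.service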