Doing something before systemd shuts your supervisord down
If you are running your server applications via supervisord on a Linux distro running systemd, you may find this post useful.
Problem Scenario
An example scenario to help us establish the utility for this post is as follows:
- systemd starts the shutdown process
- systemd stops supervisord
- supervisord stops your processes
- You see in-flight requests being dropped
Solution
What we want to do is prevent in-flight requests being dropped when a system is shutting down as part of a power off cycle (AWS instance termination, for example). We can do so in two ways:
- Our server application is intelligent enough to not exit (and hence halt instance shutdown) if a request is in progress
- We hook into the shutdown process above so that we stop new requests from coming in once the shutdown process has started and give our application server enough time to finish doing what it is doing.
The first approach has more theoretical “guarantee” around what we want, but can be hard to implement correctly. In fact, I couldn’t get it right even after trying all sorts of signal handling tricks. Your mileage may vary, of course, and if you have a working example, please let me know.
So, I went ahead with the very unclean second approach:
- Register a shutdown “hook” which gets invoked when systemd wants to stop supervisord
- This hook takes the service instance out of the healthy pool
- The proxy/load balancer detects the above event and stops sending traffic
- As part of the “hook”, after we have taken ourselves out of the healthy service pool, we sleep for an arbitrary time so that existing requests can finish
When you are using software like linkerd as your RPC proxy, even long-lived connections are not a problem, since linkerd will see that your service instance is unhealthy and will not proxy any more requests to it.
Proposed solution implementation
The proposed solution is a systemd unit, let’s call it drain-connections, which is defined as follows:
# cat /etc/systemd/system/drain-connections.service
[Unit]
Description=Shutdown hook to run before supervisord is stopped
After=supervisord.service networking.service
PartOf=supervisord.service
Conflicts=shutdown.target reboot.target halt.target
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/usr/local/bin/consul maint -enable
ExecStop=/bin/sleep 300
TimeoutSec=301
[Install]
WantedBy=multi-user.target
Let’s go over the key systemd directives used above in the Unit section:
- After ensures that drain-connections is started after supervisord, but stopped before supervisord
- PartOf ensures that drain-connections is stopped/restarted whenever supervisord is stopped/restarted
The Service section has the following key directives:
- Type=oneshot (learn more about it here)
- The first ExecStop takes the service instance out of the pool by enabling consul maintenance mode
- The second ExecStop then gives our application 300 seconds to finish what it is currently doing
- The TimeoutSec parameter overrides the systemd default timeout of 90 seconds with 301 seconds so that the earlier sleep of 300 seconds can finish
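As a variation, the two ExecStop steps could be folded into one small script, which makes the drain command and the wait time easy to override. This is only a sketch: DRAIN_CMD, DRAIN_SECONDS, drain_and_wait and the run argument are hypothetical names, not part of the setup described here. The default command and duration are the ones from the unit above.

```shell
#!/bin/sh
# Sketch of a drain hook: mark the instance unhealthy, then wait.
# DRAIN_CMD and DRAIN_SECONDS are hypothetical knobs (assumptions);
# their defaults mirror the two ExecStop lines of the unit file.
DRAIN_CMD="${DRAIN_CMD:-/usr/local/bin/consul maint -enable}"
DRAIN_SECONDS="${DRAIN_SECONDS:-300}"

drain_and_wait() {
    # Step 1: leave the healthy pool so the proxy stops routing to us.
    # $DRAIN_CMD is deliberately unquoted so it splits into command + args.
    if ! $DRAIN_CMD; then
        echo "drain command failed; sleeping anyway" >&2
    fi
    # Step 2: give in-flight requests time to finish.
    sleep "$DRAIN_SECONDS"
    echo "drained after ${DRAIN_SECONDS}s"
}

# Only act when explicitly asked to, so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    drain_and_wait
fi
```

If a script like this were called with the run argument from a single ExecStop line, TimeoutSec would have to cover the drain command and the sleep together, so a margin of one second over the sleep may be too tight.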
In addition, we set up a systemd unit override for supervisord as follows:
# /etc/systemd/system/supervisord.service.d/supervisord.conf
[Unit]
Wants=drain-connections.service
This ensures that the drain-connections service gets started when supervisord is started.
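One way to confirm that the override has been picked up is to ask systemd directly. The function below is a rough sketch, assuming the unit names used in this post; check_drain_wiring is a hypothetical name, and the daemon-reload step needs root.

```shell
# Sketch: check that supervisord now pulls in drain-connections.
# check_drain_wiring is a hypothetical helper name (an assumption).
check_drain_wiring() {
    # Re-read unit files so the drop-in override is loaded (needs root).
    systemctl daemon-reload || return 1
    # The Wants= property of supervisord should list drain-connections.
    systemctl show -p Wants supervisord.service \
        | grep -q 'drain-connections\.service' || return 1
    # Prints "active" once the unit has been started; it stays active
    # after /bin/true exits because of RemainAfterExit=true.
    systemctl is-active drain-connections.service
}
```

Running check_drain_wiring after restarting supervisord should print active. If it prints inactive, the drain unit has not been started yet.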
Discussion
Let’s see how the above fits in to our scenario:
- systemd starts the shutdown process and tries to stop supervisord
- This triggers drain-connections to be stopped, which is where we have the commands we want to be executed
- The above commands will take the instance out of the pool and sleep for an arbitrary period of time
- drain-connections finishes “stopping”
- systemd stops supervisord
- shutdown proceeds
What if drain-connections is stopped first? That is okay, because stopping it runs the commands we want executed. supervisord can then be stopped, which will stop our application server, but the drain-connections unit has already done its job by that time.