I ran into a segfaulting web application while trying to get a Beaker server working on Fedora 17. Beaker's web application is built with TurboGears and runs under Apache (httpd-2.2) via mod_wsgi. (I am fairly new to the web application world, so please excuse any misuse of terms.)
The crash was triggered by uploading a task RPM to the Beaker server, which gave a hint about where to look for the faulting code. The task RPM was saved to disk without trouble, so writing it (permissions, etc.) was not the problem. That left reading the RPM file, which Beaker does to retrieve the files and the task specification it contains.
To check whether reading the RPM file was the problem, I copied the relevant code from the Beaker sources into a standalone Python script. It ran without problems. Hence, the problem had to lie in the combination of the rpm libraries (rpm-libs) and the web application server (httpd).
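A standalone script along the following lines is enough to exercise the read path outside httpd. It is a sketch, not Beaker's code: the helper name read_rpm_header is hypothetical, and it assumes the rpm Python bindings (the rpm module) are installed. TransactionSet.hdrFromFdno() parses the package header and, with signature checking enabled, may also open the RPM database in /var/lib/rpm to look up keys, which would touch the same database-opening code that appears in the stack trace below.

```python
import os
import sys


def read_rpm_header(path):
    """Parse the header of the RPM package at `path` (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the rpm bindings.
    import rpm

    ts = rpm.TransactionSet()
    fd = os.open(path, os.O_RDONLY)
    try:
        # Reads (and, depending on the verify flags, checks) the header.
        return ts.hdrFromFdno(fd)
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    hdr = read_rpm_header(sys.argv[1])
    print(hdr["name"], hdr["version"], hdr["release"])
```

Run it with the path of the task RPM as its only argument; if it prints the package name, version and release, reading the RPM works outside httpd.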
Debugging httpd with gdb
After futile attempts to bracket the crashing code with exception handling and logging (a segfault kills the process outright, so Python-level exception handlers never get a chance to run), I decided to bite the bullet and use gdb. It turned out to be simpler than I thought: run httpd in single process mode under gdb, watch the error_log file for the process ID of the mod_wsgi daemon process, and then attach to that process with gdb from another terminal.
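Picking the daemon PID out of error_log can be scripted. The sketch below assumes mod_wsgi tags its log messages with "mod_wsgi (pid=NNNN)" and that you pass the log path (on Fedora usually /var/log/httpd/error_log) as the only argument; wsgi_pids is a hypothetical helper, and choosing the most recently seen PID is only a heuristic.

```python
import re
import sys

# Assumption: mod_wsgi log lines contain "mod_wsgi (pid=NNNN)".
WSGI_PID_RE = re.compile(r"mod_wsgi \(pid=(\d+)\)")


def wsgi_pids(lines):
    """Return distinct mod_wsgi PIDs in order of first appearance."""
    seen = []
    for line in lines:
        match = WSGI_PID_RE.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in seen:
                seen.append(pid)
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        pids = wsgi_pids(log)
    if pids:
        # Heuristic: the newest PID is most likely the running daemon.
        print("gdb /usr/sbin/httpd %d" % pids[-1])
```

The printed command uses gdb's "program pid" form, which attaches to the given process.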
Start httpd in single process mode:
# gdb /usr/sbin/httpd
(gdb) run -X
Then attach gdb to the mod_wsgi daemon process. Make sure your mod_wsgi configuration creates only one daemon process with a single thread (for example, processes=1 threads=1 on the WSGIDaemonProcess directive):
# gdb /usr/sbin/httpd
(gdb) attach <pid>
(gdb) cont
With gdb attached, I performed the action that triggered the crash and got a good stack trace. The first few frames were:
#0 0x0000000100000001 in ?? ()
#1 0x00007fffd7fe1097 in db_init (dbhome=0x7fffdf7a9e70 "/var/lib/rpm", rdb=0x7fffdd052970) at backend/db3.c:151
#2 dbiOpen (rdb=rdb@entry=0x7fffdd052970, rpmtag=rpmtag@entry=0, dbip=dbip@entry=0x7fffe399ff38, flags=flags@entry=0) at backend/db3.c:551
#3 0x00007fffd7fe8e53 in rpmdbOpenIndex (db=db@entry=0x7fffdd052970, rpmtag=rpmtag@entry=0, flags=0) at rpmdb.c:149
#4 0x00007fffd7fe93ef in openDatabase (prefix=, dbpath=dbpath@entry=0x0, dbp=dbp@entry=0x7fffdf797648, mode=mode@entry=0, perms=perms@entry=420, ...
...
That confirmed the crash happened inside rpm-libs, in db_init() while opening the RPM database in /var/lib/rpm, and only when running inside httpd. Discussing it with Dan, this turned out to be a case of conflicting shared libraries, and a bit more digging showed that the culprit was the Berkeley DB library (libdb): httpd had both libdb-4.8 and libdb-5.2 loaded in its process maps. I also verified this with lsof (thanks to StackOverflow for the lsof tip):
# lsof /lib64/libdb-4.8.so
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
httpd 9722 apache mem REG 253,1 1555128 1189531 /usr/lib64/libdb-4.8.so
httpd 10578 apache mem REG 253,1 1555128 1189531 /usr/lib64/libdb-4.8.so
httpd 18828 apache mem REG 253,1 1555128 1189531 /usr/lib64/libdb-4.8.so
httpd 18832 apache mem REG 253,1 1555128 1189531 /usr/lib64/libdb-4.8.so
gdb 18863 root mem REG 253,1 1555128 1189531 /usr/lib64/libdb-4.8.so

# lsof /lib64/libdb-5.2.so
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
gdb 18824 root mem REG 253,1 1756808 1204315 /usr/lib64/libdb-5.2.so
httpd 18832 apache mem REG 253,1 1756808 1204315 /usr/lib64/libdb-5.2.so
gdb 18863 root mem REG 253,1 1756808 1204315 /usr/lib64/libdb-5.2.so
qemu-kvm 30866 qemu mem REG 253,1 1756808 1204315 /usr/lib64/libdb-5.2.so
As you can see, both libdb versions are mapped into httpd's address space: process 18832 appears in both listings. Since httpd itself depends only on libdb-4.8, something else in the web application must be bringing in libdb-5.2. That something turned out to be rpm-libs:
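lsof answers the question one library at a time; reading /proc/<pid>/maps directly shows every file mapped into a single process. The sketch below is Linux-only, its helper names are hypothetical, and it only understands the libNAME-X.Y.so naming that libdb uses. It collects the mapped paths and reports any library mapped in more than one version.

```python
import os
import re
import sys
from collections import defaultdict

# "libdb-4.8.so" -> ("libdb", "4.8"); other naming schemes are ignored.
VERSIONED_LIB_RE = re.compile(r"(lib[\w+]+?)-(\d+(?:\.\d+)*)\.so$")


def mapped_paths(maps_text):
    """Return the set of file paths mapped in a /proc/<pid>/maps listing."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith("/"):
            paths.add(fields[5].strip())
    return paths


def conflicting_libraries(paths):
    """Map each library name to its versions when more than one is mapped."""
    versions = defaultdict(set)
    for path in paths:
        match = VERSIONED_LIB_RE.search(os.path.basename(path))
        if match:
            versions[match.group(1)].add(match.group(2))
    return {name: sorted(v) for name, v in versions.items() if len(v) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/%s/maps" % sys.argv[1]) as maps:
        print(conflicting_libraries(mapped_paths(maps.read())))
```

Run as root with the daemon's PID (for example 18832 from the lsof output); with both builds mapped, the result would be expected to look like {'libdb': ['4.8', '5.2']}.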
# yum deplist rpm-libs | grep libdb
  dependency: libdb-5.2.so
   provider: libdb.i686 5.2.36-5.fc17
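yum deplist answers this at the package level. For a given binary or shared object, ldd lists the libraries it links against, including transitive dependencies. This hedged sketch (helper names hypothetical) parses ldd output and picks out the libdb sonames; running it on /usr/sbin/httpd and on the rpm shared library shipped by rpm-libs should show which libdb each one pulls in.

```python
import subprocess
import sys


def linked_libraries(ldd_output):
    """Parse ldd output into {soname: resolved path, or None if unresolved}."""
    libs = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line has no "=>"
        name, _, target = line.partition("=>")
        target = target.strip()
        libs[name.strip()] = target.split()[0] if target.startswith("/") else None
    return libs


def libdb_sonames(libs):
    """Pick out Berkeley DB sonames such as libdb-4.8.so (but not libdbus-1)."""
    return sorted(name for name in libs if name.startswith("libdb-"))


def libdb_dependencies(binary):
    """Run ldd on `binary` and return the libdb sonames it depends on."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return libdb_sonames(linked_libraries(result.stdout))


if __name__ == "__main__" and len(sys.argv) > 1:
    for binary in sys.argv[1:]:
        print(binary, libdb_dependencies(binary))
```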
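So that is the problem and the reason for the crash: two different versions of libdb loaded into the same httpd process, libdb-4.8 pulled in by httpd and libdb-5.2 pulled in by rpm-libs.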
At the time of writing, there is no fix for this on Fedora 17 other than getting an httpd-2.4 build linked against libdb-5.2.so. The Fedora 18 release, currently in development, ships httpd-2.4 linked against libdb-5.3.so, the same libdb version its rpm-libs uses. And indeed, the crash did not occur there.
To reproduce this with a minimal application, I taught myself how to integrate Flask with mod_wsgi and wrote a simple Flask application, which you can check out here. The Flask documentation walks through the mod_wsgi setup steps.
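The Flask application itself is not reproduced here. As a dependency-free stand-in, a plain WSGI callable can report which libdb builds are mapped into the process serving a request. This is a hypothetical diagnostic, not the author's app: it reads /proc/self/maps (Linux only) and, when requested with the query string load-rpm, first imports the rpm Python bindings so that rpm-libs and its libdb get loaded.

```python
import re

# Matches Berkeley DB library file names such as "libdb-5.2.so".
LIBDB_RE = re.compile(r"libdb-\d+(?:\.\d+)*\.so")


def loaded_libdb_versions():
    """List the libdb libraries mapped into the current process."""
    try:
        with open("/proc/self/maps") as maps:
            return sorted(set(LIBDB_RE.findall(maps.read())))
    except OSError:
        return []  # not Linux, or /proc unavailable


def application(environ, start_response):
    if environ.get("QUERY_STRING") == "load-rpm":
        try:
            import rpm  # noqa: F401  (loading the bindings maps rpm-libs' libdb)
        except ImportError:
            pass
    text = "\n".join(loaded_libdb_versions()) or "no libdb mapped"
    body = (text + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Served through mod_wsgi (for instance with WSGIScriptAlias), a request with ?load-rpm would be expected to list both libdb-4.8.so and libdb-5.2.so under httpd-2.2 on Fedora 17, if the diagnosis above is right.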
Thanks to Dan on #fedora-qa and to Dan in real life, who helped me debug the crash, and to Graham Dumpleton for the mod_wsgi help on #pocoo. Here are some of the docs I referred to:
Related bug reports