Bits from /home/gene

echo $RANDOM

Book Review: Linux System Programming

I received a review copy of the book as part of the blogger review program.

I review for the O'Reilly Blogger Review Program

Title:  “Linux System Programming (2nd Edition) ” by Robert Love; O’Reilly Media.

Summary:  

This book consists of 11 chapters. The first chapter introduces you nicely to the the core topics and lays the foundation for the rest of the book. Files (including some hints on the role of the virtual file system and how they are represented in the Kernel), Input/Output (User buffered I/O, I/O scheduling, Scatter-Gather I/O), Processes (including their creation mechanisms and management), Threads (and how Linux implements them along with a treatment of the POSIX threads library), Memory (Process address space, dynamic memory allocation strategies, and how they work, memory locking) form the core of the book. The second last chapter discusses signal handling. The last chapter of the book is on time (the different types of time, how you can get/set time, measure time elapsed and timers) and is sort of a “standalone” topic for the book. The first appendix discusses the GCC extensions to the C language and can be handy when you read the Kernel source code.

Reactions:

In this book, the author discusses some of the most important topics that one would want to learn about when venturing into the area of “system programming” on Linux. He introduces the topics in a friendly manner adding some fun anecdotes from time to time (what does the “c” in calloc() stand for?).At various places, the reader is given a peek under the hood (for example, pause() is one of the simplest system calls implemented) which can only make the curious reader happy and itchy to download the kernel source code and start grepping. The book includes code examples throughout and hence if you are learning a topic for the first time, these are very useful starting points.

Verdict: 

System programming on Linux is an area encompassing number of related topics most of which can fill up whole books on their own. I also could not help comparing this book with “The Linux Programming Interface” by Michael Kerrisk (a book which I own already). Should you buy this book if you already own the latter? Yes, you should. While not being “encyclopedic” and not covering topics such as socket programming at all, Robert Love’s “Linux System Programming” has the right level of treatment and detail for the reader interested in system programming.

Product page:

Fedora 20 Scientific Released

Fedora 20 is now released, which also means the newest release of Fedora Scientific along with other spins are also available.

Download

You can download the spin images from here.

What’s new in Fedora Scientific

The notable additions in this release are:

  • Sage, along with Sage notebook.
  • SymPy, the Python library for Symbolic mathematics
  • The Python 3 versions for scipy, numpy, matplotlib libraries and IPython (including IPython notebook)
  • Commons math, a Java library for numerical computing

The Fedora 20 release notes are here.

Fedora Scientific Documentation

I started work on some documentation for Fedora Scientific about a month or so back. It is far from what I want it to be, but you can see the current version here. The first goal that I have in mind is to document all the major scientific tools and libraries that are shipped with Fedora Scientific. By document, I imply links to the official project resources and guides. The second goal is to actually add original content and make it a guide book for Fedora Scientific which may be used as an entry point for Open Source Scientific Computing. Once the guide has taken some shape, an RPM package can be created and distributed with Fedora Scientific so that the entire documentation is available for offline perusal.

Contributing

The Fedora Scientific documentation is an excellent starting point if you are looking to make a contribution to Fedora Scientific. You can view the project here.

If you have some toy/throwaway scripts that makes use of one of the libraries/tools, you may want to contribute it to the
“tests” here. They will help sanity check these libraries and tools during the development of upcoming Fedora Scientific releases.

Discussions and support

Please join the Fedora scitech mailing list.

Suggetions and ideas? Please leave a comment.

C-x C-p for import pdb; pdb.set_trace()

I worked through most of the Learn Emacs Lisp in 15 minute tutorial and learned enough to write something useful:

;; Inset import pdb; pdb.set_trace() on C-x, C-p
(defun pdb-set-trace ()
  ;; http://www.emacswiki.org/emacs/InteractiveFunction
  (interactive)
  (insert "import pdb; pdb.set_trace()\n"))
(global-set-key [(control ?x) (control ?p)] 'pdb-set-trace)

Once you place this function in your Emacs init file, when you press the keys C-x and C-p, import pdb; pdb.set_trace() will magically appear in your Python program (well, anywhere for that matter, since there is no check in place). This is something which I know will be very useful to me. If you don’t like the key combination, you can change it in the last line of the above function.

If you don’t know Emacs lisp, you should try to work through the tutorial. Although, I must say I did make various attempts long time back to learn Common Lisp and Clojure, so it was not so unfamiliar to me.

poweroff, halt, reboot and systemctl

On Fedora (and perhaps other Linux distros using systemd) you will see that the poweroff, reboot and halt commands are all symlinks to systemctl:

> ls -l /sbin/poweroff /sbin/halt /sbin/reboot 
lrwxrwxrwx. 1 root root 16 Oct  1 11:04 /sbin/halt -> ../bin/systemctl
lrwxrwxrwx. 1 root root 16 Oct  1 11:04 /sbin/poweroff -> ../bin/systemctl
lrwxrwxrwx. 1 root root 16 Oct  1 11:04 /sbin/reboot -> ../bin/systemctl

So, how does it all work? The answer lies in this code block from systemctl.c:

..
5556        if (program_invocation_short_name) {                                                                                                                                              
5557                                                                                                                                                                                          
5558                if (strstr(program_invocation_short_name, "halt")) {                                                                                                                      
5559                        arg_action = ACTION_HALT;                                                                                                                                         
5560                        return halt_parse_argv(argc, argv);                                                                                                                               
5561                } else if (strstr(program_invocation_short_name, "poweroff")) {                                                                                                           
5562                        arg_action = ACTION_POWEROFF;                                                                                                                                     
5563                        return halt_parse_argv(argc, argv);                                                                                                                               
5564                } else if (strstr(program_invocation_short_name, "reboot")) {                                                                                                             
5565                        if (kexec_loaded())                                    
..

program_invocation_short_name
program_invocation_short_name is a variable (GNU extension) which contains the name used to invoke a program. The short indicates that if you call your program as /bin/myprogram, it is set to ‘myprogram’. There is also a program_invocation_name variable consisting of the entire path. Here is a demo:


/*myprogram.c*/

# include <stdio.h>

extern char *program_invocation_short_name;
extern char *program_invocation_name;

int main(int argc, char **argv)
{
  printf("%s \n", program_invocation_short_name);
  printf("%s \n", program_invocation_name);
  return 0;
}

Assume that the executable for the above program is created as myprogram, execute the program from a directory which is one level up from where it resides. For example, in my case, myprogram is in $HOME/work and I am executing it from $HOME:

> ./work/myprogram 
myprogram 
./work/myprogram 

You can see the difference between the values of the two variables. Note that any command line arguments passed are not included in any of the variables.

Back to systemctl

Okay, so now we know that when we execute the poweroff command (for example), program_invocation_short_name is set to poweroff and this check matches:

if (strstr(program_invocation_short_name, "poweroff")) 
..

and then the actual action of powering down the system takes place. Also note that how the halt_parse_argv function is called with the parameters argc and argv so that when you invoke the poweroff command with a switch such as --help, it is passed appropriately to halt_parse_argv:

5194        static const struct option options[] = {                                                                                                                                          
5195                { "help",      no_argument,       NULL, ARG_HELP    },                                                                                                                    
..
..
5218                case ARG_HELP:                                                                                                                                                            
5219                        return halt_help(); 

Fooling around

Considering that systemctl uses strstr to match the command it was invoked as, it allows for some fooling around. Create a symlink mypoweroff to /bin/systemctl and then execute it as follows:

> ln -s /bin/systemctl mypoweroff
> ./mypoweroff --help
mypoweroff [OPTIONS...]

Power off the system.

     --help      Show this help
     --halt      Halt the machine
  -p --poweroff  Switch off the machine
     --reboot    Reboot the machine
  -f --force     Force immediate halt/power-off/reboot
  -w --wtmp-only Don't halt/power-off/reboot, just write wtmp record
  -d --no-wtmp   Don't write wtmp record
     --no-wall   Don't send wall message before halt/power-off/reboot

This symlink is for all purpose going to act like the poweroff command since systemctl basically checks whether ‘poweroff’ is a substring of the invoked command.

To learn more, see systemctl.c

Related

Few months back, I had demoed invoking a similar behaviour in programs where a program behaves differently based on how you invoke it using argv[0] here. I didn’t know of the GNU extensions back then.

Writing git hooks using Python

Since git hooks can be any executable script with an appropriate #! line, Python is more than suitable for writing your git hooks. Simply stated, git hooks are scripts which are called at different points of time in the life cycle of working with your git repository.

Let’s start by creating a new git repository:


~/work> git init git-hooks-exp
Initialized empty Git repository in /home/gene/work/git-hooks-exp/.git/
~/work> cd git-hooks-exp/
~/work/git-hooks-exp (master)> tree -al .git/
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── prepare-commit-msg.sample
│   ├── pre-rebase.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

9 directories, 12 files

Inside the .git are a number of directories and files, one of them being hooks/ which is where the hooks live. By default, you will have a number of hooks with the file names ending in .sample. They may be useful as starting points for your own scripts. However, since they all have an extension .sample, none of the hooks are actually activated. For a hook to be activated, it must have the right file name and it should be executable. Let’s see how we can write a hook using Python.

We will write a post-commit hook. This hook is called immediately after you have made a commit. We are going to do something fairly useless, but quite interesting in this hook. We will take the commit SHA1 of this commit, and print how it may look like in a more human form. I do the latter using the humanhash module. You will need to have it installed.

Here is how the hook looks like:


#!/usr/bin/python

import subprocess
import humanhash

# get the last commit SHA and print it after humanizing it
# https://github.com/zacharyvoase/humanhash
print humanhash.humanize(
    subprocess.check_output(
        ['git','rev-parse','HEAD']))

I use the subprocess.check_output() function to execute the command git rev-parse HEAD so that I can get the commit SHA1 and then call the humanhash.humanize() function with it.

Save the hook as a file, post-commit in your hooks/ directory and make it executable using chmod +x .git/hooks/post-commit. Let’s see the hook in action:

~/work/git-hooks-exp (master)> touch file
~/work/git-hooks-exp (master)> git add file
~/work/git-hooks-exp (master)> git commit -m "Added a file"
carbon-network-connecticut-equal
[master (root-commit) 2d7880b] Added a file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file

The commit SHA1 for the commit turned out to be 2d7880be746a1c1e75844fc1aa161e2b8d955427. Let’s check it with the humanize function and check if we get the same message as above:

>>> humanhash.humanize('2d7880be746a1c1e75844fc1aa161e2b8d955427')
'carbon-network-connecticut-equal'

And you can see the same message above as well.

Accessing hook parameters

For some of the hooks, you will see that they are called with some parameters. In Python you can access them using the sys.argv attribute from the sys module, with the first member being the name of the hook of course and the others will be the parameters that the hook is called with.

Current working directory

For some reason, it may be useful if you know what is the current working directory of your hook. The os.getcwd() function can help there and it turns out to be the local file system path to your
git repository (~/work/git-hooks-exp in the above case).

Using the __cleanup__ variable attribute in GCC

GCC’s C compiler allows you to define various variable attributes. One of them is the cleanup attribute (which you can also write as__cleanup__) which allows you to define a function to be called when the variable goes out of scope (for example, before returning from a function). This is useful, for example to never forget to close a file or freeing the memory you may have allocated. Next up is a demo example defining this attribute on an integer variable (which obviously has no practical value). I am using gcc (GCC) 4.7.2 20121109 on Fedora 18.

Read the article here.

Managing IPython notebook server via systemd: Part-I

If you are using IPython notebook on a Linux distribution which uses systemd as it’s process manager (such as Fedora Linux, Arch Linux) , you may find this post useful. I will describe a fairly basic configuration to manage (start/stop/restart) IPython notebook server using systemd.

Creating the Systemd unit file

First, we will create the systemd unit file. As root user, create a new file /usr/lib/systemd/system/ipython-notebook.service and copy the following contents into it:

[Unit]
Description=IPython notebook

[Service]
Type=simple
PIDFile=/var/run/ipython-notebook.pid
ExecStart=/usr/bin/ipython notebook --no-browser --pylab=inline
User=ipynb
Group=ipynb
WorkingDirectory=/home/ipynb/notebooks

[Install]
WantedBy=multi-user.target

Note that due to the naming of our unit file, the service will run as ipython-notebook. To completely understand the above unit file, you will need to read up a little of the topic. You may find my earlier post useful which also has links to systemd resources. Three things deserve explanation though:

The line, ExecStart=/usr/bin/ipython notebook --no-browser --pylab=inline specifies the command to start the IPython notebook server. This should be familiar to someone who uses it.

The lines, User=ipynb and Group=ipynb specify that we are going to run this process as user/group ipynb (we create them in the next step).

The line WorkingDirectory=/home/ipynb/notebooks specify that the notebooks will be stored/server in/from /home/ipynb/notebooks

Setting up the user

As root, create the user ipynb:

# useradd ipynb

Next, as ipynb, create a sub-directory, notebooks:


# su - ipynb
[ipynb@localhost ~]$ mkdir notebooks
[ipynb@localhost ~]$ exit

Starting IPython notebook

We are all set now to start IPython notebook. As the root user, reload all the systemd unit files, enable the ipython-notebook service so that it starts on boot, and then start the service:


# systemctl daemon-reload
# systemctl enable ipython-notebook
ln -s '/usr/lib/systemd/system/ipython-notebook.service' '/etc/systemd/system/multi-user.target.wants/ipython-notebook.service'
# systemctl start ipython-notebook

If you check the status of the service, it should show the following:

# systemctl status ipython-notebook
ipython-notebook.service - IPython notebook
Loaded: loaded (/usr/lib/systemd/system/ipython-notebook.service; enabled)
Active: active (running) since Sun 2013-09-22 22:39:59 EST; 23min ago
Main PID: 3671 (ipython)
CGroup: name=systemd:/system/ipython-notebook.service
├─3671 /usr/bin/python /usr/bin/ipython notebook --no-browser --pylab=inline
└─3695 /usr/bin/python -c from IPython.zmq.ipkernel import main; main() -f /home/ipynb/.ipython/profile_default/security/kernel-6dd8b338-e779-4e67-bf25-1cd238...

Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] Serving notebooks from /home/ipynb/notebooks
Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] The IPython Notebook is running at: http://127.0.0.1:8888/
Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] Use Control-C to stop this server and shut down all kernels.
Sep 22 22:40:21 localhost ipython[3671]: [NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/MathJax.js
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Kernel started: 6dd8b338-e779-4e67-bf25-1cd23884cf5a
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:51666
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:52244
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:44667
Sep 22 22:40:22 localhost ipython[3671]: [IPKernelApp] To connect another client to this kernel, use:
Sep 22 22:40:22 localhost ipython[3671]: [IPKernelApp] --existing kernel-6dd8b338-e779-4e67-bf25-1cd23884cf5a.json

You should now be able to access IPython notebook as you would normally do. Finally, you can stop the server as follows:


# systemctl stop ipython-notebook

The logs are redirected to /var/log/messages:


Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] Created profile dir: u'/home/ipynb/.ipython/profile_default'
Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] Serving notebooks from /home/ipynb/notebooks
Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] The IPython Notebook is running at: http://127.0.0.1:8888/
Sep 22 22:39:59 localhost ipython[3671]: [NotebookApp] Use Control-C to stop this server and shut down all kernels.
Sep 22 22:40:21 localhost ipython[3671]: [NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/MathJax.js
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Kernel started: 6dd8b338-e779-4e67-bf25-1cd23884cf5a
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:51666
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:52244
Sep 22 22:40:22 localhost ipython[3671]: [NotebookApp] Connecting to: tcp://127.0.0.1:44667
Sep 22 23:05:35 localhost ipython[3671]: [NotebookApp] received signal 15, stopping
Sep 22 23:05:35 localhost ipython[3671]: [NotebookApp] Shutting down kernels
Sep 22 23:05:35 localhost ipython[3671]: [NotebookApp] Kernel shutdown: 6dd8b338-e779-4e67-bf25-1cd23884cf5a

For me, the biggest reason to do this is that I do not have to start the IPython notebook server everytime on system startup manually, since I know it will be running when I need to use it. I plan to explore managing custom profiles next and also think more about a few other things.

Fedora Scientific Spin Update

The Fedora Scientific 20 Spin will have a number of new packages: notably, it will now include the Python 3 tool chain for the Python libraries for scientific/numerical computing. If you are interested to check it out, download a nightly from here.

Testing the applications/libraries

It took me 3 releases to figure out – or rather, sit down to do it. Anyway, here it is now. I have created a Wiki page where I want to list scripts/other ways to sanity test the various packages/applications that are being shipped. I believe it will help in two ways:

  • Often, the entire functionality of a tool/library is split into more than one package and hence pulling in only the main package is no guarantee that the tool/library/application will work.
  • We may also be able to catch genuine bugs/faults in the packages being shipped. (Think of things like missing shared library dependency, etc)

So, please add whatever you can to the wiki page. Just simple ways to see if the application/library actually works.

My plan is to collect whatever I/we gather there into a git repository somewhere and run it prior to/during releases to see everything is working as expected.

Links

  • Download a nightly compose from here (The usual warning against testing in-progress Fedora releases apply)
  • Add scripts/tests to the wiki page here

If you find a problem, leave a comment here, or add to the wiki page.

Note

You may see that the nightlies are failing, but this should be fixed soon (See this bug). You can still download a TC5 build from here which has all the packages that are going to be shipped, except for sagemath.

Once again, thanks are due to the packagers actually packaging all this software that makes Fedora Scientific possible.

Get started with Beaker on Fedora

Beaker 0.14 was released recently and if you are an existing user of Beaker, you may see the What’s new page here

If however, you do not know what Beaker is, the Architecture guide is a good start and if things look interesting, with this release there is also documentation now to setup a Beaker “test bed” using two Virtual machines (via libvirt). 

Notes on writing systemd unit files for Beaker’s daemon processes

Recently, I had a chance to write systemd unit files for the daemon processes that run as part of Beaker: beakerd which is the scheduling daemon running on the server and the four daemons running on the lab controller – beaker-proxy, beaker-provision, beaker-watchdog and beaker-transfer.

This post may be of interest to you if you are using python-daemon to write programs which are capable of running as daemon processes and you want to write systemd unit files for them.

beakerd’s unit file

Here is the systemd unit file for beakerd, which I will use to illustrate the core points of this post. The other unit files are similar, and hence I will explain only where they differ from this one:

[Unit]
Description=Beaker scheduler
After=mysqld.service

[Service]
Type=forking
PIDFile=/var/run/beaker/beakerd.pid
ExecStart=/usr/bin/beakerd
User=apache
Group=apache

[Install]
WantedBy=multi-user.target

The [Unit] section has a description of the service (using the Description option) and specifies that it should start after the mysqld.service has started using the After option. beakerd needs to communicate to a MySQL server before it can start successfully. It can work with a local or a remote MySQL server. Hence, specifying After sets up an ordering that if there is a local MySQL server, then wait for it to start before starting beakerd. Using Requires is not suitable here to accommodate the possibility that beakerd may be configured to use a remote MySQL server.

In the [Service] section, the Type is set to Forking. This is because, beakerd uses python-daemon which forks itself (detaches itself) during the daemonization. However, you must ensure that when creating a DaemonContext() object, you should specify detach_process=True. This is because, if python-daemon detects that it is running under a init manager, it doesn’t detach itself unless the keyword is explicitly set to True, as above (you can see the code in daemon.py). Hence, although not setting the above keyword would work under SysV Init, it doesn’t work under systemd (with Type=Forking), since the daemon doesn’t fork at all and systemd expects it to fork (and finally kills it). The PIDFile specifies where the process ID is dumped by beakerd and is setup while creating the DaemonContext object as follows and ExecStart specifies the location to the binary that is to be started.

The beakerd process is to be run as the apache user and group, which is specified by the User and Group options.

In the [Install] section, the WantedBy option specifies when the beakerd process should be started (similar to the concept of “run levels” in SysV init). systemd defines several targets, and here we define that we want beakerd to start as part of the multi user setup.

That’s all about beakerd’s unit file.

beaker-provision’s unit file

beaker-provision and the other daemons running on the lab controller have similar unit files:

[Unit]
Description=Beaker provisioning daemon
After=httpd.service

[Service]
Type=forking
PIDFile=/var/run/beaker-lab-controller/beaker-provision.pid
ExecStart=/usr/bin/beaker-provision
User=root
Group=root

[Install]
WantedBy=multi-user.target

All the four lab controller daemons need to communicate with Beaker web application – which can be local or remote, and hence the After option specifies the dependency on httpd.service. And, this particular daemon runs as root user/group, which is specified by the User and group options.

And everything else is similar to beakerd’s unit file and also the other lab controller daemons.

Shipping SysV init files and systemd unit files in the same package

The beaker packages ship both SysV init files and systemd unit files now so that it can use systemd when available, but use SysV init otherwise. This commit can give you some idea of how to go about it.

systemd resources

These links proved helpful to learn more about systemd, including how to package unit files for Fedora:

Follow

Get every new post delivered to your Inbox.

Join 66 other followers