`idlexec`

Introduction

I wrote the idlexec utility to temporarily and automatically suspend calculation processes running on our desktop machines whenever someone is using a machine, since some machines make a lot of noise when their processor is busy. The utility also handles rooms that contain several machines.

It consists of a Perl script that depends only on the POSIX Perl module.

Usage

You first need to define an environment variable IDLEXECDIR pointing to a directory that will contain the data used by idlexec. This directory must be accessible from every machine of the room; it is typically on NFS, unless only one machine is concerned. In this directory, create a file named hosts listing the machines that are noisy when loaded, one name per line. For each room, also create an empty file (typically named after the room), for instance with the touch command, and make it writable by the group of users allowed to use the machines for calculations. The contents of these room files do not matter: only their modification time (mtime) is used.
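As an illustration only, the following Python sketch (not the Perl script itself) creates the layout described above: the hosts file and one empty room file whose mtime records activity. The helper names and the 0o664 permissions are assumptions; setting the group ownership of a room file (for instance with chgrp) is not shown.

```python
import os

def idlexec_dir():
    """Hypothetical helper: the directory shared by all machines of the room."""
    return os.environ["IDLEXECDIR"]

def setup_idlexec_dir(base, noisy_hosts, rooms):
    """Create the 'hosts' file and one empty, group-writable file per room."""
    os.makedirs(base, exist_ok=True)
    with open(os.path.join(base, "hosts"), "w") as f:
        for host in noisy_hosts:
            f.write(host + "\n")              # one host name per line
    for room in rooms:
        path = os.path.join(base, room)
        with open(path, "a"):                 # like `touch`: create if missing, keep contents
            pass
        os.utime(path)                        # like `touch`: set the mtime to now
        os.chmod(path, 0o664)                 # assumption: owner and group may update the mtime

def read_hosts(base):
    """Return the noisy machines listed in the 'hosts' file."""
    with open(os.path.join(base, "hosts")) as f:
        return [line.strip() for line in f if line.strip()]

def last_room_activity(base, room):
    """Only the mtime of a room file matters; its contents are ignored."""
    return os.path.getmtime(os.path.join(base, room))
```

For instance, with IDLEXECDIR set to a shared directory, `setup_idlexec_dir(idlexec_dir(), ["pc1", "pc2"], ["room101"])` would create hosts and room101 (hypothetical host and room names).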

The command idlexec is used in the following three ways:

Before any other use, run idlexec -g room, where room is the name of the room file (see above). This creates a symbolic link /tmp/.idlegroup pointing to that file. This step must be done on every machine of the room (not only on the noisy ones), and again each time the symbolic link disappears (for instance, after a reboot, if the system cleans up the /tmp directory).
An idlexec process with no arguments must run in the background on each machine, at least while someone is physically using it; for instance, it can be started at boot time (possibly via sudo) or from the user's shell initialisation file. If such a process is already running, the new one simply returns an error, which can safely be ignored. This process updates the modification time of the room file (through the symbolic link /tmp/.idlegroup) from time to time while the machine's keyboard is in use (it would be nice to take the mouse into account too, but I don't know how to do that).
To start a calculation process controlled by idlexec, run idlexec with the nice value as first argument, followed by the calculation command and its own arguments. idlexec thus roughly replaces the nice command: it accepts both positive and negative values but uses their absolute value, so you don't need to wonder which syntax to use (for instance, 10 and -10 are equivalent).
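These three uses can be sketched in Python as follows, assuming hypothetical helpers set_group (the -g step), record_activity (what the background process would call whenever it detects keyboard activity) and parse_nice (handling of the nice argument). Keyboard detection itself and the "already running" check are not shown; this is not the author's Perl code.

```python
import os

LINK = "/tmp/.idlegroup"

def set_group(room, idlexec_dir, link=LINK):
    """Sketch of `idlexec -g room`: make `link` point to the room file."""
    target = os.path.join(idlexec_dir, room)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.islink(link):                  # replace a stale or previous link
        os.unlink(link)
    os.symlink(target, link)

def record_activity(link=LINK):
    """Update the room file's mtime to now; os.utime follows the symlink by default."""
    os.utime(link)

def parse_nice(arg):
    """Accept '10', '-10' or '+10' and return the absolute value."""
    return abs(int(arg))
```

Because the link is followed, every machine of the room updates the same shared file, which is what lets activity on one machine be seen by the others.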

The controlling idlexec process checks the activity in the room by polling the room file every 10 seconds. If the room file shows keyboard activity within the previous 10 minutes, a STOP signal is sent to the process group of the command; a CONT signal is sent once no keyboard activity has been recorded for 10 minutes. Note: these durations are configurable through environment variables (for more information, see the source of idlexec).
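The polling rule can be sketched as a single step function. The structure is an assumption, not the actual code of idlexec: the constants mirror the default 10-second and 10-minute values, and the send parameter exists only so the step can be exercised without a real process group.

```python
import os
import signal

POLL_INTERVAL = 10        # seconds between two looks at the room file
IDLE_DELAY = 10 * 60      # seconds without keyboard activity before resuming

def should_stop(last_activity, now, idle_delay=IDLE_DELAY):
    """True if the room file was touched less than idle_delay seconds ago."""
    return now - last_activity < idle_delay

def poll_once(pgid, room_link, stopped, now, idle_delay=IDLE_DELAY, send=os.killpg):
    """One polling step; returns the new 'stopped' state of the process group."""
    try:
        last = os.path.getmtime(room_link)    # follows /tmp/.idlegroup to the room file
    except OSError:
        last = float("-inf")                  # assumption: a missing link means no activity
    if should_stop(last, now, idle_delay):
        if not stopped:
            send(pgid, signal.SIGSTOP)        # suspend the whole process group
        return True
    if stopped:
        send(pgid, signal.SIGCONT)            # idle long enough: resume it
    return False
```

The parent would call poll_once every POLL_INTERVAL seconds, passing time.time() as now and the child's pid (also its process group ID after setsid) as pgid, until the child exits. Tracking the stopped state avoids sending the same signal on every poll.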

The -n option can be given as the first argument to redirect the three standard streams from/to /dev/null. This is useful when the command is run via ssh and the process does not itself perform the redirections needed to detach from the terminal; otherwise, the ssh command blocks until the process terminates.
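The effect of -n on the standard streams can be sketched with os.open and os.dup2. This is an illustrative Python version; the function name is hypothetical, and in idlexec the redirection would have to happen before the command is launched so that the command inherits it.

```python
import os

def redirect_std_streams_to_devnull():
    """Sketch of -n: stdin from /dev/null, stdout and stderr to /dev/null."""
    fd_in = os.open(os.devnull, os.O_RDONLY)
    fd_out = os.open(os.devnull, os.O_WRONLY)
    os.dup2(fd_in, 0)                         # standard input
    os.dup2(fd_out, 1)                        # standard output
    os.dup2(fd_out, 2)                        # standard error
    for fd in (fd_in, fd_out):
        if fd > 2:                            # don't close a descriptor we just installed
            os.close(fd)
```

Once no descriptor refers to the ssh session any more, ssh has nothing left to wait for and can return.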

Some More Technical Notes

To launch the command, idlexec first calls fork; the child then calls nice and setsid, and finally exec on the command. Since setsid puts the command in a new process group and the signals are sent to that group, they also reach every process the command starts, not just the command itself, provided these processes do not call setsid themselves, of course.
The idlexec script traps the INT (normally produced by Ctrl+C) and TERM signals and propagates them to the child's process group. In particular, this preserves the normal behavior of Ctrl+C despite the setsid done in the child, which detaches it from the terminal.
When the child process (the calculation command) terminates, whether normally, by a crash, or otherwise, idlexec retrieves its return status, and idlexec's return code is that of the child process. If the child was killed by a signal, idlexec writes a line of the form:
```
Process pid killed by signal number
```
to the standard error stream. This line is visible only if the -n option was not used, since that option redirects the three standard streams from/to /dev/null.
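The launch sequence, signal forwarding and status handling described above can be combined into a minimal Python sketch. It is not the Perl implementation: run_controlled is a hypothetical name, the polling loop and the -n redirection are left out, and returning 128 plus the signal number when the child is killed is an assumption borrowed from the shell convention, since the text does not say which code idlexec returns in that case.

```python
import os
import signal
import sys

def run_controlled(nice_value, argv):
    """Run argv at reduced priority in its own session; return its exit code."""
    pid = os.fork()
    if pid == 0:                              # child
        try:
            os.nice(abs(nice_value))          # absolute value, as for the CLI argument
            os.setsid()                       # new session and process group: pgid == pid
            os.execvp(argv[0], argv)          # replaces the child; returns only on error
        except OSError as exc:
            print(f"idlexec: {argv[0]}: {exc}", file=sys.stderr)
        finally:
            os._exit(127)                     # reached only if something above failed

    def forward(signum, _frame):
        # Propagate INT/TERM to the child's whole process group.
        try:
            os.killpg(pid, signum)
        except ProcessLookupError:
            pass                              # group not created yet, or already gone

    previous = {s: signal.signal(s, forward) for s in (signal.SIGINT, signal.SIGTERM)}
    try:
        _, status = os.waitpid(pid, 0)        # retried automatically after a handled signal
    finally:
        for s, handler in previous.items():
            signal.signal(s, handler if handler is not None else signal.SIG_DFL)

    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        print(f"Process {pid} killed by signal {sig}", file=sys.stderr)
        return 128 + sig                      # assumption: shell-style exit code
    return os.WEXITSTATUS(status)
```

Installing the handlers in the parent only, after fork, leaves the child with its default signal dispositions; the ProcessLookupError guard covers a signal arriving before the child has called setsid.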