CHANGELOG
Changelog for job manager programs.
More info at http://bond.imm.dtu.dk/jobd

Current version is 1.68 (Dec 8, 2000).

1.68-1
Dec 8
-----------------------------------------------------------
Fixed bug that caused the number of cpus on each node
not to be correctly detected in some cases.
Thanks to Cory Dikkers!

1.67-1
Nov 16
-----------------------------------------------------------
Fixed bug in reported elapsed time for queue jobs.
Fixed minor bugs for wakeup calls.
Fixed major bug in jobclientd that caused it to reset all
jobs on a node when a job was killed with jkill!

1.66-2
Nov 16
-----------------------------------------------------------
Changed wakeup scheme to avoid the "storm" of connections
when a user submits a large number of jobs and all clients
are woken up. Wakeup calls are now only made if 
the jobd server has been idle for 2 seconds. Two new fields
in jobd.conf: NUM_WAKEUPS and IDLE_BEFORE_WAKEUPCALLS 
control the behavior of sending wakeup calls.
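
A rough sketch of the "only after 2 idle seconds" rule, in
Python for illustration only (not the jobd code). The class
and method names are made up, and NUM_WAKEUPS is not modelled
since its exact meaning is not described here.

```python
class WakeupGate:
    """Hold back wakeup calls until the server has been idle long enough."""

    def __init__(self, idle_before_wakeup: float = 2.0) -> None:
        self.idle_before_wakeup = idle_before_wakeup
        self.last_activity = 0.0
        self.pending = False

    def note_submission(self, now: float) -> None:
        """Record server activity; a wakeup is now owed to the clients."""
        self.last_activity = now
        self.pending = True

    def should_wake(self, now: float) -> bool:
        """Return True once, after the server has been quiet long enough."""
        if self.pending and now - self.last_activity >= self.idle_before_wakeup:
            self.pending = False
            return True
        return False


gate = WakeupGate()
for t in (0.0, 0.1, 0.2):       # a burst of submissions
    gate.note_submission(t)
print(gate.should_wake(1.0))    # False: still inside the idle window
print(gate.should_wake(2.5))    # True: quiet for more than 2 seconds
print(gate.should_wake(3.0))    # False: the wakeup was already sent
```

A whole burst of submissions thus results in one round of
wakeups instead of one round per submitted job.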

1.66-1
Nov 10
-----------------------------------------------------------
Made jobd able to "wake up" jobclientd when a queue script
is ready. Speeds up execution of scripts quite a bit!
Important: both jobserv and jobclientd must be
version >= 1.66.

1.65-3
Nov 10
-----------------------------------------------------------
Changed jobclientd back to forcing bash shell when running
queue jobs. Without it, jobclientd did not run the queued
jobs on some systems, for reasons yet unknown (at least to me!)

1.65-2
Nov 08
-----------------------------------------------------------
Now you can define "subclusters" of nodes in 
/etc/jobd.conf and restrict your jobs to run only on the 
nodes in the subcluster with qr and jr.

Changed jobclientd to work better with kde2 (removed 
kdeinit: lines). Note the extra LUMP line in /etc/jobd.conf
(see the default jobd.conf included)

1.64-5
Oct 26
-----------------------------------------------------------
Minor code cleanup of jobclientd to correct "perl -w" 
warnings. Corrected the user name in "Dear..." line in 
mail when queue finishes.

1.64-4
Oct 25
-----------------------------------------------------------
Fixed bug in parsing of "-r" in qr, which meant that
restricting a queue job to a single node did not work.

1.64-3
Oct 20
-----------------------------------------------------------
Fixed bug in jobd which did not allow "-" in host names.
Fixed bug in jkill, so that entire running queues now can 
be killed.
Fixed bug in purgescratch.

1.64-1
Oct 16
-----------------------------------------------------------
Changed the internal rules for choosing the least loaded
node. Nodes of different speeds are now supported: jobclientd
uses the bogomips in /proc/cpuinfo (I guess this is a bad
thing to do..) to get the node speed. The faster nodes are
now always preferred.
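
For illustration, a small Python sketch (not the jobclientd
code) of pulling bogomips values out of /proc/cpuinfo style
text. The function name and the sample text are made up; the
match is case-insensitive because some architectures spell
the field "BogoMIPS".

```python
import re


def parse_bogomips(cpuinfo_text: str) -> list[float]:
    """Return one bogomips value per processor entry found in the text."""
    pattern = re.compile(r"^bogomips\s*:\s*([0-9.]+)",
                         re.IGNORECASE | re.MULTILINE)
    return [float(value) for value in pattern.findall(cpuinfo_text)]


# Hypothetical two-cpu node; on Linux the text would come from
# open("/proc/cpuinfo").read() instead.
SAMPLE = """processor\t: 0
model name\t: Pentium III (Coppermine)
bogomips\t: 1192.75

processor\t: 1
model name\t: Pentium III (Coppermine)
bogomips\t: 1192.75
"""

values = parse_bogomips(SAMPLE)
print(len(values), "cpus, total bogomips", sum(values))
```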

1.63-3
Oct 13
-----------------------------------------------------------
Minor bug fix for qwait. Now qwait displays time for each
job, and has options to be quiet.

1.63-1
Oct 10
-----------------------------------------------------------
New tool "qwait" introduced. Wait for queue to finish,
while reporting progress of individual jobs.
Restarting jobclientd no longer deletes all records of
jobs on the node.
New option "reset" for /etc/rc.d/init.d/jobserv
deletes the state completely, just in case something goes
wrong.

It is now possible to use the node names as well as node
numbers for the -r option for jr and qr.

Made soleaccess work

1.62-3
Oct 9
-----------------------------------------------------------
"killall -USR1 jobd" now makes jobd reread the config,
and jobd automatically instructs all jobclientd instances
to reread it too.

1.62-1
Oct 9
-----------------------------------------------------------
jobd now saves its state when exiting, and restores it
when starting. This means the job queue now survives 
restarting the server. 
Minor bug fixes

1.61-1
Oct 7
-----------------------------------------------------------
The rpms now work with "setup" and "chkconfig" programs.

1.60-1
Oct 06
-----------------------------------------------------------
It is now possible to specify in qr that the queue stops
or switches to "night" mode when that job's turn comes.
qset can now change the mail options of a running queue. 
New progs qstop, qstart and qnight (convenient shortcuts 
to qset)

1.59-2
Oct 05
-----------------------------------------------------------
jobclients now report the amount of memory, number of cpus
and bogomips to jobd upon request. This means three fields
in /etc/jobd.conf are obsolete: MAX_SWAP, MAX_MEM 
and MAX_PROCS. 
 
Corrected a bug where non-zero bits in "network" lines
outside netmask caused jobd not to accept any connections
at all (thanks to Silvio Schneider).
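
A minimal Python sketch of the idea behind the fix (not the
jobd code): mask both the configured network address and the
client address before comparing, so stray host bits in the
configured address are ignored. The function name and the
addresses are hypothetical.

```python
import ipaddress


def host_allowed(client_ip: str, network: str, netmask: str) -> bool:
    """Return True if client_ip lies inside network/netmask.

    Both addresses are masked before the comparison, so non-zero
    bits outside the netmask in `network` do not matter.
    """
    mask = int(ipaddress.IPv4Address(netmask))
    net = int(ipaddress.IPv4Address(network)) & mask
    client = int(ipaddress.IPv4Address(client_ip)) & mask
    return net == client


# "192.168.1.5" has host bits set outside the 255.255.255.0 mask.
print(host_allowed("192.168.1.17", "192.168.1.5", "255.255.255.0"))
print(host_allowed("192.168.2.17", "192.168.1.5", "255.255.255.0"))
```

The first call accepts the client and the second rejects it.
Comparing the unmasked network address against the masked
client address would instead reject every client.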


1.58-7
Oct 03
-----------------------------------------------------------
Removed bug that caused newly detected jobs to display the 
wrong command in jstat (only for about 30 seconds!)
Changed default server tmpdir to /var/spool/jobd

1.58-6
Sep 21
-----------------------------------------------------------
Removed bug that caused purgescratch to generate an error
when directory names contain spaces.

1.58-5
Jul 31
-----------------------------------------------------------
Removed "-s /bin/bash" from jobclientd when spawning jobs
from the queue. The forced bash shell meant that tcsh users
did not get their environment string read.
Fixed rc-link bug in jobserv-1.58-4.i386.rpm which caused
server not to start upon boot.


1.58-3
Jul 27
-----------------------------------------------------------
Fixed bug which caused the disabling of automatic script
and script output deletion not to work (-Z option in qr).


1.58
May 23
-----------------------------------------------------------
Now it is possible to have a mail sent when the last job
in the queue finishes running (used to be mail for each job).
See the qr man page.  
jobclient now always spawns children using bash
(TMPDIR was not set correctly before).
NB: The rpm-file jobclient-1.58-1.i386.rpm had a bug making
    an incorrect symbolic link in /etc/rc.d/rc5.d and rc3.d.
    This is corrected in jobclient-1.58-2.i386.rpm

1.57
May 12
-----------------------------------------------------------
Fixed bug introduced in 1.56 causing jobd to crash when a
queue was stopped.

1.56
May 11
-----------------------------------------------------------
Changed the default install directory to /scratch
Queues are now deleted only when the last job finishes, not
when the last job is submitted for execution. 
New keyword QUEUE_LOAD_GOODNESS allows a queued job to start
even if it uses a "little" too much cpu.
rpm packages available now! 

1.55
May 4
-----------------------------------------------------------
Fixed bug that caused workdir to be displayed wrong in 
jstat. Also corrected error in jstat output when feeding
result through a pipe. 

jobclientd now keeps trying forever to connect (every minute 
or so) if the server does not respond. Earlier versions 
exited after 50 attempts.

1.54
May 2
-----------------------------------------------------------
Internal "leak" fixed that caused the list of jobs
to become very large.

1.53
Apr 26
-----------------------------------------------------------
Queue priority is now reversed: high priority means more
jobs get run, i.e.
number_of_jobs_running/queue_priority=const_across_queues
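
As an illustration of the rule above, a Python sketch (not
the jobd scheduler itself): give the next free slot to the
queue with the lowest running/priority ratio, which pushes
all ratios towards the same constant. The function name and
the queue data are made up, and priorities are assumed to be
positive.

```python
def next_queue(queues: dict[str, tuple[int, float]]) -> str:
    """Pick the queue whose jobs_running / priority ratio is lowest.

    `queues` maps a queue name to (jobs_running, priority).
    """
    return min(queues, key=lambda name: queues[name][0] / queues[name][1])


# alice: 6 running at priority 2 -> ratio 3.0
# bob:   2 running at priority 1 -> ratio 2.0
queues = {"alice": (6, 2.0), "bob": (2, 1.0)}
print(next_queue(queues))  # bob, since his ratio is lower
```

With these priorities the queues are balanced once alice has
twice as many running jobs as bob.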

Added column in jstat telling which queue a job came from.

jkill can now kill all jobs running that were started from 
a named queue.

It is now possible to limit the number of jobs a user has
running at one time. New field LIMIT_JOBS limits users
according to their unix group id.

New field ZAPQ_BY_DEFAULT in config file enables automatic
deletion of queue output after queued scripts finish.
This can be overridden with -z and -Z command line options
with qr (used to not work!).

1.52
March 27
-----------------------------------------------------------
Fixed bug which caused jobclientd not to recognize spawned
jobs.
Ignore processes based on IGNORE field of configuration file.  
Aliased 'LUMP_THESE' with 'LUMP'.
Lots of small changes submitted by Stewart Adcock.
'o' option for jstat now displays current queue checking order.

1.51
Feb 21
-----------------------------------------------------------
Minor changes in default config file. 


1.50
Dec 07
-----------------------------------------------------------
Major fixup of code. 
Fixed jobd crash when connections timed out. 
Fixed qcancel problem deleting single jobs.
Default workdir for queued jobs is current dir! 
New option "-W" for qr selects local directory on 
scratch as workdir for queued job.
Default queue options can now be set for an entire 
queue.
Reservation of scratch space has been removed! That means 
the following keywords from /etc/jobd.conf are obsolete
and can safely be removed:
max_scratch, MAXSCRATCHREQ, MINSCRATCHREQ,
MINSCRATCHHEADROOM, DEFAULT_SCRATCH

1.33
-----------------------------------------------------------
New LUMP_THESE field in config file tidies up window 
manager processes. 
installer script now works with "-y" option.


1.32
-----------------------------------------------------------
Fixed bug in jobclientd, introduced in 1.31, that caused
the queue system not to work.

1.31a
-----------------------------------------------------------
Fixed bugs in installer script.
Status works for redhat6.1 init script.

1.31
-----------------------------------------------------------
Changed jobclientd MINUID to 101 because of Redhat 6 xfs 
user.
New feature: installer script.
Programs *should* run on SuSe now...

1.30
------------------------------------------------------------
Oct 21
Included installation program and SysV init scripts that 
work on SuSe (thanks to Marcus Wolschon). Also corrected
"nice" bug in jobclientd.
New qset program can change priorities in a queue and 
start/stop user queues.
Changed the directories that programs are installed in:
/usr/local/bin for user programs (jkill jnice jstat
qcancel qr qset)
/usr/local/sbin for root system binaries (jobd, procparent
and jobclientd)

1.29a 
------------------------------------------------------------
Oct 8
jstat now has option to sort according to all kinds of 
things.

1.29
------------------------------------------------------------
Oct 6
No longer need procps. It was simply too bad at measuring
cpu usage... Now we read /proc directly, but only if
Time::HiRes is installed! 
Also, now jstat reads terminal width if Term::Size is 
installed.

1.28a
------------------------------------------------------------
Corrected small bugs in jobclientd that caused errors in 
pid group detection. 


1.28
------------------------------------------------------------
Sep 29, 1999
Major bug fix for the LARGE number of bugs introduced by
implementing the reaper function in v1.26...

It turns out that the reaper was a *very* bad idea, since
the reaper signal handler disturbs just about everything.
Instead we now use the magic double fork trick to avoid
zombies (see man perlfunc under fork).

Also implemented a multisocket server, i.e. the server now
listens to multiple connections at the same time, greatly
improving performance (see IO::Select man page).

Also made the rpms for procps-2.0.2-4 available for RH5x
users and others.
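
For readers who have not met the double fork trick, here is
a Unix-only sketch in Python (the real code is Perl; see
perlfunc). The function name is made up. The first child
exits at once and is reaped immediately, so the grandchild
that does the work is orphaned and gets reaped by init. No
SIGCHLD handler is needed.

```python
import os


def spawn_detached(task) -> None:
    """Run task() in a grandchild that the caller never has to reap."""
    pid = os.fork()
    if pid == 0:
        # First child: fork once more, then leave immediately.
        try:
            if os.fork() == 0:
                # Grandchild: does the actual work.
                task()
        finally:
            # Both the first child and the grandchild end here
            # without running the parent's cleanup handlers.
            os._exit(0)
    # Parent: reap the short-lived first child, so no zombie remains.
    os.waitpid(pid, 0)


# The grandchild reports back through a pipe.
read_end, write_end = os.pipe()
spawn_detached(lambda: os.write(write_end, b"done"))
os.close(write_end)
print(os.read(read_end, 4))
os.close(read_end)
```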

1.27
------------------------------------------------------------
Sep 21, 1999
Critical bug introduced in 1.26 in jobclientd.
Meant that jobclientd didn't fork properly when running queue
jobs. This version (and v1.27a) had severe stability problems.

1.26
------------------------------------------------------------
Sep 21, 1999
Support for procps 2.0 (needed for redhat 6.0)!
Cleaned up code, fixed a few bugs that could make jobd lock
up (probably introduced some new ones...).
Forking changed so jobd and jobclientd pids are constant.

reaper() function now catches SIGCHLD. This caused a lot of
trouble because the SIGCHLD made subprocesses core dump! The
solution was to block signals for every subprocess using the
POSIX module interface to sigprocmask.

Set MINLOADHEADROOM=0 and MINLOADREQ=0 in default jobd.conf
If MINLOADREQ is too large, queue jobs may not start (too
much cpu is reserved for the "passive" jobs.)

1.25a
------------------------------------------------------------
Corrected bug in jobclientd so that extensions in command
strings are listed.

1.25
------------------------------------------------------------
Queue scheduling changed: now queue jobs are scheduled so
that the number of currently running jobs is balanced across
queues. This means that if user A submits a zillion jobs
(and no other queues are active) his jobs will occupy all
nodes. But if another user B submits jobs, his jobs will be
serviced first (as the running user A jobs finish one after
another) until A and B are "balanced". The balancing is
according to priority, so that

  n_running * queue_priority = constant

1.24
------------------------------------------------------------
Sep 9, 1999
First release on freshmeat.net and www.beowulf-underground.org.
Redhat 5.2 only (until version 1.26!)




BUGS:
------------------------------------------------------------
Directory names used as workdir in "jr" or "qr" cannot
contain spaces. To be fixed in a future version.

Security issues: 
------------------------------------------------------------
The job manager was designed with security in mind (uses TCP 
connections with a network/netmask for allowed hosts), but
there is still some work to be done.