|
|
.TH wdd "3" "September 2008" "wdd (flxutils) 0.1.32" "Simple Watchdog Daemon"
.SH NAME
wdd - Simple Watchdog Daemon
.SH SYNOPSIS
.B wdd [-c count -f file] [file]...
.SH DESCRIPTION
Wdd is a simple daemon which periodically pings the watchdog attached
to /dev/watchdog to keep it alive, and parallely performs a series of
system health checks to ensure everything is working correctly. If it
detects an error, it exits so that the watchdog driver doesn't receive
its keep-alives anymore and the system will quickly reboot. It is
particularly targetted at remotely managed systems where accessibility
is a prior concern.
As a bonus, it's really tiny, it consumes between 12 and 20 kB of
memory on x86.
It can optionnally take a list of files in arguments. These files will
be checked upon startup, and all those which are accessible will be
periodically checked (one file per second) and the daemon will exit
as soon as it cannot access any of them. This is particularly useful
on \fB/dev\fP, \fB/var\fP, \fB/tmp\fP, \fB/proc\fP and more generally
any remotely mounted file-system (including \fB/\fP in case of NFS
root). As a special case, all files which are not accessible at
startup are removed from the tests and get their first character
rewritten with a '\fB!\fP' in the argument list, so that they become
easily identifiable with \fBps\fP.
Another option consists in checking an "alive file". It is a file
which is checked every second, and which causes the daemon to exit
if it is not touched within a number of seconds. The file name must be
passed after the optionnal argument "-f", and the maximal count of
acceptable checks which report the same status must be passed after
the optionnal argument "-c". If either of these arguments is omitted,
the file check is disabled, which is the default. A non-existing file
is considered different from an existing one.
.SH SYSTEM CHECKS
System checks are performed in this order :
.LP
.TP
\- Opening of \fB/dev/watchdog\fP if it was not previously open ;
.TP
\- Opening of \fB/dev/misc/watchdog\fP if \fB/dev/watchdog\fP failed ;
.TP
\- Allocation then release of 4 kB of memory to check that the VM
subsystem is still operating, and to give the daemon a chance to die
under Out-of-Memory conditions (OOM) ;
.TP
\- Fork a child and wait for its immediate death. This ensures that the
system still has free PIDs and some memory, can schedule, and can
deliver signals.
.TP
\- File access on the next file in the argument list. Once the last one
has been checked, the check loops back to the first one. If no
argument was given, a test is performed on \fB/\fP to ensure that the
VFS still works and that an eventual NFS root is still accessible.
.TP
\- The alive file is checked for changes.
.TP
\- The daemon then pauses one second before starting the checks again.
.SH FILES
.TP
\fB/sbin/wdd\fP
.br
The daemon itself
.SH EXAMPLES
.LP
.TP
\fBnice -n 10 /sbin/wdd\fP
.br
Starts the daemon with a +10 renice value, and checks the
accessibility of the root directory every second. Launching it with
\fBnice\fP is recommended since it makes it more sensible to fork
bombs because it soon will not get enough time slices to ping the
driver.
.TP
\fBwdd / /dev/watchdog /proc/self/root /var/run /tmp/.
.br
Periodically checks all of these existing entries. This ensures that
\fB/\fP is always reachable, that \fB/dev\fP has not been wiped out
nor experienced a mount or unmount, that \fB/proc\fP is mounted, that
\fB/var\fP is mounted and has not been wiped out, and that \fB/tmp\fP
either is mounted or is a symbolic link to a valid directory.
.TP
\fBwdd -c 3600 -f /var/state/.wdalive / /proc/self/root
.br
If the status of the file \fP/var/state/.wdalive\fP has not changed
during the last 3600 seconds, the daemon will exit.
.SH BUGS
.LP
.TP
The daemon cannot renice itself, for this you need the 'nice' command.
.TP
There's absolutely no output, neither debug nor error.
.SH SEE ALSO
\fIDocumentation/watchdog-api.txt\fP in the Linux kernel sources.
.SH AUTHORS
Willy Tarreau <w@1wt.eu>
|