#!/usr/bin/perl -w
our $VERSION = '2.4 (2013-01-12)';
#
# check_afs_rxdebug -- Nagios AFS server check for waiting connections.
#
# Expects a file server with the -H option and runs rxdebug against that file
# server, looking for any connections that are waiting for a thread.  Exits
# with status 1 if there are more than two connections in that state (a
# warning) and with status 2 if there are more than eight connections in that
# state.  The thresholds can be overridden from the command line.
#
# Written by Quanah Gibson-Mount based on work by Neil Crellin
# Updated by Russ Allbery <rra@stanford.edu>
# Copyright 2003, 2004, 2005, 2010, 2013
#     The Board of Trustees of the Leland Stanford Junior University
#
# This program is free software; you may redistribute it and/or modify it
# under the same terms as Perl itself.

##############################################################################
# Modules and declarations
##############################################################################

require 5.006;

use strict;

use Getopt::Long qw(GetOptions);

##############################################################################
# Site configuration
##############################################################################

# The default count of blocked connections at which to warn or send a critical
# alert.  These can be overridden with the -w and -c command-line options.
our $WARNINGS = 2;
our $CRITICAL = 8;

# The default timeout in seconds (implemented by alarm) for rxdebug.
our $TIMEOUT = 60;

# The full path to rxdebug.  Make sure that this is on local disk so that
# monitoring doesn't have an AFS dependency.
our ($RXDEBUG) = grep { -x $_ }
    qw(/usr/bin/rxdebug /usr/sbin/rxdebug /usr/local/bin/rxdebug
       /usr/local/sbin/rxdebug);
$RXDEBUG ||= 'rxdebug';

##############################################################################
# Implementation
##############################################################################

# Report a syntax error and exit.  We do this via stdout in order to satisfy
# the Nagios plugin output requirements, but also report a more conventional
# error via stderr in case people are calling this outside of Nagios.
sub syntax {
    print "AFS UNKNOWN - ", join ('', @_), "\n";
    warn "$0: ", join ('', @_), "\n";
    exit 3;
}

# Parse command line options.
my ($help, $host, $version);
Getopt::Long::config ('bundling', 'no_ignore_case');
GetOptions ('c|critical=i' => \$CRITICAL,
            'H|hostname=s' => \$host,
            'h|help'       => \$help,
            't|timeout=i'  => \$TIMEOUT,
            'V|version'    => \$version,
            'w|warning=i'  => \$WARNINGS)
    or syntax ("invalid option");
if ($help) {
    print "Feeding myself to perldoc, please wait....\n";
    exec ('perldoc', '-t', $0) or die "Cannot fork: $!\n";
} elsif ($version) {
    my $version = $VERSION;
    print "check_afs_rxdebug $version\n";
    exit 0;
}
syntax ("extra arguments on command line") if @ARGV;
syntax ("host to check not specified") unless (defined $host);
if ($WARNINGS > $CRITICAL) {
    syntax ("warning level $WARNINGS greater than critical level $CRITICAL");
}

# Set up the alarm.
$SIG{ALRM} = sub {
    print "AFS CRITICAL - network timeout after $TIMEOUT seconds\n";
    exit 2;
};
alarm ($TIMEOUT);

# Run rxdebug and parse the output to find the number of calls waiting for a
# thread.
unless (open (RXDEBUG, "$RXDEBUG $host -noconn |")) {
    warn "$0: cannot run rxdebug\n";
    exit 3;
}
my $blocked;
while (<RXDEBUG>) {
    if (/^(\d+) calls waiting for a thread/) {
        $blocked = $1;
        last;
    }
}
close RXDEBUG;
if ($? != 0) {
    print "AFS CRITICAL - cannot contact server\n";
    exit 2;
}
unless (defined $blocked) {
    print "AFS CRITICAL - cannot parse rxdebug output\n";
    exit 2;
}

# Check the connection count against our limits and make sure that it's okay.
if ($blocked >= $CRITICAL) {
    print "AFS CRITICAL - $blocked blocked connections\n";
    exit 2;
} elsif ($blocked >= $WARNINGS) {
    print "AFS WARNING - $blocked blocked connections\n";
    exit 1;
} else {
    print "AFS OK - $blocked blocked connections\n";
    exit 0;
}

##############################################################################
# Documentation
##############################################################################

=for stopwords
AFS Crellin Nagios Quanah afs-monitor -hV rxdebug util

=head1 NAME

check_afs_rxdebug - Check AFS servers for blocked connections in Nagios

=head1 SYNOPSIS

B<check_afs_rxdebug> [B<-hV>] [B<-c> I<threshold>] [B<-w> I<threshold>]
    [B<-t> I<timeout>] B<-H> I<host>

=head1 DESCRIPTION

B<check_afs_rxdebug> is a Nagios plugin for checking AFS file servers to
see if there are client connections waiting for a free thread.  If there
are more than a few of these, AFS performance tends to be very slow; this
is a fairly reliable way to catch overloaded file servers.  By default,
B<check_afs_rxdebug> returns a critical error if there are more than eight
connections waiting for a free thread and a warning if there are more than
two.  These thresholds can be changed with the B<-c> and B<-w> options.

B<check_afs_rxdebug> will always print out a single line of output
including the number of blocked connections, displaying whether this is
critical, a warning, or okay.

=head1 OPTIONS

=over 4

=item B<-c> I<threshold>, B<--critical>=I<threshold>

Change the critical blocked connection count threshold to I<threshold>,
which should be an integer.  The default is 8.

=item B<-H> I<host>, B<--hostname>=I<host>

The AFS file server whose connections B<check_afs_rxdebug> should check.
This option is required.

=item B<-h>, B<--help>

Print out this documentation (which is done simply by feeding the script
to C<perldoc -t>).

=item B<-t> I<timeout>, B<--timeout>=I<timeout>

Change the timeout for the B<rxdebug> command.  The default timeout is 60
seconds.

=item B<-V>, B<--version>

Print out the version of B<check_afs_rxdebug> and quit.

=item B<-w> I<threshold>, B<--warning>=I<threshold>

Change the warning blocked connection threshold to I<threshold>, which
should be an integer.  The default is 2.

=back

=head1 EXIT STATUS

B<check_afs_rxdebug> follows the standard Nagios exit status requirements.
This means that it will exit with status 0 if there are no problems, with
status 1 if there is a warning, and with status 2 if there is a critical
problem.  For other errors, such as invalid syntax, B<check_afs_rxdebug>
will exit with status 3.

=head1 BUGS

The standard B<-v> verbose Nagios plugin option is not supported, although
it's not entirely clear what it would add.

The usage message for invalid options and for the B<-h> option doesn't
conform to Nagios standards.

=head1 CAVEATS

This script does not use the Nagios util library or any of the defaults
that it provides, which makes it somewhat deficient as a Nagios plugin.
This is intentional, though, since this script can be used with other
monitoring systems as well.  It's not clear what a good solution to this
would be.

=head1 SEE ALSO

This script is part of the afs-monitor package, which includes various AFS
monitoring plugins for Nagios.  It is available from the AFS monitoring
tools page at L<http://www.eyrie.org/~eagle/software/afs-monitor/>.

=head1 AUTHORS

The original idea behind this script was from Neil Crellin.  It was
updated by Quanah Gibson-Mount to work with Nagios, and then further
updated by Russ Allbery <rra@stanford.edu> to support more standard
options and to use a more uniform coding style.

=head1 COPYRIGHT AND LICENSE

Copyright 2003, 2004, 2005, 2010, 2013 The Board of Trustees of the Leland
Stanford Junior University

This program is free software; you may redistribute it and/or modify it
under the same terms as Perl itself.

=cut
