Quantcast
Channel: SE Stuff and the like... » second
Viewing all articles
Browse latest Browse all 2

check_iostat / check_io for nagios Version 0.9.5

$
0
0

Oke here is the first working version of check_iostat.pl

NEW Version available HERE

http://sysengineers.wordpress.com/2010/05/27/check_iostat-pl-version-0-9-7

Why this version?
Well to my big surprise I was not able to find any satisfying check plugin for diskio that didnt need all kinds of additional software or undocumented perl plugins. What i really wanted was a simple check plugin that would only require the default sysstat daemon allready present on allot of our systems (Oracle DB requirement).

Because no one seemed to have written such a plug-in I decided to write one myself. And here is the result.

How to get started?
Copy paste the source into a check_iostat.pl file within nagios/libexec.

The following needs to be on your Linux machine.
sysstat must be installed!
iostat must be available!
perl (base) should be installed!

What can you do?
1. Let us know on what distro you have tested this code! And or what changes you made to make it work.
2. Tell me if you spotted any improvements in the code.
3. Tell me if there are any bugs.

What more?
Questions? Different requirements? Improvements? Code cleanup?
Please let me know…

Tested with?
PERL
v5.8.8
v5.8.3

LINUX
Enterprise Linux Enterprise Linux Server release 5.3 (Carthage)
Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)
SUSE LINUX Enterprise Server 9 (i586)

Note!
the code is still a bit messy but functional. Keep an eye out here for updates. I will be cleaning up this code further.

Rgrds,
Chris.

Code below…

What does it look like in Centreon?
IO Stat Nagios / Centreon

#!/usr/bin/perl
# Written by Chris Gralike @ AMIS.
# Perl based check command to fetch and report the
# TPS (transactions per second) and IO wait times.
# Plugin uses iostat for opperation.
# Verion 0.9.5
#
#
#
#
#
#use warnings;
use Math::Complex; # Used to enable the sqrt function #^(1/2) doesnt work :(.
use Switch;

my($numArgs,
   $debug,      # Print debugging information.
   $DevType,    # Used to match a certain devicetype from the resulting IOstat rows.
   $IOBIN,      # Is used to store a path to the iostat binairy for execution.
   $Samples,    # Used to store the initial Samples returned by iostat.
   @SampleRows, # Used to store the rows generated by the splitted samples.
   $firstseen,  # Used to keep track of the found devices (IOstat might return a set of devices i.e sda, sdb, sdc etc.)
   $Items,      # Used in the foreach to store the row being parsed.
   @cols,       # Used to store the columns in a row after an split.
   $dev,        # Used to create a symbolic link to dynamicly create a var.
   $rqm,
   $val,
   $devtypes,
   $rws, $rws_warn, $rws_crit,
   $kbs, $kbs_warn, $kbs_crit,
   $awt, $awt_warn, $awt_crit,
   $kbs, $kbs_warn, $kbs_crit,
   $svc, $svc_warn, $svc_warn,
   $dbu, $dbu_warn, $dbu_crit
   );

# Preparing to collect the dangerious user input.
# Here is a list of known device types. Please add any device you would like to monitor..
$devtypes=";sd;hd";
$numArgs = $#ARGV + 1;

if($numArgs gt '0'){
        for($i=0;$i<$numArgs;$i++){
        # Process our command line arguments and do some basic testing.
        # Could be make human save in the future.
        switch ($ARGV[$i]) {
                # Process our disktype.
                case '-d'    {
                                $val=$ARGV[$i+1];
                                if( (index($devtypes, $val)) gt '-1'){
                                        $DevType=$val;
                                        $i++;
                                }else{
                                        print "Ivalid Disktype found. Typo?\n"; exit 1;
                                }
                             }
                # Do we need to check rqm?
                case '-rqm'  { $rqm='1'; }
                # What is the warning treshold?
                case '-rqmw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rqm_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rqmw, typo? \n"; exit 1;
                                }
                             }
                case '-rqmc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rqm_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rqmc, typo? \n"; exit 1;
                                }
                             }
                # Do we need to check rws?
                case '-rws'  { $rws='1'; }
                case '-rwsw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsw, typo? \n"; exit 1;
                                }
                             }
                case '-rwsc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsc, typo? \n"; exit 1;
                                }
                             }
                # Do we need to check kbs?
                case '-kbs'  { $kbs='1'; }
                case '-kbsw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsw, typo? \n"; exit 1;
                                }
                             }
                case '-kbsc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsc, typo? \n"; exit 1;
                                }
                             }
                # Do we need to check awt?
                case '-awt'  { $awt='1'; }
                case '-awtw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtw, typo? \n"; exit 1;
                                }
                             }
                case '-awtc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtc, typo? \n"; exit 1;
                                }
                             }
                # Do we need to check svc?
                case '-svc'  { $svc='1'; }
                case '-svcw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcw, typo? \n"; exit 1;
                                }
                             }
                case '-svcc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcc, typo? \n"; exit 1;
                                }
                             }
                # Do we need to check dbu?
                case '-dbu'  { $dbu='1'; }
                case '-dbuw' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_warn=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuw, typo? \n"; exit 1;
                                }
                             }
                case '-dbuc' {
                                $val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_crit=$val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuc, typo? \n"; exit 1;
                                }
                             }
                # performance data. Might make the string human unreadable.. ow well, no loss there :)
                case '-p'    { $prf='1'; }
                # Print full messages per device overview.
                case '-m'    { $fms='1'; }
                # Most used help switches.
                case '--help'{ USAGE(); }
                case '-h'    { USAGE(); }
        }
        }
        # Check if the basic requirements are met.
        if(!($DevType) ||  !(($rqm || $rws || $kbs || $awt || $svc || $dbu))){
                print "Minimal requiremens, device and checktype are not met\n";
                USAGE();
        }
}else{
        # No input was given. Show the Usage();
        USAGE();
}


# Locate the IOBin binairy needed to fetch the stats.
chomp($IOBIN=`which iostat`);

if( !(-f $IOBIN) || !(-x $IOBIN)){
        print "A working iostat command is needed for this script to work \n";
        print "Also make sure the sysstat service is running! /etc/init.d/sysstat \n";
        exit 1;
}

if($DevType){
        # IF IOStat is found, lets collect some data.
        chomp($Samples=`$IOBIN -d -x -k 1 5 | grep $DevType`);
}else{
        print "Please select a valid devicetype \n";
        exit 1;
}

# Break the samples up in lines so we can evaluate them.
# The first set of samples are avg. values counted from boot.
# We need to discard them and collect the remaining samples.
@SampleRows=split(/\n/, $Samples);

# Firstseen is used to track de device names and to skip the first itteration of iostat that contains
# avg stats counted from system boot time, stats we cant use here sadly :)
$firstseen='';
$CT='0';
foreach $Items (@SampleRows){
         # Break the latter up in usable columns.
        @cols=split(/\s+/,$Items);
         # We only want a certain device type. So lets match what we know.
         # Disks usualy have a prefix for example (scsi = sd) its set using
         # the DevType var.
        if($cols[0]=~m/$DevType[a-z]$/){
                $dev=$cols[0];
                if((rindex $firstseen, $dev) gt '-1'){
                        # Store the collected data in the correct (dynamic) vars.
                        $$dev[1]+=int($cols[1]);        # rrqm/s   Read Requests Merged per Second.
                        $$dev[2]+=int($cols[2]);        # wrqm/s   Write Requests Merged per Second.
                        $$dev[3]+=int($cols[3]);        # r/s      Number of read requests issued per Second
                        $$dev[4]+=int($cols[4]);        # w/s      Number of write requests issued per Second
                        $$dev[5]+=int($cols[5]);        # rKB/s    Number of Kilobytes read per Second.
                        $$dev[6]+=int($cols[6]);        # wKB/s    Number of Kilobytes written per Second.
                        $$dev[7]+=int($cols[7]);        # Avgrq-sz Avarage size (in sectors) of the issued requests.
                        $$dev[8]+=int($cols[8]);        # Avgqu-sz Avarage Queue length of the requests issued.
                        $$dev[9]+=int($cols[9]);        # Await   Avarage wait time in ms for IO requests to be served.
                        $$dev[10]+=int($cols[10]);      # svctm    Avarage service time in ms for IO requests that where issued.
                        $$dev[11]+=int($cols[11]);   # %util    Precentage of CPU time during IO requests (bandwidth util), saturation at 90~100%
                        $CT++; # Add a new device to the count
                        # Print some debugging vars if requested. to show the data is collected.
                        if($debug){ print ">DEV:$dev< > $cols[9]< \t >  $cols[11] <\n"; }
                }else{
                        $firstseen.="$dev;";
                }
        }
}
# Calculate the number of real devices using the square root of the
# the count $CT
$Devs=sqrt($CT);
if($debug){ print "Number of devices : $Devs; \n"; }
# Lets collect the device information from the firstseen var
# and start processing it for some perf check/data
# Lets also recycle some previously used vars for this.
@cols=split(/;/,$firstseen);
foreach $Items (@cols){
        # Items now contains the devicenames needed to access the data again.
        # We now need to check them against some basic tresholds
        # Print a nice table with the calculated values when we are in debug.
        if($debug){print $Items . "\t"; for($i=1; $i < 12; $i++){ $val = $$Items[$i]/$Devs; print $val . "\t"; } print "\n";}
        #What do we want to check against a treshold?
        #First check the selection if any.
        if($rqm || $rws || $kbs || $awt || $svc || $dbu){

               # In all other cases things are oke! Yey!
               # Reset the state counters.
               $critical_state='0';
               $warning_state='0';
               $ok_state='0';

                # Requests Merged per second.
                if($rqm){
                        # Critical
                        if(($$Items[1] >= $rqm_crit) || ($$Items[2] >= $rqm_crit)){
                                $critical_state+='1';
                        # Warning?
                        }elsif(($$Items[1] >= $rqm_warn) || ($$Items[2] >= $rqm_warn)){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        # Add the counters to the performance var.
                        $perf.="$Items-rrqm/s=$$Items[1]; $Items-wrqm/s=$$Items[2]; ";
                }
                # Reads / Writes per second.
                if($rws){
                        if(($$Items[3] >= $rws_crit) || ($$Items[4] >= $rws_crit)){
                                $critical_state+='1';
                        # Warning?
                        }elsif(($$Items[1] >= $rws_warn) || ($$Items[2] >= $rws_warn)){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        # Add the counters to the performance var.
                        $perf.="$Items-r/s=$$Items[1]; $Items-w/s=$$Items[2]; ";
                }
                # KB Read/Writes per second.
                if($kbs){
                        if(($$Items[5] >= $kbs_crit) || ($$Items[6] >= $kbs_crit)){
                                $critical_state+='1';
                        # Warning?
                        }elsif(($$Items[5] >= $kbs_warn) || ($$Items[6] >= $kbs_warn)){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        $perf.="$Items-rKB/s=$$Items[5]; $Items-wKB/s=$$Items[6]; ";
                }
                # Avarage wait time
                if($awt){
                        if(($$Items[9] >= $awt_crit)){
                                $critical_state+='1';
                        # Warning?
                        }elsif(($$Items[9] >= $awt_warn)){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        $perf.="$Items-await=$$Items[9]; ";
                }
                # Avarage service time issuing time
                if($svc){
                        if($$Items[10] >= $svc_crit){
                                $critical_state+='1';
                        # Warning?
                        }elsif($$Items[10] >= $svc_warn){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        $perf.="$Items-svctm=$$Items[10]; ";
                }
                # Disk bandwidth Utilization
                if($dbu){
                        if($$Items[11] >= $dbu_crit){
                                $critical_state+='1';
                        # Warning?
                        }elsif($$Items[1] >= $dbu_warn){
                                $warning_state+='1';
                        # Ok
                        }else{
                                $ok_state+='1';
                        }
                        $perf.="$Items-util=$$Items[1]%; ";
                }
        }else{
                print "At least select a value to measure..\n";
                exit 1;
        }

        # Create a messages var.
        $mgs.="$Items=O:$ok_state,W:$warning_state,C:$critical_state; ";

        # Track the global state (1 crit 1 warn)
        if(($critical_state gt '0') || ( $critical_global gt '0')){
                $critical_global+='1';
        }
        if(($warning_state gt '0') || ( $warning_global gt '0')){
                $warning_global+='1';
        }
}

# Compose a nice nagios output.
if($critical_global >= '1'){
        print "CRITICAL:";
        $exit=2;
}elsif($warning_global >= '1'){
        print "WARNING:";
        $exit=1;
}else{
        print "OK:";
        $exit=0;
}
# Print the remainder, the most important data was processed.
print $mgs; if($prf){ print "|$perf"; } print "\n";
exit $exit;

###Subroutines
sub USAGE{
        print "
                Usage : $0 -d [Dev] [options]

                -p Print performance data about the measured samples.

                -d {grep string used on IOstat}
                examples;
                 sd     #All scsi devices.
                 hd     #All Cdrom devices.
                 sda    #Only device sda

                [Available Measurement Options]
                -rqm -rqmw val -rqmc val        # read/write merged             [#]
                -rws -rwsw val -rwsc val        # read/write per second.        [s]
                -kbs -kbsw val -kbsc val        # KBs read/written per second.  [s]
                -awt -awtw val -awtc val        # Avarage IO wait time.         [ms]
                -svc -svcw val -svcc val        # Avarage service IO wait time. [s]
                -dbu -dbuw val -dbuc val        # Disk utilization              [%]\n";
        exit 1;
}

Tagged: avarage, avgqu-sz, avgrq-sz, await, centreon, check, Check_command, check_io, check_iostat, Command, disk, for, io, iostat, Linux, nagios, Perl, plugin, r/s, rKB/s, rrqms/s, second, service, svctm, sysstat, Time, util, Utilization, utilzation, w/s, wait, wKB/s, writes, wrqm/s

Viewing all articles
Browse latest Browse all 2

Latest Images

Trending Articles





Latest Images