Nagios & Windows
From Superk
Nagios is an incredible tool for monitoring and securing your network. I will be writing a series of articles on using Nagios for various functions as I continue to implement it myself. This installment will focus on monitoring Windows computers through a couple of Nagios' check-command plugins. While this is by no means a comprehensive discussion of how to make Nagios work, I hope to offer some insight and inspiration into monitoring your own network using such a powerful (and free) tool.
Nagios, simply put, is a framework for monitoring your network. It does very little by itself and is quite useless without the plugins, which you must download separately. Since this is not a review or a comprehensive user's guide, I will not go into the install process or the configuration of Nagios. However, if you want to get and install it, it is available from the Nagios website along with a very comprehensive manual and FAQ system.
Since the network here is primarily a Windows network and, as a non-profit organization, we undergo a security audit annually, I've been given some of the responsibility for securing things a bit. While Nagios isn't really security software per se, it can help in securing your network if you know which aspects can be symptomatic of a security breach. By watching these aspects of the network very closely, it is possible to spot potential attacks early and stop them quickly.
Basic Checks
Nagios contains a few tools that make monitoring Windows a reality. The most basic of these is the check_nt command, which, in addition to providing a few basic checks of its own, can plug into the Windows Performance Monitor to monitor a great deal more (more about this later). The syntax of the check_nt command is as follows:
check_nt -H <host> \
-v <check_var> \
-p <port> \
-w <warning> \
-c <critical> \
-l <parameters> \
-d <SHOWALL> \
-t <timeout>
Here is the breakdown of the various inputs:
- check_nt - the actual command. Running this command with the -h switch will display information similar to what is presented here.
- host - the resolvable hostname or IP address of the Windows computer you wish to monitor.
- variable - one of several built-in variables you wish to monitor:
- CLIENTVERSION - Get the NSClient version
- CPULOAD - Get the current CPU load of the monitored computer
- UPTIME - Get the uptime of the monitored computer
- USEDDISKSPACE - Check the status of a given drive on the monitored computer
- MEMUSE - Check the current status of memory usage on the monitored computer
- SERVICESTATE - Check the current status of a given service running on the monitored computer
- PROCSTATE - Check if a given process is running or not on the monitored computer
- COUNTER - Check a given Performance Monitor function on the monitored computer (more on this later)
- port - (Optional) Specify the port to connect to on the monitored computer.
- warning - (Optional) Set the limit at which a warning will be sent.
- critical - (Optional) Set the limit at which a critical warning will be sent.
- params - (Dependent on variable) This provides additional information to the variable being checked. Sometimes this is required, sometimes it is not.
- SHOWALL - (Dependent on variable) Some variable checks (e.g., SERVICESTATE) allow you to use this switch to show all items rather than a specific item during the check.
- timeout - (Optional) Set a timeout period for the check command to finish.
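To make the options above more concrete, here is a minimal sketch of a few typical invocations. It assumes a placeholder host address (192.0.2.10), the classic NSClient port 1248 (NSClient++ installations commonly listen on 12489 instead), and purely illustrative thresholds, service names and process names. As written, the script only prints the commands (a dry run); point CHECK_NT at the real plugin to actually execute them.

```shell
#!/bin/sh
# Sketch only: NT_HOST, NT_PORT and all thresholds are illustrative assumptions.
NT_HOST="192.0.2.10"         # placeholder address of the Windows machine
NT_PORT="1248"               # classic NSClient port; NSClient++ often uses 12489
CHECK_NT="echo check_nt"     # dry run: prints each command instead of running it

basic_checks() {
    # Agent version and uptime need no extra parameters.
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v CLIENTVERSION
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v UPTIME
    # CPU load: -l takes <minutes>,<warning %>,<critical %>.
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v CPULOAD -l 5,80,90
    # Memory and disk usage: thresholds are percentages; -l names the drive letter.
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v MEMUSE -w 80 -c 90
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v USEDDISKSPACE -l c -w 80 -c 90
    # Service state: -d SHOWALL also lists the services that are running fine.
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v SERVICESTATE -d SHOWALL -l W3SVC
    # Process state: -l names the executable to look for.
    $CHECK_NT -H "$NT_HOST" -p "$NT_PORT" -v PROCSTATE -l Explorer.exe
}

basic_checks
```

Replacing the CHECK_NT value with the full path to the plugin (wherever your Nagios plugins are installed) turns the dry run into real checks.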
Performance Monitor
Windows NT and later operating systems provide a tool called Performance Monitor for monitoring many aspects of the Windows operating system. Within Performance Monitor there is a wealth of monitoring functions available to track and graph. All of these monitoring functions are accessible through Nagios as well, which makes Nagios a fantastic way of keeping track of all your Windows systems. The COUNTER variable in the check_nt command lets us connect to the Windows Performance Monitor and monitor a specific Performance Monitor function. Here is an example:
check_nt -H <host> \
-v COUNTER \
-l "\\Memory\\% Committed Bytes In Use","Committed Bytes In Use %.f %%" \
-w 80 \
-c 90
The above command connects to the Windows host (<host>) and reads, through Performance Monitor, the "\\% Committed Bytes In Use" Counter of the "\\Memory" Performance Object. The item in double-quotes directly after that (and separated with a comma) is a customized description of the results. Note the use of "%.f" in the custom description. Any custom description can be built from the format specifiers that the C printf function uses. For instance, "%.2f" could have been used to show the result with two decimal places. To use a '%' symbol by itself in the custom description, it is necessary to escape it with another '%' symbol like this: "%%" (which is output as "%"). To help clarify this, here is the syntax of the check_nt command with the COUNTER variable:
check_nt -H <host> \
-v COUNTER \
-l "\\<performance object>\\<counter>","<description>" \
-w <warning> \
-c <critical>
Here is what's new in this command:
- performance object - This is the container within Performance Monitor that holds the actual performance counter to be checked.
- counter - This is the actual counter that will be checked for its current status. The current status of this counter is reported as the check command's result.
- description - (Optional) This is the custom description field of the command. This description must IMMEDIATELY follow the performance object/counter field, be contained in double-quotes and be separated from the performance object/counter field by a comma.
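Because the description is treated as a printf-style format applied to the counter's value, the effect of a given description can be previewed locally with the shell's own printf. The sketch below uses a made-up sample value (42.37) rather than a real counter reading, and the helper name preview is hypothetical.

```shell
#!/bin/sh
# Preview how a check_nt-style description string renders a counter value.
# SAMPLE is a made-up reading, not real Performance Monitor output.
LC_ALL=C; export LC_ALL      # parse "42.37" with a decimal point in every locale
SAMPLE="42.37"

preview() {
    # $1 is the description (used as a printf format), $2 is the value.
    printf "$1\n" "$2"
}

preview "Committed Bytes In Use %.f %%" "$SAMPLE"    # no decimal places
preview "Committed Bytes In Use %.2f %%" "$SAMPLE"   # two decimal places
```

The first call prints "Committed Bytes In Use 42 %" and the second prints "Committed Bytes In Use 42.37 %", which shows both the precision specifier and the "%%" escape at work.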
As you can see, the COUNTER variable makes this command quite extensible. Literally anything that can be monitored by the Windows Performance Monitor can now be monitored by Nagios. The advantage of doing it through Nagios is twofold: there is far less performance drain on the host computer, since we are not monitoring in real time, and Nagios can notify an appropriate administrator when something falls outside the allowed ranges. (Another aspect of monitoring with Nagios that hopefully will be discussed in a later article is the ability to track trends for a particular host or service.)
Monitoring with SNMP
One final tool included with Nagios that is useful for monitoring Windows hosts is the SNMP check command, 'check_snmp'. Obviously for Nagios to be able to monitor services through SNMP, the SNMP service must be installed, running and configured properly on the Windows host. This is outside the scope of this particular article.
One important note about the check_snmp command is that you MUST have the Net-SNMP package (which provides snmpget) installed BEFORE compiling the Nagios plugins in order for the command to work (or even exist). The check_snmp command uses the following syntax (note that this is a simplified description of check_snmp; more detailed information can be found by running the command with the '-h' or '--help' switches and by viewing the man page of 'snmpget'):
check_snmp -H <host> \
-C <community> \
-P <snmp_ver> \
-o <OID> \
-w <warning> \
-c <critical> \
-l <label> \
-u <units>
Here are some of the new variables:
- community - Set the community field (this is basically the secret username that allows read and possibly write access to SNMP on the host).
- version - Strictly speaking this is the 'Protocol Version' (hence the '-P' switch). It is simply the version of SNMP the host is running, which is generally just '1'.
- OID - The OID (Object Identifier) is the actual item you want to get results for.
- label - This is a custom label for the specified check command (e.g., to replace the default label string of 'SNMP').
- units - This will append a custom units label to the results (by default, no unit specification is stated).
NOTE: This is a very small subset of the available flags and switches. Please view the command's help information for more details. Also, note that items from the syntax that were explained earlier (e.g., host, warning & critical) are left out for brevity.
It's highly recommended that you become familiar with the Net-SNMP package, which includes snmpget and snmpwalk. You will probably want to use the snmpwalk command to see which OIDs are available for monitoring on a given host (Windows 2000 output about 16.5 printed pages of OIDs available for monitoring!). Since SNMP is a very large topic to tackle, I won't go into more detail than I have so far; the aim is just to show how it may be used. Here is an example command that uses check_snmp to query a Windows host:
check_snmp -H <host> \
-C public \
-P 1 \
-o hrSystemProcesses.0 \
-w 100 \
-c 150 \
-l "Current System Processes" \
-u "Processes"
This particular check command will check the Windows host (<host>) for the current running processes, label the output with something like: "Current System Processes ## Processes" and warn if the result is over 100 or critically warn if it is over 150. Once you get the hang of SNMP, creating more complicated commands becomes much simpler.
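The threshold behaviour described above can be sketched in a few lines of shell. This is a simplified illustration, not check_snmp's actual implementation: it assumes plain integer values and "greater than" comparisons only, and it relies on the standard Nagios plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL). The function name classify and the sample values are hypothetical.

```shell
#!/bin/sh
# classify VALUE WARN CRIT
# Simplified sketch: prints a Nagios-style status line and returns the
# matching plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
classify() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "Current System Processes CRITICAL - $value Processes"
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo "Current System Processes WARNING - $value Processes"
        return 1
    fi
    echo "Current System Processes OK - $value Processes"
    return 0
}

# Sample values are made up; on a real network the number could come from, e.g.:
#   snmpget -v 1 -c public -Oqv <host> HOST-RESOURCES-MIB::hrSystemProcesses.0
classify 87 100 150  || true
classify 120 100 150 || true
classify 151 100 150 || true
```

With the thresholds from the example (100 and 150), the three sample values are reported as OK, WARNING and CRITICAL respectively.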
While this article has glossed over a significant portion of what can be done with Nagios to monitor Windows computers, it should give a glimpse of how to use some of the included commands to maintain a very good view of the health of your Windows systems. Of course, it is also possible to monitor specific services using the other Nagios commands (e.g., SMTP with check_smtp or PING with check_ping). These checks are system-independent and do not rely on the Windows Performance Monitor at all. However, using the granularity of the Performance Monitor, it is possible to keep very close tabs on any Windows NT or later machine.
