Learning NAGIOS 3.0
上QQ阅读APP看书,第一时间看更新

Monitoring Database Systems

Databases allow the storage of information that is used often by entire departments or whole companies. Because most systems usually depend on one or more databases, a failure in these databases can cause all of the underlying systems to go down as well. Imagine a business-critical database failure that went unnoticed over a weekend, making both the company's web site, as well as email, unavailable. That would be a disaster! A series of scheduled reports that was supposed to be sent out, would fail to be generated because of this.

This is why, making sure databases are working correctly and have enough resources to operate, might be essential for many companies. Many enterprise-class databases also have table space capacity management which should also be monitored—even though a valid user may be able to log in, this does not necessarily mean that a database is up and running correctly.

MySQL

One of the most commonly-used database types is MySQL. MySQL is, very often, used to provide a basic database for PHP-based web applications. It is also commonly used as a database systems for client-server applications. Nagios offers two plugins to verify if MySQL is working properly. One of the plugins allows checking of connectivity to the database and checking master-slave replication status. The other one allows the measurement of the time taken to execute an SQL query. The syntax of both the commands, and the definition of their options is as follows:

check_mysql [-H host] [-d database] [-P port]
            [-u user] [-p password] [-S]
check_mysql_query -q SQL_query [-w <warn>] [-c <crit>] [-d database]
                  [-H host] [-P port] [-u user] [-p password]

For the check_mysql command, the -w and -c options allow us to verify if the slave server is in sync with the master server. If the slave is more than the specified number of seconds behind the master server, a warning or critical status (as appropriate) is returned.

If the -S option is specified, the plugin will also check whether the replication of MySQL databases is working correctly. This check should be run on MySQL slave servers to make sure that the replication with the master server is in place. Monitoring the number of seconds by which the slave server is behind the master server can be done using the -w and –c flags. In this case, if the slave server is more than the specified number of seconds behind the master server in the replication process, a warning or critical status is issued. More information about checking the replication status can be found under the MySQL documentation for the SHOW SLAVE STATUS command (visit http://dev.mysql.com/doc/refman/5.0/en/show-slave-status.html).

For the check_mysql_query command, the -w and -c options specify the limits for the execution time of the specified SQL query. This allows us to make sure that database performance is within acceptable limits.

The definitions of the check commands for both a simple test and running an SQL query within a specified time are as follows.

  define command
  {
    command_name  check_mysql
    command_line  $USER1$/check_mysql –H $HOSTADDRESS$ -u $ARG1$
                  -p $ARG2$ -d $ARG3$ -S –w 10 –c 30
  }
  define command
  {
    command_name  check_mysql_query
    command_line  $USER1$/check_mysql_query –H $HOSTADDRESS$ -u $ARG1$
                  -p $ARG2$ -d $ARG3$ -q $ARG4$ –w $ARG5$ -c $ARG6$
  }

Both the examples need the username, password, and dbname as arguments. The second example also requires an SQL query, and warning, and critical time limits.

PostgreSQL

PostgreSQL is another open source database which is commonly used in hosting companies. It is also used very often for client-server applications. The Nagios plugins package offers a command to check if the PostgreSQL database is working correctly. Its syntax is quite similar to the MySQL command:

check_pgsql [-H <host>] [-P <port>] [-w <warn>] [-c <crit>]
            [-t <timeout>] [-d <database>] [-l <logname>]
            [-p <password>]

The following table describes the options that this plugin accepts:

A sample check command that expects user name, password, and database name as arguments is as follows:

  define command
  {
    command_name  check_pgsql
    command_line  $USER1$/check_pgsql –H $HOSTADDRESS$ -l $ARG1$
                  -p $ARG2$ -d $ARG3$
  }

Oracle

Oracle is a popular enterprise-class database server. It is mainly used by medium- and large-sized companies for business critical applications. Therefore, a failure, or even a lack of disk space, for a single database might cause huge problems for a company. Fortunately, a plugin exists to verify various aspects of the Oracle database. And it even offers the ability to monitor tablespace storage and cache usage. The syntax is quite different from most Nagios plugins as the first argument specifies the mode in which the check should be carried out, and the remaining parameters are dependant on the first one. The syntax is as follows:

check_oracle --tns <ORACLE_SID>
             --db <ORACLE_SID>
             --oranames <Hostname>
             --login <ORACLE_SID>
             --cache <ORACLE_SID> <USER> <PASS> <CRITICAL> <WARNING>
             --tablespace <ORACLE_SID> <USER> <PASS>
                          <TABLESPACE> <CRITICAL> <WARNING>

For all checks, Oracle SID (System Identifier) can be specified in the form of <ip> or <ip>/<database>. Because the plugin automatically adds the username and password to the identifier, an SID in the form of <username>[/<password>]@<ip>[/<database>] should not be specified, and in many cases, will not work.

The --tns option checks if a database is listening for a connection based on the tnsping command. This can be used as a basic check of both local and remote databases.

Verifying that a local database is running can be done using the --db option—in which case, a check is performed by running the Oracle process for a specified database.

Verifying a remote Oracle Names server can be done using the --oranames mode.

In order to verify if a database is working properly, a --login option can be used—this tries to log in using an invalid username and verifies whether the ORA-01017 error is received, in which case, the database is behaving correctly.

Verifying cache usage can be done using the --cache option, in which case, the cache hit ratio is checked—if it is lower than the specified warning or critical limits, the respective status is returned. This allows the monitoring of bottlenecks within the database caching mechanism.

Similarly, for tablespace checking, a --tablespace option is provided—a check is carried out against the available storage for the specified tablespace. If it is lower than the specified limits, a warning or critical status is returned (as appropriate).

This plugin requires various Oracle commands to be in the binary path (the PATH environment variable). Therefore, it is necessary to have either the entire Oracle installation or the Oracle client installation done on the machine that will perform the checks for the Oracle database. Sample definitions to check the login into the Oracle database and the database cache are as follows:

  define command
  {
    command_name  check_oracle_login
    command_line  $USER1$/check_oracle --login $HOSTADDRESS$
  }
  define command
  {
    command_name  check_oracle_tablespace
    command_line  $USER1$/check_oracle --cache $HOSTADDRESS$/$ARG1$
                  $ARG2$ $ARG3$ $ARG4$ $ARG5$
  }

The second example requires the passing of the database name, username, password, and critical/warning limits for the cache hit ratio. The critical value should be lower than the warning value.

Other Databases

Even though Nagios supports verification of some common databases, there are a lot of commonly-used databases for which the standard nagios-plugins package does not provide a plugin. For these databases, the first thing worth checking is the Nagios Exchange (visit http://www.nagiosexchange.org/) as this has a category for database check plugins, with commands for checking various types of databases (such as DB2, Ingres, Firebird, MS SQL, and Sybase).

In some cases, it might be sufficient to use the check_tcp plugin to verify whether a database server is up and running. In other cases, it might be possible to use a dynamic language (such as Python, Perl, or Tcl) to write a small script that connects to your database and performs basic tests. See Chapter 11, Extending Nagios for more information on writing Nagios check plugins.