
DPM (Disk Pool Manager)

The Disk Pool Manager (DPM) is a lightweight solution for disk storage management. It is part of LCGDM (https://svnweb.cern.ch/trac/lcgdm).


Monitoring

  1. Installation
  2. Configuration (monitored machine)
  3. Configuration (nagios host)
  4. Plotting the data
  5. How to develop a probe
  6. Frequently Asked Questions (FAQ)
    1. How to enable / disable a probe
    2. How to change the check frequency of a probe

DPM nagios packages include probes to monitor each of our node types.

See also: Currently available DPM/LFC Nagios probes.

Installation

  1. Configure our unstable yum repository
  2. Install the nagios plugins rpm corresponding to the node type
    # yum install nagios-nrpe nagios-plugins-[lcgdm|dpm-head|dpm-disk]
    

nagios-plugins-lcgdm should be installed on the nagios host itself.

Configuration (monitored machine)

  1. Enable the nrpe.d config directory in the NRPE configuration
    # vim /etc/nagios/nrpe.cfg
    ...
    include_dir=/etc/nrpe.d/
    ...
    
  2. Enable nrpe via xinetd
    # cp /opt/lcg/share/doc/nagios-plugins-lcgdm/examples/nrpe /etc/xinetd.d
    

For an EMI installation, use /usr/share/doc/nagios-plugins-lcgdm/examples/nrpe instead.

  3. Restart the xinetd service
    # service xinetd restart
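
Before restarting xinetd, it can help to confirm that the include_dir line from step 1 is really active and not commented out. The following is a minimal sketch of such a check, not part of the nagios-plugins packages: nrpe_include_enabled is a hypothetical helper name, and the default directory /etc/nrpe.d is taken from the step above.

```shell
#!/bin/sh
# Sketch only: nrpe_include_enabled is a hypothetical helper, not shipped
# with nagios-nrpe or nagios-plugins-lcgdm.
# Returns 0 if the given nrpe.cfg contains an uncommented include_dir line
# pointing at the given directory (trailing slash optional), non-zero otherwise.
nrpe_include_enabled() {
    cfg=$1
    dir=${2:-/etc/nrpe.d}   # assumed default, as used in step 1
    dir=${dir%/}            # normalise: drop a trailing slash if present
    grep -Eq "^[[:space:]]*include_dir[[:space:]]*=[[:space:]]*${dir}/?[[:space:]]*\$" "$cfg"
}

# Example use on the monitored machine (commented out, no side effects):
#   nrpe_include_enabled /etc/nagios/nrpe.cfg || echo "include_dir is not enabled"
```

A commented-out line such as "#include_dir=/etc/nrpe.d/" does not match, because the pattern only allows whitespace before include_dir.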
    

Configuration (nagios host)

The nagios host is the machine running the nagios daemon.

  1. Declare the following files in the main nagios configuration file (these are the definition files for the nagios-plugins-lcgdm probes)
    # vim /etc/nagios/nagios.cfg
    ...
    cfg_file=/etc/nagios/generic-service.cfg
    cfg_file=/etc/nagios/lcgdm-services.cfg
    cfg_file=/etc/nagios/lcgdm-hosts.cfg
    cfg_file=/etc/nagios/lcgdm-commands.cfg
    ...
    
  2. For each of the machines to be monitored, add an entry like the following to /etc/nagios/lcgdm-hosts.cfg:
    # vim /etc/nagios/lcgdm-hosts.cfg
    ...
    define host {
            use             generic-host
            host_name       <hostname>
            name            <machine description>
            hostgroups      <node type(s)>
    }
    ...
    

The nagios-plugins-lcgdm rpm provides default configurations for the 3 node types: dpm-disks, dpm-heads, nagios-host. You can fill hostgroups with a comma-separated list of any of these types, as appropriate. Each of these types must contain at least one host.

  3. (For nagios 2 only) Some probes are installed on the nagios host. These probes need the server hostname to work correctly. To set it, edit the file /etc/nagios/lcgdm-hosts.cfg and modify the '-H' option.
    ...
    define command{
            command_name    check_dpns
            command_line    /usr/lib64/nagios/plugins/lcgdm/check_dpns -H testdpm-h
    }
    ...
    

  4. Reload the nagios daemon
    # service nagios reload
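
A reload with a missing cfg_file declaration or a syntax error leaves the probes undefined or the daemon refusing the new configuration, so a quick check beforehand is worthwhile. The sketch below assumes the four file names from step 1; missing_cfg_files is a hypothetical helper name. The commented usage line relies on "nagios -v", the standard configuration check of the nagios binary; the binary may not be on the PATH of every installation.

```shell
#!/bin/sh
# Sketch only: missing_cfg_files is a hypothetical helper.
# Prints every expected definition file that has no uncommented
# cfg_file= line in the given nagios.cfg. Empty output means all
# four declarations from step 1 are present.
missing_cfg_files() {
    cfg=$1
    for f in generic-service.cfg lcgdm-services.cfg lcgdm-hosts.cfg lcgdm-commands.cfg; do
        grep -Eq "^[[:space:]]*cfg_file=/etc/nagios/${f}[[:space:]]*\$" "$cfg" \
            || echo "/etc/nagios/$f"
    done
}

# Example use on the nagios host (commented out, no side effects):
#   [ -z "$(missing_cfg_files /etc/nagios/nagios.cfg)" ] \
#       && nagios -v /etc/nagios/nagios.cfg \
#       && service nagios reload
```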
    

Plotting the data

Detailed information is available under "How to set pnp4nagios".

How to develop a probe

Detailed information is available under "Probe development".

Frequently Asked Questions (FAQ)

How to enable / disable a probe

If you want to disable a probe on every client, the easiest way is to comment out the nagios service definition.

vim /etc/nagios/lcgdm-services.cfg

#define service {
#        use                     lcgdm-generic-service
#        hostgroup_name          dpm-heads, dpm-disks
#        service_description     DM_CERT
#        check_command           check_nrpe!check_hostcert
#}

If you want to disable a probe for only one group (head nodes or disk nodes), modify the hostgroup_name option so that it lists only the groups on which the probe should still run:

vim /etc/nagios/lcgdm-services.cfg

define service {
        use                     lcgdm-generic-service
        hostgroup_name          dpm-heads (, dpm-disks, other-group)
        service_description     DM_CERT
        check_command           check_nrpe!check_hostcert
}
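
After editing the file, it is easy to lose track of which host groups a probe is still attached to. The following sketch prints the hostgroup_name of every active (uncommented) service definition with a given service_description. probe_hostgroups is a hypothetical helper name, and the parsing assumes the one-directive-per-line layout shown above with the closing brace on its own line.

```shell
#!/bin/sh
# Sketch only: probe_hostgroups is a hypothetical helper.
# Usage: probe_hostgroups <services file> <service_description>
# Prints the hostgroup_name value of each uncommented service definition
# whose service_description matches. No output means the probe is disabled
# everywhere (or not defined in this file).
probe_hostgroups() {
    awk -v want="$2" '
        /^[ \t]*#/ { next }                                  # ignore commented-out lines
        $1 == "define" && $2 ~ /^service/ { groups = ""; desc = "" }
        $1 == "hostgroup_name" {
            $1 = ""; sub(/^[ \t]+/, "")                      # keep only the value
            groups = $0
            next
        }
        $1 == "service_description" { desc = $2 }
        $1 == "}" { if (desc == want) print groups; desc = "" }
    ' "$1"
}

# Example use (commented out, no side effects):
#   probe_hostgroups /etc/nagios/lcgdm-services.cfg DM_CERT
```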

How to change the check frequency of a probe

The check frequency is defined by the nagios option "normal_check_interval". This option can be applied either at the template level or at the service level, as follows:

define service {
        use                     lcgdm-generic-service
        hostgroup_name          dpm-heads
        service_description     DM_SPACE_TOKEN
        check_command           check_nrpe!check_space_token
        normal_check_interval   60
}

The value specified in the service definition overrides the value in the template.
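
Note that normal_check_interval is not a number of seconds. Nagios multiplies it by the interval_length setting in nagios.cfg, which is 60 seconds in a stock configuration, so the value 60 above means one check per hour unless interval_length has been changed. The arithmetic can be sketched as follows; check_period_seconds is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Sketch only: check_period_seconds is a hypothetical helper.
# Usage: check_period_seconds <normal_check_interval> [interval_length]
# Prints the time between two checks in seconds. interval_length
# defaults to 60, the stock Nagios value (assumed unchanged here).
check_period_seconds() {
    interval=$1
    interval_length=${2:-60}
    echo $(( interval * interval_length ))
}

# normal_check_interval 60 with the default interval_length:
check_period_seconds 60
```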

Last modified on Nov 25, 2011 3:19:37 PM