Monitoring Subsystem 3.6
The Information Manager (IM) is in charge of monitoring the hosts. It comes with various sensors, each one responsible for a different aspect of the computer to be monitored (CPU, memory, hostname…). There are also sensors prepared to gather information from different hypervisors.
The requirements depend on the sensors that make up the IM driver; the main one is that the hypervisor corresponding to the sensors is available. Also, as with all OpenNebula configurations, passwordless SSH access to the hosts has to be possible.
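One quick way to confirm the passwordless SSH requirement is to attempt a non-interactive login from the front-end. The following is a minimal sketch, assuming the OpenSSH client; `check_passwordless_ssh` is a hypothetical helper name, not an OpenNebula command.

```shell
#!/bin/sh
# Sketch: verify that the current user can reach a host over SSH without
# being prompted for a password.
# BatchMode=yes makes ssh fail instead of prompting; ConnectTimeout bounds
# the wait. check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    host="$1"
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
        echo "OK: passwordless SSH to $host works"
    else
        echo "FAIL: cannot log in to $host without a password" >&2
        return 1
    fi
}
```

For example, `check_passwordless_ssh ursa06` would be run as the user that owns the OpenNebula daemon.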
The OpenNebula daemon loads its drivers whenever it starts. The drivers are defined in /etc/one/oned.conf. The following line will configure OpenNebula to use the Xen probes:
IM_MAD = [ name = "im_xen", executable = "one_im_ssh", arguments = "xen" ]
Equivalently, for VMware you would uncomment the following in oned.conf:
IM_MAD = [ name = "im_vmware", executable = "one_im_sh", arguments = "-t 15 -r 0 vmware" ]
And finally for EC2:
IM_MAD = [ name = "im_ec2", executable = "one_im_ec2", arguments = "im_ec2/im_ec2.conf" ]
Please remember that you can add your custom probes for later use by other OpenNebula modules like the scheduler.
In order to test the driver, add a host to OpenNebula using onehost, specifying the defined IM driver:
<xterm> $ onehost create ursa06 --im im_xen --vm vmm_xen --net dummy </xterm>
Now give it time to monitor the host (this time is determined by the value of HOST_MONITORING_INTERVAL in /etc/one/oned.conf). After one interval, check the output of onehost list; it should look like the following:
<xterm> $ onehost list
  ID NAME         CLUSTER     RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM STAT
   0 ursa06       -             0    800    798    800    16G    14G    16G on
</xterm>
Host management information is logged to /var/log/one/oned.log. Correct monitoring log lines look like this:
Mon Oct 3 15:06:18 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct 3 15:06:18 2011 [InM][D]: Host 0 successfully monitored.
Both lines have the ID of the host being monitored.
If there are problems monitoring the host, it will be shown in the err state:
<xterm> $ onehost list
  ID NAME         CLUSTER     RVM   TCPU   FCPU   ACPU   TMEM   FMEM   AMEM STAT
   0 ursa06       -             0      0      0    100     0K     0K     0K err
</xterm>
To get the error message for the host, use the onehost show command, specifying the host ID or name:
<xterm> $ onehost show 0
[…]
MONITORING INFORMATION
ERROR=[
  MESSAGE="Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes",
  TIMESTAMP="Mon Oct 3 15:26:57 2011" ]
</xterm>
The log file is also useful as it will give you even more information on the error:
Mon Oct 3 15:26:57 2011 [InM][I]: Monitoring host ursa06 (0)
Mon Oct 3 15:26:57 2011 [InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. ursa06:/var/tmp/one
Mon Oct 3 15:26:57 2011 [InM][I]: ssh: Could not resolve hostname ursa06: nodename nor servname provided, or not known
Mon Oct 3 15:26:57 2011 [InM][I]: lost connection
Mon Oct 3 15:26:57 2011 [InM][I]: ExitCode: 1
Mon Oct 3 15:26:57 2011 [InM][E]: Error monitoring host 0 : MONITOR FAILURE 0 Could not update remotes
In this case the node ursa06 could not be found in DNS or in /etc/hosts.
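Since oned.log mixes messages from several components, it can help to filter for the Information Manager tag seen in the lines above. A minimal sketch, assuming the log path and the [InM] tag format shown in this section; `inm_log` and `inm_errors` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers for reading Information Manager messages in oned.log.
# The default path and the [InM] / [InM][E] tags are taken from the examples
# in this section.

# Print every Information Manager line. Usage: inm_log [LOGFILE]
inm_log() {
    log="${1:-/var/log/one/oned.log}"
    # -F treats the pattern as a fixed string, so "[InM]" is not read
    # as a regular-expression character class.
    grep -F '[InM]' "$log"
}

# Print only the error-level Information Manager lines.
inm_errors() {
    log="${1:-/var/log/one/oned.log}"
    grep -F '[InM][E]' "$log"
}
```

Running `inm_errors` with no arguments would list only lines such as the final "Error monitoring host" entry above.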
The host monitoring interval can be changed in oned.conf:
HOST_MONITORING_INTERVAL = 600
The value is expressed in seconds; the default is 600 (10 minutes). You can lower it down to the value of MANAGER_TIMER (30 seconds by default). If you want a lower value, you also need to lower MANAGER_TIMER.
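The relationship between the two settings can be checked before restarting the daemon. The sketch below assumes both parameters appear as plain `NAME = number` lines in oned.conf and falls back to the defaults stated above when a line is missing or commented out; `conf_value` and `check_intervals` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: read a numeric "NAME = VALUE" setting from an
# oned.conf-style file, skipping commented-out lines. Prints DEFAULT
# when the setting is absent.
conf_value() {
    name="$1"; file="$2"; default="$3"
    value=$(sed -n "s/^[[:space:]]*$name[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p" "$file" | tail -n 1)
    echo "${value:-$default}"
}

# Hypothetical helper: fail if HOST_MONITORING_INTERVAL is lower than
# MANAGER_TIMER. Defaults (600 and 30) are the ones given in this section.
check_intervals() {
    file="${1:-/etc/one/oned.conf}"
    interval=$(conf_value HOST_MONITORING_INTERVAL "$file" 600)
    timer=$(conf_value MANAGER_TIMER "$file" 30)
    if [ "$interval" -lt "$timer" ]; then
        echo "HOST_MONITORING_INTERVAL ($interval) is below MANAGER_TIMER ($timer)" >&2
        return 1
    fi
    echo "HOST_MONITORING_INTERVAL=$interval MANAGER_TIMER=$timer"
}
```

Calling `check_intervals` with no arguments reads /etc/one/oned.conf; a different file can be passed as the first argument.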
The driver itself accepts the same options as the Virtual Machine driver; these options are described in the Virtualization Subsystem guide.
This section details the files used by the Information Drivers to monitor the hosts. There are two important kinds of driver files: the driver executables, /usr/lib/one/mads/one_im_ssh and /usr/lib/one/mads/one_im_sh (for the SSH and SH drivers respectively), and the probes, located in /var/lib/one/remotes/im/<hypervisor>.d. A probe is a small script or binary that extracts information from a host, either remotely (SSH) or locally (SH). The probe should return the metric in a simple NAME=VALUE format. Let's look at a simple one to understand how they work:
<xterm> $ cat /var/lib/one/remotes/im/kvm.d/name.sh
#!/bin/sh

echo HOSTNAME=`uname -n`
</xterm>
This uses the uname command to get the hostname of the remote host, and then outputs the information as:
HOSTNAME=host1.mydomain.org
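Before relying on a probe, its output can be checked against the NAME=VALUE convention by hand. This is a minimal sketch; `check_probe_output` is a hypothetical helper, and the set of characters it allows in NAME (uppercase letters, digits, underscore) is an assumption based on the example above rather than a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: read probe output on stdin and report any line that
# is not in NAME=VALUE form. The allowed NAME characters are an assumption
# drawn from the HOSTNAME example, not an OpenNebula specification.
check_probe_output() {
    bad=$(grep -v -E '^[A-Z_][A-Z0-9_]*=.*$')
    if [ -n "$bad" ]; then
        echo "Lines not in NAME=VALUE format:" >&2
        echo "$bad" >&2
        return 1
    fi
}
```

A probe could then be checked with something like `sh /var/lib/one/remotes/im/kvm.d/name.sh | check_probe_output`.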
Files contained in /var/lib/one/remotes/im/<hypervisor>.d are executed on the remote host. You can add more files to this directory to get more information.
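As an illustration of adding a file to that directory, here is a sketch of a custom probe modelled on name.sh. The metric name KERNEL_RELEASE and the suggested file name are made up for this example; only the NAME=VALUE output convention comes from this guide.

```shell
#!/bin/sh
# Hypothetical probe, e.g. saved as
# /var/lib/one/remotes/im/kvm.d/kernel.sh (the file name is illustrative).
# KERNEL_RELEASE is a made-up metric name; the probe prints a single
# metric in NAME=VALUE form, like name.sh does for HOSTNAME.
kernel_probe() {
    echo "KERNEL_RELEASE=$(uname -r)"
}

kernel_probe
```

The script will most likely need to be executable, like the existing probes, and it only reaches the remote nodes once the probes are copied there again.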
The Information Driver is also responsible for copying all these scripts (and the Virtual Machine Manager (VMM) driver scripts) to the remote nodes. If you want it to refresh the probes on the remote nodes, execute the following command on the front-end, as oneadmin:
<xterm> $ onehost sync </xterm>
This way, in the next monitoring cycle the probes and VMM driver action scripts will be copied to the node again.
The location where these files are copied on the remote nodes is configured in /etc/one/oned.conf through the SCRIPTS_REMOTE_DIR parameter, which is set to /var/tmp/one by default.