Use RRD to monitor system usage and health

Notes

The scripts and methods described here are by no means universal. They are meant as a starting point that you can adapt to your own needs, without having to learn all about RRDtool before you can even start. They have been tested mainly on Debian Etch machines.

Prerequisites

On Debian, you need to install some extra packages if you want the full monitoring capabilities. These are:

  • lm-sensors - for CPU and system temperature readings. See http://www.lm-sensors.org/
  • smartmontools - for HDD temperature readings
  • sysstat - for CPU usage (percentage) statistics
  • utils like grep, sed and net-tools, but these should be in the standard installation
  • and rrdtool, of course!

In short, the command

  apt-get install grep sed net-tools rrdtool lm-sensors smartmontools sysstat

should install everything that is needed on a standard Debian installation.
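
To confirm afterwards that the tools are actually reachable, a small check along the following lines can help. It is only a sketch: the function name missing_tools is made up for this example, and the list of binaries (sensors, smartctl, sar, ifconfig) is an assumption about which programs from the packages above the scripts rely on.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed binaries: sensors (lm-sensors), smartctl (smartmontools),
# sar (sysstat), ifconfig (net-tools), plus rrdtool, grep and sed.
missing=$(missing_tools rrdtool sensors smartctl sar ifconfig grep sed)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All tools found."
fi
```

Run it as root, because smartctl and ifconfig usually live in /usr/sbin and /sbin, which are often not on the PATH of an ordinary user.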

Installation and Setup

Installation is easy: unpack the attached file as user "root" in the "/" directory.

  tar xzvf rrdmontools.tar.gz

Setup is also quick: open /var/local/rrd/rrdcreate.sh with your favourite editor and change the lines

  USAGESTATS="hda eth0 eth1"
  HEALTHSTATS="cpu1;CPU1;CPU+1 cpu2;CPU2;CPU+2 sys;DDR;System hda;Seagate"

according to the following description. Then, adjust the line

  COPYTARGET="~www-data/stats/\`uname -n\`"

It defines where the graphs will be copied to. You can also enter a scp URL like

  www-data@192.168.1.20:~/stats/\`uname -n\`

The example shown here assumes that you have the Apache web server installed. The target directory has to be created beforehand.
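
Creating that directory could look roughly like the sketch below. The function name make_copy_target and its optional host argument are illustrative and not part of the attached scripts. The host argument is there because, with the scp variant, the directory has to be created on the remote machine but named after the monitored one.

```shell
#!/bin/sh
# Create <base>/<hostname>, the layout used by the COPYTARGET example,
# and print the resulting path. The hostname defaults to this machine's.
make_copy_target() {
    host=${2:-$(uname -n)}
    target="$1/$host"
    mkdir -p "$target" && echo "$target"
}

# Example calls (as root, on the machine that serves the graphs):
#   make_copy_target ~www-data/stats
#   make_copy_target ~www-data/stats name-of-monitored-host
```

The user that runs mkgraphs.sh also needs write permission on the resulting directory.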

Usage statistics

In USAGESTATS you can provide a list of up to 5 hard disks (hda, sda, md0 or similar) and up to 5 network interfaces (eth0, eth1 or similar). The example here has just one IDE hard disk and two network interfaces. It also works with S-ATA or SCSI hard disks, which are recognized as /dev/sdX, with software RAIDs and even with partitions (hda1 or similar).
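
Before editing USAGESTATS it can be handy to check that each name really refers to a disk or an interface known to the kernel. The following is a minimal sketch that looks the name up in /proc/partitions and /proc/net/dev. The function name classify_entry is hypothetical, and this is not necessarily how the attached scripts tell the two kinds apart.

```shell
#!/bin/sh
# Classify one USAGESTATS entry as "disk", "net" or "unknown".
# The two optional arguments exist only so the function can be tried
# against sample files instead of the live /proc.
classify_entry() {
    name=$1
    partitions=${2:-/proc/partitions}
    netdev=${3:-/proc/net/dev}
    if awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' \
           "$partitions" 2>/dev/null; then
        echo disk
    elif grep -q "^ *$name:" "$netdev" 2>/dev/null; then
        echo net
    else
        echo unknown
    fi
}

# Check the entries from the example configuration on this machine.
for entry in hda eth0 eth1; do
    printf '%s: %s\n' "$entry" "$(classify_entry "$entry")"
done
```

An entry reported as "unknown" is most likely a typo or a device that does not exist on this machine.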

Health (temperature) statistics

In HEALTHSTATS you can list up to 7 temperatures. The first kind are temperatures reported by sensors. Their labels have to contain the word "Temp", preceded by a unique name that contains no whitespace. A good example of very usable sensors output is:

w83782d-i2c-0-2d
Adapter: SMBus AMD766 adapter at 80e0
AGP V:     +3.38 V  (min =  +3.14 V, max =  +3.46 V)              
 +5 V:     +4.68 V  (min =  +4.73 V, max =  +5.24 V)       ALARM  
DDR V:     +1.25 V  (min =  +3.81 V, max =  +2.22 V)       ALARM  
3 VSB:     +3.34 V  (min =  +2.85 V, max =  +3.15 V)       ALARM  
Bat V:     +0.59 V  (min =  +2.64 V, max =  +3.95 V)       ALARM  
PSU Fan:  3154 RPM  (min = 1814 RPM, div = 4)                     
VRM2 Temp:   +61°C  (high =   +40°C, hyst =   +60°C)   sensor = transistor   ALARM   
CPU1 Temp: +50.0°C  (high =   +70°C, hyst =   +80°C)   sensor = transistor           
CPU2 Temp: +56.0°C  (high =   +70°C, hyst =   +80°C)   sensor = transistor           
alarms:   
beep_enable:
          Sound alarm enabled

w83627hf-isa-0c00
Adapter: ISA adapter
VCore1:    +1.70 V  (min =  +1.66 V, max =  +1.84 V)              
VCore2:    +1.70 V  (min =  +1.66 V, max =  +1.84 V)              
+3.3 V:    +3.30 V  (min =  +3.14 V, max =  +3.47 V)              
 +12 V:   +12.21 V  (min = +10.83 V, max = +13.21 V)              
 -12 V:   -12.77 V  (min = -13.18 V, max = -10.80 V)              
CPU1 Fan: 2766 RPM  (min = 2008 RPM, div = 16)                     
CPU2 Fan: 2678 RPM  (min = 2008 RPM, div = 16)                     
VRM1 Temp:   +51°C  (high =   +40°C, hyst =   +60°C)   sensor = transistor           
AGP Temp:  +56.5°C  (high =   +40°C, hyst =   +60°C)   sensor = transistor           
DDR Temp:  +41.5°C  (high =   +40°C, hyst =   +60°C)   sensor = transistor           
vid:      +1.750 V  (VRM Version 9.0)
alarms:   
beep_enable:
          Sound alarm disabled

You can see that all the temperature readings contain the word "Temp", so we could monitor the temperatures

  • VRM2
  • CPU1
  • CPU2
  • DDR
  • AGP
  • VRM1
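
The same list can be pulled out of the sensors output automatically. The sketch below assumes that every temperature label has the form "NAME Temp:", as in the output above. The function name list_temp_sensors is made up for this example.

```shell
#!/bin/sh
# Read "sensors" output on stdin and print the name in front of each
# "Temp:" label, one per line.
list_temp_sensors() {
    awk '$2 == "Temp:" { print $1 }'
}

# Example call on a machine with lm-sensors configured:
#   sensors | list_temp_sensors
```

Voltage and fan lines are skipped because their second field is "V:" or "Fan:" rather than "Temp:".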

For each temperature, think of an internal identifier (like "cpu", "sys", ...), take the name of the sensor (like "CPU1", "VRM1", ...) and think of a nice name (like "Processor Temperature", "Ambient Temp", ...). Then write the internal identifier, a semicolon, the sensor name, another semicolon and the nice name with all whitespace replaced by "+" signs, and you get something like

  cpu;CPU1;Processor+Temperature

or

  sys;VRM1;Ambient+Temp

These entries go into HEALTHSTATS, separated by spaces.
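
If you have many sensors, the entries can also be generated. The helper below is only an illustration of the format (identifier, semicolon, sensor name, semicolon, nice name with spaces turned into "+"). The name make_health_entry is hypothetical.

```shell
#!/bin/sh
# Build "identifier;sensor;nice+name" from three arguments.
make_health_entry() {
    nice_name=$(printf '%s' "$3" | tr ' ' '+')
    printf '%s;%s;%s\n' "$1" "$2" "$nice_name"
}

make_health_entry cpu CPU1 "Processor Temperature"
make_health_entry sys VRM1 "Ambient Temp"
```

The two calls print the same entries as the examples above.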

Finally, you can add hard disk temperatures for IDE and S-ATA drives that support it. To check, call smartctl -a -s on -d ata /dev/hda (where the last parameter is the path to your HDD, which could be /dev/sdc or similar). If the output contains a value called "Temperature", call hddtemp /dev/hda and see if it prints just one value. If so, you can use the device name (hda) in HEALTHSTATS, followed by a semicolon and a nice name like "HDD front left" with whitespace replaced by "+", to create

  hda;HDD+front+left
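
The check and the resulting entry can be sketched as follows. Both function names are made up for this example, and the test for the word "Temperature" is simply the manual check described above turned into a grep.

```shell
#!/bin/sh
# Succeed if the smartctl report on stdin mentions "Temperature".
has_temperature_attr() {
    grep -q 'Temperature'
}

# Build "device;nice+name" for a hard disk entry.
make_disk_entry() {
    nice_name=$(printf '%s' "$2" | tr ' ' '+')
    printf '%s;%s\n' "$1" "$nice_name"
}

# Example calls (as root):
#   smartctl -a -s on -d ata /dev/hda | has_temperature_attr && hddtemp /dev/hda
#   make_disk_entry hda "HDD front left"
```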

Install

After this, quit the editor and run rrdcreate.sh:

  cd /var/local/rrd
  ./rrdcreate.sh

This creates the .rrd files for data storage, rrdupdate.sh in /usr/sbin, and mkgraphs.sh.
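
A quick way to see whether everything was generated is sketched below. The file names usage.rrd and health.rrd are the ones used with rrdtool dump later on this page. That the .rrd files end up in /var/local/rrd is an assumption, and check_created is a hypothetical helper name.

```shell
#!/bin/sh
# Report which of the expected files exist. Both directories can be
# overridden, which is mainly useful for trying the function out.
check_created() {
    rrd_dir=${1:-/var/local/rrd}
    sbin_dir=${2:-/usr/sbin}
    status=0
    for f in "$rrd_dir/usage.rrd" "$rrd_dir/health.rrd" \
             "$rrd_dir/mkgraphs.sh" "$sbin_dir/rrdupdate.sh"; do
        if [ -e "$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            status=1
        fi
    done
    return $status
}

# Example call:
#   check_created || echo "rrdcreate.sh did not create everything"
```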

Usage

The rest is simple: to start collecting data, just run /usr/sbin/rrdupdate.sh. I like to add the following line to /etc/rc.local:

  nice -n -10 /usr/sbin/rrdupdate.sh &

The negative niceness gives the data collection a higher priority, thus reducing the risk of missing data.

Then add a cron job to update the graphs periodically. Run crontab -e and add a line like

  */10 * * * * /var/local/rrd/mkgraphs.sh

which regenerates the graphs every ten minutes.

Then wait. After one or two minutes you can run rrdtool dump health.rrd or rrdtool dump usage.rrd and check that the most recent values are not "NaN". If they are numbers, the data collection works. After ten minutes, the graphs should have been created, and you can, for example, link them into a web page. A simple example of this is included in the attached file (/var/local/rrd/sample-report.html).
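
Instead of reading the dump by eye, the stored values can be counted. The following is a rough sketch that assumes the dump wraps each stored value in a <v> element. The function name count_known_values is made up here, and grep -o is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Read "rrdtool dump" XML on stdin and print how many stored values
# (<v> elements) are something other than NaN.
count_known_values() {
    grep -o '<v>[^<]*</v>' | grep -vc 'NaN'
}

# Example call:
#   rrdtool dump health.rrd | count_known_values
```

A result greater than zero means that at least some readings have been stored. Zero means that everything is still "NaN".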

Sample Report

An actual live sample report shows the following graphs, each for today and for this week:

  • CPU activity
  • Network activity
  • HDD activity
  • Temps

Last modified on Nov 30, 2011, 3:46:27 PM
