Use RRD to monitor system usage and health
Notes
The scripts and methods described here are by no means universal. They are meant as a starting point that you can adapt to your own needs, without the hassle of learning all about RRDtool before you can even start. They have been tested mainly on Debian Etch machines.
Prerequisites
On Debian, you need to install some extra packages if you want the full monitoring capabilities. These are:
- lm-sensors - for CPU and system temperature readings. See http://www.lm-sensors.org/
- smartmontools - for HDD temperature readings
- sysstat - for CPU usage (percentage) statistics
- utilities like grep, sed and net-tools, but these should be part of the standard installation
- and rrdtool, of course!
In short, the command
apt-get install grep sed net-tools rrdtool lm-sensors smartmontools sysstat
should install everything needed on a standard system.
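To double-check that the packages really provide the expected commands, something like the following can help. It is only a sketch and not part of rrdmontools: the function name missing_tools is made up for illustration, and the command names in the example call (sensors, smartctl, iostat, rrdtool, ifconfig) are the binaries these packages normally ship on Debian.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with rrdmontools):
# print every command from the argument list that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example call (run as root, so that the sbin directories are on the PATH):
#   missing_tools sensors smartctl iostat rrdtool ifconfig grep sed
```

If the call prints nothing, all listed commands were found.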
Installation and Setup
Installation is really easy: unpack the attached file as user "root" in the "/" directory.
tar xzvf rrdmontools.tar.gz
Setup is also quick: open /var/local/rrd/rrdcreate.sh with your favourite editor and change the lines
USAGESTATS="hda eth0 eth1"
HEALTHSTATS="cpu1;CPU1;CPU+1 cpu2;CPU2;CPU+2 sys;DDR;System hda;Seagate"
according to the following description. Then, adjust the line
COPYTARGET="~www-data/stats/\`uname -n\`"
It defines where the graphs will be copied to. You can also enter a scp URL like
www-data@192.168.1.20:~/stats/\`uname -n\`
The example shown here assumes that you have the Apache web server installed. The target directory has to be created beforehand.
Usage statistics
In USAGESTATS you can provide a list of up to 5 hard disks (hda, sda, md0 or similar) and up to 5 network interfaces (eth0, eth1 or similar). The example here has just one IDE hard disk and two network interfaces. It also works with S-ATA or SCSI hard disks which are recognized as /dev/sdX, with software RAID devices and even with partitions (hda1 or similar).
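Before putting a name into USAGESTATS, it can be worth checking whether the kernel actually knows it as a disk or as a network interface. The sketch below is one possible way to do that on Linux and is not how rrdcreate.sh itself works: the function name entry_kind is hypothetical, and it assumes the usual layout of /proc/partitions (device name in the fourth column) and /proc/net/dev (interface name before the colon).

```shell
#!/bin/sh
# Hypothetical helper: classify a name as "disk", "net" or "unknown".
# The optional 2nd and 3rd arguments override the files that are consulted.
entry_kind() {
    name=$1
    partitions=${2:-/proc/partitions}
    netdev=${3:-/proc/net/dev}
    if awk -v n="$name" '$4 == n { found = 1 } END { exit !found }' \
            "$partitions" 2>/dev/null; then
        echo disk
    elif awk -F: -v n="$name" \
            '{ gsub(/[ \t]/, "", $1) } $1 == n { found = 1 } END { exit !found }' \
            "$netdev" 2>/dev/null; then
        echo net
    else
        echo unknown
    fi
}

# Example: check every entry of the USAGESTATS line shown earlier.
#   for e in hda eth0 eth1; do echo "$e: $(entry_kind "$e")"; done
```

Anything reported as "unknown" is probably a typo or a device that is not present on this machine.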
Health (temperature) statistics
In HEALTHSTATS you can list up to 7 temperatures. First come the temperatures read by sensors. Their labels have to contain "Temp" plus a unique name without any whitespace. A good example of very usable sensors output is:
w83782d-i2c-0-2d
Adapter: SMBus AMD766 adapter at 80e0
AGP V: +3.38 V (min = +3.14 V, max = +3.46 V)
+5 V: +4.68 V (min = +4.73 V, max = +5.24 V) ALARM
DDR V: +1.25 V (min = +3.81 V, max = +2.22 V) ALARM
3 VSB: +3.34 V (min = +2.85 V, max = +3.15 V) ALARM
Bat V: +0.59 V (min = +2.64 V, max = +3.95 V) ALARM
PSU Fan: 3154 RPM (min = 1814 RPM, div = 4)
VRM2 Temp: +61°C (high = +40°C, hyst = +60°C) sensor = transistor ALARM
CPU1 Temp: +50.0°C (high = +70°C, hyst = +80°C) sensor = transistor
CPU2 Temp: +56.0°C (high = +70°C, hyst = +80°C) sensor = transistor
alarms:
beep_enable: Sound alarm enabled

w83627hf-isa-0c00
Adapter: ISA adapter
VCore1: +1.70 V (min = +1.66 V, max = +1.84 V)
VCore2: +1.70 V (min = +1.66 V, max = +1.84 V)
+3.3 V: +3.30 V (min = +3.14 V, max = +3.47 V)
+12 V: +12.21 V (min = +10.83 V, max = +13.21 V)
-12 V: -12.77 V (min = -13.18 V, max = -10.80 V)
CPU1 Fan: 2766 RPM (min = 2008 RPM, div = 16)
CPU2 Fan: 2678 RPM (min = 2008 RPM, div = 16)
VRM1 Temp: +51°C (high = +40°C, hyst = +60°C) sensor = transistor
AGP Temp: +56.5°C (high = +40°C, hyst = +60°C) sensor = transistor
DDR Temp: +41.5°C (high = +40°C, hyst = +60°C) sensor = transistor
vid: +1.750 V (VRM Version 9.0)
alarms:
beep_enable: Sound alarm disabled
You can see that all the temperature readings contain the word "Temp", so we could monitor the temperatures
- VRM2
- CPU1
- CPU2
- DDR
- AGP
- VRM1
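Picking these names out by eye gets tedious on boards with many sensors. As a rough sketch (the function name temp_sensor_names is made up here, and it assumes labels of the form "NAME Temp:" with a single-word name, as in the output above), the list can be extracted like this:

```shell
#!/bin/sh
# Hypothetical helper: read `sensors` output on stdin and print the word
# in front of each "Temp:" label, one name per line.
temp_sensor_names() {
    awk '$2 == "Temp:" { print $1 }'
}

# Typical call:
#   sensors | temp_sensor_names
```

Labels that consist of several words before "Temp" are skipped by this sketch, since such names could not be used in HEALTHSTATS anyway.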
For each temperature, think of an internal identifier (like "cpu", "sys", ...), take the name of the sensor (like "CPU1", "VRM1", ...) and think of a nice display name (like "Processor Temperature", "Ambient Temp", ...). Then write the internal identifier, a semicolon, the sensor name, another semicolon and the display name with every whitespace replaced by a "+" sign, and you get something like
cpu;CPU1;Processor+Temperature
or
sys;VRM1;Ambient+Temp
Those go into the HEALTHSTATS.
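The rule above is mechanical enough to script. The following is a small sketch, with a hypothetical function name health_entry, that joins the three parts and replaces the spaces in the display name with "+" signs:

```shell
#!/bin/sh
# Hypothetical helper: build one HEALTHSTATS entry from
#   $1 = internal identifier, $2 = sensor name, $3 = display name.
health_entry() {
    id=$1
    sensor=$2
    # Spaces in the display name become "+" signs.
    label=$(printf '%s' "$3" | tr ' ' '+')
    printf '%s;%s;%s\n' "$id" "$sensor" "$label"
}

# Example:
#   health_entry cpu CPU1 "Processor Temperature"
```

Remember that the finished HEALTHSTATS value has to stay inside double quotes in rrdcreate.sh, because an unquoted semicolon ends a shell command.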
Finally, you can add hard disk temperatures for IDE and S-ATA drives that support it. How to check? Call smartctl -a -s on -d ata /dev/hda (where the last parameter is the path to your HDD, which could be /dev/sdc or whatever). If you can find some value called "Temperature", call hddtemp /dev/hda (hddtemp comes in its own package, which is not part of the apt-get command above) and see if it prints just one value. If so, you can use the last part of the device path (hda) in HEALTHSTATS, followed by a semicolon and a pretty name like "HDD front left", to create
hda;HDD+front+left
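The first half of that check can be scripted. This is a minimal sketch with a made-up function name, has_temperature_attr; it only looks for the word "Temperature" in the smartctl output and says nothing about whether hddtemp can read the drive:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the `smartctl -a` output on stdin
# mentions a temperature value (case-insensitive match).
has_temperature_attr() {
    grep -qi 'temperature'
}

# Typical call (as root):
#   smartctl -a -s on -d ata /dev/hda | has_temperature_attr \
#       && echo "hda reports a temperature"
```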
Install
After this, quit the editor and run rrdcreate.sh:
cd /var/local/rrd
./rrdcreate.sh
This creates the .rrd files for data storage, rrdupdate.sh in /usr/sbin, and mkgraphs.sh.
Usage
The rest is simple: To start collecting data, just run /usr/sbin/rrdupdate.sh. I like to add the following line to /etc/rc.local:
nice -n -10 /usr/sbin/rrdupdate.sh &
This runs the data collection at a higher priority (a niceness of -10), which reduces the risk of missing data.
Then add a cron job to update the graphs periodically. Using crontab -e, add a line like
*/10 * * * * /var/local/rrd/mkgraphs.sh
which regenerates the graphs every ten minutes.
Then wait. After one or two minutes you can run rrdtool dump health.rrd or rrdtool dump usage.rrd and check the most recent values. If they are numbers rather than "NaN", the data collection works. After ten minutes, the graphs should have been created and you can, for example, link them into a webpage. A simple example for this is included in the attached file (/var/local/rrd/sample-report.html).
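Scanning a long XML dump for "NaN" by eye is error-prone, so here is a rough sketch of a counter. The function name count_data_rows is hypothetical, and the sketch assumes that the dump writes each sample as a line containing a row element with one v element per value, which is what rrdtool dump normally produces.

```shell
#!/bin/sh
# Hypothetical helper: read `rrdtool dump` XML on stdin and print how many
# <row> lines contain at least one numeric <v> value (anything but NaN).
count_data_rows() {
    grep '<row>' | grep -c '<v> *[-+]\{0,1\}[0-9]' || true
}

# Typical call:
#   rrdtool dump /var/local/rrd/health.rrd | count_data_rows
```

A result of 0 a few minutes after starting rrdupdate.sh suggests that no data is being stored yet.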
Sample Report
An actual live sample report:
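CPU activity
[Graphs not preserved: today, this week, and one unlabeled graph]
Network activity
[Graphs not preserved: today, this week, and one unlabeled graph]
HDD activity
[Graphs not preserved: today, this week, and one unlabeled graph]
Temps
[Graphs not preserved: today, this week, and one unlabeled graph]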
Attachments (25)
- rrdmontools.tar.gz (3.2 KB) - added by human 14 years ago.
- cpu-day.png (33.1 KB) - added by human 14 years ago.
- cpu-week.png (31.0 KB) - added by human 14 years ago.
- hdd-day.png (20.5 KB) - added by human 14 years ago.
- hdd-week.png (17.0 KB) - added by human 14 years ago.
- net-day.png (50.7 KB) - added by human 14 years ago.
- net-week.png (50.7 KB) - added by human 14 years ago.
- temp-day.png (29.9 KB) - added by human 14 years ago.
- temp-week.png (34.4 KB) - added by human 14 years ago.
- cpu-long.png (25.4 KB) - added by human 14 years ago.
- net-long.png (52.2 KB) - added by human 14 years ago.
- hdd-long.png (12.1 KB) - added by human 14 years ago.
- temp-long.png (39.4 KB) - added by human 14 years ago.
- cpu-day-big.png (87.9 KB) - added by human 14 years ago.
- cpu-week-big.png (55.5 KB) - added by human 14 years ago.
- cpu-long-big.png (85.5 KB) - added by human 14 years ago.
- net-long-big.png (161.0 KB) - added by human 14 years ago.
- net-week-big.png (81.7 KB) - added by human 14 years ago.
- net-day-big.png (135.8 KB) - added by human 14 years ago.
- hdd-day-big.png (54.4 KB) - added by human 14 years ago.
- hdd-week-big.png (34.4 KB) - added by human 14 years ago.
- hdd-long-big.png (41.3 KB) - added by human 14 years ago.
- temp-long-big.png (125.5 KB) - added by human 14 years ago.
- temp-week-big.png (71.7 KB) - added by human 14 years ago.
- temp-day-big.png (86.7 KB) - added by human 14 years ago.