RRD Accelerator Design
Problem
The rrd data format is, in theory, quite efficient at minimizing the amount of data that has to be written for a single update. But while a single update writes as little as 32 bytes of data, it causes the OS to update 2-3 disk blocks. And updating a disk block means reading (at least) 512 bytes of data, modifying the block, and writing it back to disk. If the block is already cached, the read is skipped, making things quite a bit faster. This is why good cache management helps so much in speeding up rrdtool. The 512-byte write, on the other hand, does not go away. So if there were a way to bundle up several updates to a single rrd file, this would give us much better performance, since still only that one 512-byte block would have to be written to disk.
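As a back-of-the-envelope illustration of the savings, here is a small sketch using the numbers from the paragraph above (32-byte updates, 512-byte blocks; it deliberately ignores that a single update may in fact touch 2-3 blocks):

```python
import math

BLOCK_SIZE = 512   # bytes read/written per disk block
UPDATE_SIZE = 32   # bytes of payload per rrd update

def blocks_written(n_updates, batched):
    """Disk blocks written for n_updates to one rrd file."""
    if batched:
        # accumulate updates in memory, then write the dirty block(s) once
        return math.ceil(n_updates * UPDATE_SIZE / BLOCK_SIZE)
    # unbatched: one block write per individual update
    return n_updates

print(blocks_written(16, batched=False))  # 16 block writes
print(blocks_written(16, batched=True))   # 1 block write
```

Sixteen 32-byte updates fit exactly into one 512-byte block, so batching turns sixteen block writes into one.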
Solution
Instead of writing updates straight to the rrd file, they are stored in memory until:
- a configurable amount of time has passed
- a configurable number of updates have been accumulated for a file
- memory is full
- we receive a flush command
- some rrdtool function other than rrdupdate wants to read from the rrd file
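The accumulation logic above can be sketched roughly like this (class and parameter names are hypothetical, not the real implementation; "memory is full" is crudely approximated by a total update count):

```python
import time

class UpdateCache:
    """Hold updates in memory until one of the flush conditions is met."""

    def __init__(self, max_age=300.0, max_updates=32, max_total=100_000):
        self.max_age = max_age          # configurable time limit (seconds)
        self.max_updates = max_updates  # configurable per-file update limit
        self.max_total = max_total      # stand-in for "memory is full"
        self.files = {}                 # path -> (first_update_time, [updates])
        self.total = 0

    def update(self, path, value):
        now = time.time()
        first, pending = self.files.setdefault(path, (now, []))
        pending.append(value)
        self.total += 1
        if (len(pending) >= self.max_updates
                or now - first >= self.max_age
                or self.total >= self.max_total):
            return self.flush(path)     # a condition is met: write out now
        return []

    def flush(self, path):
        """Also called on an explicit flush command, and before any
        rrdtool function other than rrdupdate reads the file."""
        _, pending = self.files.pop(path, (None, []))
        self.total -= len(pending)
        return pending                  # caller writes these to the .rrd file
```

A caller would feed `update()` and write whatever it returns to disk in one go, getting the single block write described in the problem statement.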
Implementation
When the rrdtool accelerator daemon is running, it creates a unix domain socket in a well-known location (/tmp/rrd-accelerator-$USER/socket). When any of the rrdtool commands is run, it first checks whether the socket is there. If it is, the command either sends a flush request for the file it is interested in, or, in the case of rrdtool update, sends the update data.
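The client-side check might look like the following sketch. The line-based wire format and the function names are assumptions made for illustration; only the socket location comes from the text above:

```python
import os
import socket

def socket_path():
    return "/tmp/rrd-accelerator-%s/socket" % os.environ.get("USER", "nobody")

def send_to_daemon(command, filename, *args):
    """Return True if the daemon accepted the command, False if no
    daemon is running (the caller then falls back to direct file access)."""
    path = socket_path()
    if not os.path.exists(path):
        return False                 # no accelerator: work on the file directly
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        line = "%s %s %s\n" % (command, filename, " ".join(args))
        s.sendall(line.encode())
    finally:
        s.close()
    return True

# an updater:  send_to_daemon("update", "test.rrd", "N:42")
# a reader:    send_to_daemon("flush", "test.rrd")  before opening the file
```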
Problems
When running the rrdtool accelerator in combination with a mix of rrdtool versions, some knowing about the accelerator and others NOT knowing about it, interesting things may happen: an unaware version could read or modify an rrd file while updates for it are still sitting in the daemon's memory. File locking should help here (as long as all participating versions support it).
Ideas
- only start daemon mode when the environment variable RRDD_SOCKET is set
- write a journal