Version 8 (modified by oetiker, 15 years ago)

Creating a portable RRD format

Current Situation

The RRD data format is native: it depends on the architecture of the machine as well as on the OS used to write the data. This has the advantage of being fast and simple. The downside is that, in general, data written on one hardware/OS combination cannot be read on a different one.

RRD has some provisions for detecting such cross-architecture access, but the detection is not perfect. Some combinations even almost work: SPARC and PPC differ only in their representation of NaNs.
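One way such a detection can work is to store a known double in the file and compare its bytes on read. The following is only a minimal sketch under that assumption: the helper names are hypothetical and not RRDtool API, and the check value is simply borrowed from the test program on this page.

```c
#include <stddef.h>
#include <string.h>

/* Check value: a double whose eight bytes are all different, so a
 * reversed byte order is easy to spot. */
#define FLOAT_COOKIE 8.642135E130

/* Returns 1 if the 8 stored bytes are this machine's native encoding
 * of FLOAT_COOKIE, 0 otherwise. (Hypothetical helper.) */
int cookie_matches_native(const unsigned char stored[8])
{
    const double expected = FLOAT_COOKIE;
    return memcmp(stored, &expected, sizeof expected) == 0;
}

/* Returns 1 if the stored bytes match FLOAT_COOKIE once their order is
 * reversed, i.e. the file was probably written on a machine with the
 * opposite endianness. (Hypothetical helper.) */
int cookie_matches_swapped(const unsigned char stored[8])
{
    unsigned char flipped[8];
    size_t i;

    for (i = 0; i < 8; i++)
        flipped[i] = stored[7 - i];
    return cookie_matches_native(flipped);
}
```

A check like this catches endianness differences, but not a differing NaN representation, which is one reason why such detection stays imperfect.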

Figuring Out 64 bit Floating Point Numbers

The portable RRD format should work on all platforms transparently. The first idea was to use Sun's XDR format for RRD. Unfortunately, XDR does not handle NaNs, which are pretty essential for RRDtool. So I investigated the binary representation of IEEE 754 floating point data. I found that it is actually pretty simple to bridge the gap between SPARC, PPC and x86 at least, so I assume it won't be rocket science to do other CPUs as well. The following program helped a lot in this task. It shows the binary representation of a few 'interesting' floating point values.

#include <stdio.h>
#include <inttypes.h>

/* View the same 8 bytes as raw bytes, as a 64 bit integer or as a double. */
typedef union INSPECTOR {
    uint8_t   b[8];
    uint64_t  l;
    double    f;
} INSPECTOR;

int main(void)
{
    size_t i,ii;
    /* 0.0/0.0 yields NaN; 1.0/0.0 and -1.0/0.0 yield +Inf and -Inf */
    double number[] = {
        0,1,-1,0.0/0.0,1.0/0.0,-1.0/0.0,2,4,8,16,8.642135E130
    };
    for (i=0;i<sizeof(number)/sizeof(number[0]);i++){
        INSPECTOR native;
        native.f = number[i];
        printf("%16e -> ",native.f);
        /* dump the bytes in memory order */
        for (ii=0;ii<8;ii++)
            printf(" %02x",native.b[ii]);
        printf("\n");
    }
    return 0;
}

I used this to determine the binary representation of double precision floating point numbers on several architectures:

SPARC 32 and 64 bit:

    0.000000e+00 -> 00 00 00 00  00 00 00 00
    1.000000e+00 -> 3f f0 00 00  00 00 00 00
   -1.000000e+00 -> bf f0 00 00  00 00 00 00
             NaN -> 7f ff ff ff  ff ff ff ff
             Inf -> 7f f0 00 00  00 00 00 00
            -Inf -> ff f0 00 00  00 00 00 00
    2.000000e+00 -> 40 00 00 00  00 00 00 00
    4.000000e+00 -> 40 10 00 00  00 00 00 00
    8.000000e+00 -> 40 20 00 00  00 00 00 00
    1.600000e+01 -> 40 30 00 00  00 00 00 00
   8.642135e+130 -> 5b 1f 2b 43  c7 c0 25 2f

PPC 32 bit:

    0.000000e+00 -> 00 00 00 00  00 00 00 00
    1.000000e+00 -> 3f f0 00 00  00 00 00 00
   -1.000000e+00 -> bf f0 00 00  00 00 00 00
             nan -> 7f f8 00 00  00 00 00 00
             inf -> 7f f0 00 00  00 00 00 00
            -inf -> ff f0 00 00  00 00 00 00
    2.000000e+00 -> 40 00 00 00  00 00 00 00
    4.000000e+00 -> 40 10 00 00  00 00 00 00
    8.000000e+00 -> 40 20 00 00  00 00 00 00
    1.600000e+01 -> 40 30 00 00  00 00 00 00
   8.642135e+130 -> 5b 1f 2b 43  c7 c0 25 2f

x86 32 and 64 bit:

    0.000000e+00 -> 00 00 00 00  00 00 00 00
    1.000000e+00 -> 00 00 00 00  00 00 f0 3f
   -1.000000e+00 -> 00 00 00 00  00 00 f0 bf
             nan -> 00 00 00 00  00 00 f8 7f
             inf -> 00 00 00 00  00 00 f0 7f
            -inf -> 00 00 00 00  00 00 f0 ff
    2.000000e+00 -> 00 00 00 00  00 00 00 40
    4.000000e+00 -> 00 00 00 00  00 00 10 40
    8.000000e+00 -> 00 00 00 00  00 00 20 40
    1.600000e+01 -> 00 00 00 00  00 00 30 40
   8.642135e+130 -> 2f 25 c0 c7  43 2b 1f 5b

As you can see, there is not all that much difference between the architectures (it is all IEEE 754 after all). For one there is the endianness difference, and then there are the SPARCs, which seem to have their own idea regarding NaNs. In any event, a converter between these formats is only a few defines away.

#define endianflip(A) ((((uint64_t)(A) & 0xff00000000000000LL) >> 56) | \
                       (((uint64_t)(A) & 0x00ff000000000000LL) >> 40) | \
                       (((uint64_t)(A) & 0x0000ff0000000000LL) >> 24) | \
                       (((uint64_t)(A) & 0x000000ff00000000LL) >> 8)  | \
                       (((uint64_t)(A) & 0x00000000ff000000LL) << 8)  | \
                       (((uint64_t)(A) & 0x0000000000ff0000LL) << 24) | \
                       (((uint64_t)(A) & 0x000000000000ff00LL) << 40) | \
                       (((uint64_t)(A) & 0x00000000000000ffLL) << 56))

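/* The NaN constants are the 8 bytes as a big-endian host reads them:
 * 0x000000000000f87f is the x86 byte sequence 00 00 00 00 00 00 f8 7f.
 * Only these two NaN bit patterns are translated. */
#define sparc2x86(A)   ((uint64_t)(A) == 0x7fffffffffffffffLL \
                                       ? 0x000000000000f87fLL \
                                       : endianflip(A))

#define x862sparc(A)   ((uint64_t)(A) == 0x000000000000f87fLL \
                                       ? 0x7fffffffffffffffLL \
                                       : endianflip(A))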

#define ppc2x86(A)     endianflip(A)

#define x862ppc(A)     endianflip(A)
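The macros above translate between pairs of architectures. A different approach, sketched here only as one possible design and not as the method RRDtool uses, is to always write doubles in one fixed byte order and to map every NaN to a single bit pattern. The function names are hypothetical. The sketch assumes an 8 byte IEEE 754 double whose byte order matches that of 64 bit integers on the host, which holds for the SPARC, PPC and x86 tables above.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The NaN bit pattern that PPC and x86 produce in the tables above. */
#define CANONICAL_NAN UINT64_C(0x7ff8000000000000)

/* Encode a double as 8 bytes, most significant byte first.
 * Every NaN, whatever its native bit pattern, becomes CANONICAL_NAN. */
void double_to_portable(double value, unsigned char out[8])
{
    uint64_t bits;
    int i;

    assert(sizeof(double) == sizeof(uint64_t));
    if (isnan(value))
        bits = CANONICAL_NAN;
    else
        memcpy(&bits, &value, sizeof bits);  /* copy the raw bit pattern */

    /* Shifting extracts bytes by significance, independent of host
     * endianness. */
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Decode 8 bytes written by double_to_portable() into a native double. */
double portable_to_double(const unsigned char in[8])
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With this layout the stored bytes look like the SPARC and PPC rows of the tables on every host, and only the NaN needs special treatment.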

Data Alignment

Most of today's workstations run either a 32 bit or a 64 bit OS. This also influences the data layout. The classic RRDtool data format relies heavily on doubles and longs, bundled into structs. This is a challenging mix for portability:

  • longs are 4 bytes long on 32 bit OSes. On 64 bit OSes, they are often represented as 8 byte integers.
  • struct members are aligned to either 32 bit or 64 bit boundaries.

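Below is an example of the memory layout of several simple structs, made up of integers (32 bit) and doubles (64 bit). Each letter stands for one byte: I belongs to an integer, D to a double, and ~ is padding inserted by the compiler.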

32 bit (16 byte) : IIII.DDDD.DDDD.IIII
64 bit (24 byte) : IIII ~~~~.DDDD DDDD.IIII ~~~~

32 bit (12 byte) : IIII.IIII.IIII
64 bit (12 byte) : IIII IIII.IIII

32 bit (24 byte) : IIII.IIII.DDDD.DDDD.IIII.IIII
64 bit (24 byte) : IIII IIII.DDDD DDDD.IIII IIII

This means that, in order to produce a portable data format, structs must be laid out such that they are aligned identically on 32 and 64 bit systems.
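As a sketch of what such a layout could look like, the two structs below hold the same members as the first diagram above. The struct and member names are made up for illustration and do not come from the RRDtool sources. The compile-time checks need a C11 compiler and assume an 8 byte double.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive order: the double sits behind a 4 byte integer, so an ABI that
 * aligns doubles to 8 bytes inserts padding (16 versus 24 bytes). */
struct naive_record {
    int32_t counter;
    double  value;
    int32_t flags;
};

/* Hypothetical portable order: the 8 byte member first, then the
 * 4 byte members in pairs, so no padding is needed with either
 * 4 byte or 8 byte alignment. */
struct portable_record {
    double  value;    /* bytes  0..7  */
    int32_t counter;  /* bytes  8..11 */
    int32_t flags;    /* bytes 12..15 */
};

/* Compile-time checks (C11): the build fails if the layout drifts. */
_Static_assert(sizeof(struct portable_record) == 16,
               "unexpected struct size");
_Static_assert(offsetof(struct portable_record, counter) == 8,
               "unexpected offset of counter");
_Static_assert(offsetof(struct portable_record, flags) == 12,
               "unexpected offset of flags");
```

Ordering members from widest to narrowest and keeping the total size a multiple of 8 is a common rule of thumb; the static assertions turn any remaining platform surprise into a build error rather than a corrupt file.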

Longs and Integers

On 32 bit architectures, integers and longs are both 32 bit wide, while on 64 bit architectures longs are normally 64 bit wide. Since the RRDtool data format uses a lot of longs, this also has to be addressed in a portable format. Fortunately, there are datatypes that are less architecture dependent, defined in the inttypes.h header.

#include <inttypes.h>
uint8_t  unsigned_8_bit_integer;
uint32_t unsigned_32_bit_integer;
int64_t  signed_64_bit_integer;
/* and so on */
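To illustrate how these types help, the sketch below stores a native long as a fixed 8 byte value, most significant byte first, so that the file looks the same whether long is 4 or 8 bytes wide on the writing machine. The function names are hypothetical, and the big-endian byte order is an arbitrary choice made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Write a native long as 8 bytes, most significant byte first. */
void long_to_portable(long value, unsigned char out[8])
{
    int64_t  wide = value;          /* widen: long may be 4 or 8 bytes */
    uint64_t bits = (uint64_t)wide; /* two's complement bit pattern */
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Read the 8 bytes back as a signed 64 bit integer. */
int64_t portable_to_int64(const unsigned char in[8])
{
    uint64_t bits = 0;
    int64_t  value;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    /* int64_t is two's complement without padding bits, so copying
     * the bit pattern is well defined. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The reader gets an int64_t back and has to decide what to do when the value does not fit into its own long, for example when a file from a 64 bit machine is read on a 32 bit one.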

Conclusion

Based on this information, a portable RRDtool data format that works at least on PPC, x86 and SPARC will not be all that difficult to design.

Information on other architectures is welcome: Alpha, PA-RISC, Itanium, MIPS.

