Ticket #178 (closed defect: fixed)

Opened 17 months ago

Last modified 17 months ago

Data corruption in RRA 1.3 when using MMAP and multiple update strings

Reported by: human Owned by: oetiker
Priority: blocker Milestone: RRDtool 1.3
Component: rrd_update Version:
Keywords: data corruption Cc: kbrint@…

Description

Conditions

In RRDTool 1.3, the use of two pointers rra_current and rrd_file->pos to track read/write position leads to data corruption. This is triggered under the following circumstances:

  • RRDTool >= 1.3
  • using MMAP
  • multiple update strings (i.e. update 500:1 600:2 700:3)
  • the RRA updates cause a wrap of the cur_row pointer on disk.

The Effect

  • rra[0][row 0] is not modified
  • if more than one rra exists, rra[1][row 0] receives the value that should have gone to rra[0][row 0]
  • if there is no rra[1], then the update goes into the memory location immediately following the mmap() region, with unpredictable effect.

How the Bug is Triggered

process_arg()

This is called once for each update string.

  • rra_current is reset to rra_begin
        if (*rra_current != rra_begin) {
    #ifndef HAVE_MMAP
            if (rrd_seek(rrd_file, rra_begin, SEEK_SET) != 0) {
                rrd_set_error("seek error in rrd");
                return -1;
            }
    #endif
            *rra_current = rra_begin;
        }
        ....
        write_to_rras(...);
    

write_to_rras()

NOTE WELL:

  • When processing a single update string, rra_step_cnt[rra_idx] == 1
    • Therefore, the first update block is always used (starting with /* write the first row */ around rrd_update.c:1879)
  • The offset of rra[0][row 0] == rra_begin
  • advances cur_row and rra_current, checking for overlap
                ...
                rrd->rra_ptr[rra_idx].cur_row++;
                if (rrd->rra_ptr[rra_idx].cur_row >=
                    rrd->rra_def[rra_idx].row_cnt)
                    rrd->rra_ptr[rra_idx].cur_row = 0;  /* wrap around */
                /* position on the first row */
                rra_pos_tmp = rra_start +
                    (rrd->stat_head->ds_cnt) * (rrd->rra_ptr[rra_idx].cur_row) *
                    sizeof(rrd_value_t);
                if (rra_pos_tmp != *rra_current) {
                    if (rrd_seek(rrd_file, rra_pos_tmp, SEEK_SET) != 0) {
                        rrd_set_error("seek error in rrd");
                        return -1;
                    }
                    *rra_current = rra_pos_tmp;
                }
                ...
                write_RRA_row(...)
    
  • When processing only a single update string, we MUST have just come from process_arg
    • Therefore *rrd_current == rra_begin
    • This is the same as the offset of rra[0][row 0]
    • When we wrap (rra[0].cur_row = 0):
      • rrd_file->pos now extents past rra[0]
        • (it was incremented in the previous rrd_write)
      • the check for rra_pos_tmp != *rra_current IS ALWAYS FALSE
        • This is why it only affects rra[0]
      • Therefore, rrd_seek IS NOT CALLED
      • Therefore, rrd_file->pos IS NOT UPDATED back to rra[0][row 0]

rrd_write()

This is called from write_RRA_row()

When using MMAP, this always writes based on rrd_file->pos. This causes the corruption when rrd_file->pos is wrong.

#ifdef HAVE_MMAP
    ...
    memcpy(rrd_file->file_start + rrd_file->pos, buf, count);
    rrd_file->pos += count;
    return count;       /* mimmic write() semantics */
#else

Verify the Problem

The attached overlap.pl illustrates the RRA corruption.

% ./overlap.pl
rra[0] row 22 for time 1220633990 is NaN, but it shouldn't be ...
rra[1] row 0 for time 1220634060 contains the value 1220633990!
% rrdtool dump bad.rrd
...

Patch

A patch is attached.

  • calls to rrd_seek should never be wrapped in #ifndef HAVE_MMAP (as in process_arg)
  • repeated code in write_to_rras has been reduced to a single update loop
    • use of helper variables improves readability
  • extra checks when wrapping cur_row=0
    • it's not enough to check rra_current because it gets reset before each invocation to write_to_rras.

Future?

Is it possible to use only a single file position pointer?

Attachments

filepos.patch Download (6.2 KB) - added by human 17 months ago.
patch to rrd_update
overlap.pl Download (1.6 KB) - added by human 17 months ago.
script to re-create the bug

Change History

Changed 17 months ago by human

patch to rrd_update

Changed 17 months ago by human

script to re-create the bug

Changed 17 months ago by human

  • cc kbrint@… added

Contact

kevin brintnall <kbrint@…>

Changed 17 months ago by oetiker

  • status changed from new to closed
  • resolution set to fixed

iiks ... a data corruption bug ... that is bad ... I have applied your patch and am now releasing rrdtool 1.3.2

thanks tobi

Note: See TracTickets for help on using tickets.

NOTE: The content of this website is accessible with any browser. The graphical design though relies completely on CSS2 styles. If you see this text, this means that your browser does not support CSS2. Consider upgrading to a standard conformant browser like Mozilla Firefox or Opera but also Apple's Safari or KDE's Konqueror for example. It may also be that you are looking at a mirror page which did not copy the CSS for this page. Or if some pictu res are missing, then the mirror may not have picked up the contents of the inc directory.