Ticket #96 (assigned defect)

Opened 3 years ago

Last modified 12 months ago

IIS6, ActivePerl, RRD problem

Reported by: human
Owned by: oetiker
Priority: major
Milestone:
Component: misc
Version:
Keywords: update graph IIS ActivePerl
Cc: mfowler@…

Description

The Problem:

An IIS6 web server displaying RRD graphs (refreshed every 10 seconds) occasionally gets into a state where it displays the ActivePerl source code of the ASP page rather than the graphs.
It seems to happen when a web query to generate graphs coincides with the Perl script updating the RRD.

Manually restarting IIS is the only reliable fix.

(Yes, I realise that this could be a problem with Windows, IIS, ActivePerl, RRD, or more likely, some strange interaction between the above components that only I am doing, but please bear with me - I want to either eliminate RRD as the source of the problem, or get a lurking bug fixed.)

The Details:

I have a Perl script (v5.8.8 ActivePerl build 820) that pings several dozen servers in parallel and writes the results to RRD files (one per server). It does this every 10 seconds, starting every time the last digit of the seconds on the clock hits "0". (This becomes relevant a bit later)

The same server (Windows 2003, SP2, all hotfixes) runs IIS to serve a web page that displays the results of these pings as RRD graphs.
The page is written in PerlScript ASP. (Starts with <%@ Language=PerlScript %> then the rest of it is Perl.)
With no parameters, the page would display a table showing the last result for every ping RRD (46 of them). (RRDs::last to get the time followed by RRDs::fetch to get the data)
Passing a "site" parameter would display graphs for all the servers at that site - usually 5 or fewer. (just RRDs::graph)
Passing a "site" and a "host" parameter would display detailed graphs for just one server.

Originally, I had just a simple refresh of 10 seconds on the pages.
After a while (sometimes hours, sometimes minutes), the page would fail, displaying the PerlScript code for the page rather than the graphs or table. Interestingly enough, the failed page would keep refreshing every 10 seconds.
The error in the IIS log was "GET /ping/ls_ping_status.asp |0|80004005|Internal_Error 80"

The Assumption:

I think that the problem is related to the ASP page's RRD queries happening at the same time as the Perl script is updating the RRD files.
(I say this because I've been successfully using IIS/ActivePerl/RRD for many years and have only ever encountered this problem where a web page is reading from a lot of RRD files at once, greatly increasing the odds of a simultaneous conflict.)

The pinging Perl script will fail gracefully, just spitting out the RRDs::error if it has any problems.
Unfortunately, the ASP page takes the whole web server with it when it fails in this way.
(This, I believe, is ActivePerl not being as robust as it should be, but I still want to find out what triggers this event so that I can avoid it.)

The Questions:

Have you ever seen this problem before?
How does the RRD update and query process work?
What locking mechanism is used? File? Record? Any?
How do the RRD query functions lock the file to prevent updates while they are dumping data or drawing graphs?

Some of the things I've tried to avoid the problem:

* set the web page refresh time to line up with the seconds digit "5"

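our $cycletime = 10;   # seconds between runs of the ping/update script
my $now = time();
# seconds until the midpoint (seconds digit "5") of the next update cycle
my $refresh = (int($now / $cycletime) + 1.5) * $cycletime - $now;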

* don't draw any graphs while in the "danger time" when the seconds digit is 0, 1, or 2

my $digit = time() % $cycletime;   # position within the current 10-second cycle
if ($digit < 3) {
  sleep(3 - $digit);   # wait until the seconds digit reaches 3
}
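
As a quick check of the arithmetic in the two workarounds above, here is a minimal sketch that wraps each calculation in a function and evaluates it for a few fixed timestamps. The function names refresh_delay and danger_sleep and the sample timestamps are made up for illustration; the formulas are the ones quoted above.

```perl
use strict;
use warnings;

my $cycletime = 10;   # the updater starts when the seconds digit hits 0

# Seconds until the midpoint (seconds digit 5) of the NEXT update cycle.
sub refresh_delay {
    my ($now) = @_;
    return (int($now / $cycletime) + 1.5) * $cycletime - $now;
}

# Seconds to sleep so that drawing starts at seconds digit 3 or later.
sub danger_sleep {
    my ($now) = @_;
    my $digit = $now % $cycletime;
    return $digit < 3 ? 3 - $digit : 0;
}

# Illustrative timestamps whose last digit is 0, 3 and 9.
for my $now (1000, 1003, 1009) {
    printf "now=%d refresh=%g sleep=%d\n",
        $now, refresh_delay($now), danger_sleep($now);
}
# prints:
# now=1000 refresh=15 sleep=3
# now=1003 refresh=12 sleep=0
# now=1009 refresh=6 sleep=0
```

For whole-second timestamps the refresh delay therefore always falls between 6 and 15 seconds, so each reload lands on a seconds digit of 5, but in the cycle after the current one.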

Any help is most appreciated,

Mark Fowler mfowler@…

Change History

Changed 3 years ago by oetiker (in reply to: description)

  • status changed from new to assigned

Replying to human:

> The Questions:
> Have you ever seen this problem before?

No, I don't use Windows in server functions (if I can help it).

> How does the RRD update and query process work?

rrdtool update locks the file while updating. rrd_fetch currently does not care about locks; it could be enhanced with a read lock.

> What locking mechanism is used? File? Record? Any?

File.

> How do the RRD query functions lock the file to prevent updates while they are dumping data or drawing graphs?

They don't; only the update does.

Try adding blocking locking code to rrd_fetch as well.

cheers tobi
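
Since only the update path takes a lock, one way to approximate the suggested blocking behaviour without patching rrd_fetch might be a cooperative lock at the Perl level. The sketch below is an assumption, not part of RRDtool: the helper name with_rrd_lock and the ".lock" sidecar file are hypothetical, it only helps if both the updating script and the ASP page go through the same helper, and it does not interact with the lock that rrdtool update takes internally.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical helper: run $code while holding a blocking lock on a
# sidecar file "<rrd file>.lock". Pass LOCK_EX from the updater and
# LOCK_SH from readers (fetch/graph), so readers can overlap each
# other but never overlap an update.
sub with_rrd_lock {
    my ($rrd_file, $mode, $code) = @_;
    my $lock_path = "$rrd_file.lock";

    # Read/append mode creates the file if missing and never truncates it.
    open(my $lock_fh, '+>>', $lock_path)
        or die "cannot open $lock_path: $!";
    flock($lock_fh, $mode)              # blocks until the lock is granted
        or die "cannot lock $lock_path: $!";

    my @result = eval { $code->() };    # trap errors so the lock is released
    my $err = $@;
    close($lock_fh);                    # closing the handle drops the lock
    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Intended use (requires the RRDs module, so shown as comments only):
#
#   # updater script
#   with_rrd_lock($file, LOCK_EX, sub { RRDs::update($file, "N:$value") });
#
#   # ASP page
#   my ($start, $step, $names, $data) =
#       with_rrd_lock($file, LOCK_SH, sub { RRDs::fetch($file, 'AVERAGE') });
```

The lock is taken on a separate file rather than on the RRD itself so that it cannot collide with whatever locking rrdtool update applies to the RRD file.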
