Chapter 15. Issues with Report Merging

In some cases, a merged report doesn't display the right information. We outline some worst-case scenarios below and explain why our implementation still behaves reasonably.

Suppose log file 1 (“requests” with “sizes”) looks like:

request size
A 12
B 11
C 10

while log file 2 looks like:

request size
D 3
E 2
F 1

We report on the top 2 biggest requests, so the report from log 1 looks like:

request size
A 12
B 11

while the report from log 2 would look like:

request size
D 3
E 2

Now we change the superservice.cfg file to list the top 4 biggest items and merge the two existing reports. A naive merge would lead to:

request size
A 12
B 11
D 3
E 2

Of course, this should've been:

request size
A 12
B 11
C 10
D 3
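The effect can be reproduced with a small Python sketch. It is only an illustration: `top_k` and `naive_merge` are hypothetical helpers, not part of Lire, and the sketch assumes that a report stores exactly the rows it displays and that a naive merge simply pools those rows and truncates again.

```python
def top_k(rows, k):
    """Return the k (key, size) pairs with the largest size, biggest first."""
    return sorted(rows.items(), key=lambda kv: kv[1], reverse=True)[:k]


def naive_merge(reports, k):
    """Pool the rows stored in already-truncated reports and keep the k largest.

    Requests are distinct, so pooling is a plain dict union.
    """
    pooled = {}
    for report in reports:
        pooled.update(report)
    return top_k(pooled, k)


log1 = {"A": 12, "B": 11, "C": 10}
log2 = {"D": 3, "E": 2, "F": 1}

# Reports generated while the limit was 2: C and F are discarded.
report1 = dict(top_k(log1, 2))
report2 = dict(top_k(log2, 2))

# Merging with a new limit of 4 cannot bring C back.
print(naive_merge([report1, report2], 4))
# [('A', 12), ('B', 11), ('D', 3), ('E', 2)]

# Only the complete data yields the correct top 4.
print(top_k({**log1, **log2}, 4))
# [('A', 12), ('B', 11), ('C', 10), ('D', 3)]
```

With the limit left at 2, `naive_merge` and the complete data agree, because every globally largest request is necessarily among the largest of its own log.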

This effect does not occur as long as the top limit stays the same. However, when we are not reporting on distinct values in the log but summing them, worse things can happen. Suppose we want to report the total size by client, and the logs look like:

client size
a 12
b 11
c 10

and

client size
d 4
e 4
c 3

The top-2 reports from these logs would look like:

client size
a 12
b 11

and

client size
d 4
e 4

After naively merging, one would get:

client size
a 12
b 11

However, client c, which was dropped from both individual reports, has the largest total (10 + 3 = 13), so the complete report should look like:

client size
c 13
a 12

Luckily, the Lire merging algorithm is not this naive: the XML reports store a few more records than are actually displayed. This heuristic leads to sane merged reports in most cases. However, since it is merely a heuristic, it is no watertight guarantee: a record that falls outside the stored range in every individual report can still be missing from the merged one.
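A minimal Python sketch of the idea, assuming each report simply keeps a fixed number of extra rows beyond the display limit. The `extra` parameter and the helper names are hypothetical: Lire's guess_extra_entries decides how many entries to keep in its own way, which this sketch does not reproduce. The second half shows why the trick remains a heuristic.

```python
from collections import Counter


def top_k(totals, k):
    """Return the k (key, total) pairs with the largest totals, biggest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:k]


def make_report(log, k, extra):
    """Store k + extra rows; `extra` is a hypothetical fixed margin."""
    return top_k(log, k + extra)


def merge_reports(reports, k):
    """Sum the stored rows per key, and only then truncate to k."""
    totals = Counter()
    for report in reports:
        totals.update(dict(report))
    return top_k(totals, k)


# The client logs from the example above.
log1 = Counter({"a": 12, "b": 11, "c": 10})
log2 = Counter({"d": 4, "e": 4, "c": 3})

# One extra row per report keeps client c in both stored reports.
reports = [make_report(log1, 2, extra=1), make_report(log2, 2, extra=1)]
print(merge_reports(reports, 2))  # [('c', 13), ('a', 12)]

# Counter-example: in log3, client x pushes c past the stored margin.
log3 = Counter({"a": 13, "b": 12, "x": 11, "c": 10})
log4 = Counter({"c": 5, "d": 4, "e": 4})
risky = [make_report(log3, 2, extra=1), make_report(log4, 2, extra=1)]
print(merge_reports(risky, 2))  # [('a', 13), ('b', 12)]
print(top_k(log3 + log4, 2))  # [('c', 15), ('a', 13)]
```

Storing more rows makes such misses less likely at the cost of larger XML reports, but with a fixed margin a log can always contain enough larger rows to push a relevant record out.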

See the description of the guess_extra_entries routine in the Lire::AsciiDlf::Group manpage for more implementation details.