[Koha] rebuild_zebra issue

Frédéric Demians frederic at tamil.fr
Fri Mar 11 19:24:57 NZDT 2011


 > Could you please give a bit more detail on this, as in define "clean"
 > as you use it here?

rebuild_zebra.pl works in two stages: (1) it exports all or queued
records to a file; (2) it hands the exported file to the Zebra indexer
(zebraidx command).

The -nosanitize option modifies the first stage. Without this option,
during stage 1, records are 'sanitized' before being written to the
file, i.e. their leader is fixed, the biblionumber is checked, UNIMARC
tag 100 is forced to UTF-8, and a few other things. This 'sanitizing'
requires reading each record, parsing it into a Perl object,
manipulating the object, and finally formatting it back into XML. This
consumes CPU and memory, and takes time. With the -nosanitize option,
records are read from MySQL and written directly to the export file.
This drastically decreases the time rebuild_zebra.pl spends in stage 1.
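
To make the difference between the two export paths concrete, here is a
minimal sketch in Python. It is not the rebuild_zebra.pl code (which is
Perl). The function names, the simplified namespace-free MARCXML, the
"pad the leader to 24 characters" rule, and the use of control field 001
as the record id are all assumptions chosen for illustration only.

```python
import io
import xml.etree.ElementTree as ET

LEADER_LENGTH = 24  # a MARC leader is 24 characters long


def sanitize(biblionumber, marcxml):
    """Costly path: parse the record, repair it, serialize it again.

    Hypothetical repairs, for illustration only: force the leader to
    24 characters and make control field 001 match the biblionumber.
    """
    record = ET.fromstring(marcxml)

    leader = record.find("leader")
    if leader is None:
        leader = ET.Element("leader")
        record.insert(0, leader)
    leader.text = (leader.text or "").ljust(LEADER_LENGTH)[:LEADER_LENGTH]

    ident = record.find("controlfield[@tag='001']")
    if ident is None:
        ident = ET.Element("controlfield", {"tag": "001"})
        record.insert(1, ident)
    ident.text = str(biblionumber)

    return ET.tostring(record, encoding="unicode")


def export_records(rows, out, sanitize_records=True):
    """Stage 1: write each (biblionumber, marcxml) row to the export file.

    With sanitize_records=False the stored XML is copied as-is, which is
    the behaviour this sketch attributes to -nosanitize.
    """
    for biblionumber, marcxml in rows:
        if sanitize_records:
            marcxml = sanitize(biblionumber, marcxml)
        out.write(marcxml)
        out.write("\n")


if __name__ == "__main__":
    rows = [(7, '<record><leader>abc</leader>'
                '<controlfield tag="001">99</controlfield></record>')]
    for flag in (True, False):
        buffer = io.StringIO()
        export_records(rows, buffer, sanitize_records=flag)
        print("sanitize" if flag else "nosanitize", "->", buffer.getvalue())
```

The raw path does no parsing at all, which is where the time is saved;
the price is that any broken record reaches the indexer unchanged.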

In this context, a 'clean' record is a record which doesn't need to be
sanitized: leader OK, correct record id, etc.

By the way, coming back to the initial question, it could also be
interesting to improve the performance of stage 2, i.e. to improve
Zebra's raw indexing performance.

