[Koha] Problems with Zebra index

Tue Jul 23 22:12:45 NZST 2013

I retried indexing the first chunk after I corrected the XML. So here
are the warning messages I got from zebraidx:

10:45:10-23/07 zebraidx(14513) [warn] Couldn't open collection.abs [No such file or directory]
11:11:49-23/07 zebraidx(14513) [warn] Record didn't contain match fields in (bib1,Local-number)

The first one appeared when I started indexing, the second appeared when
indexing ended. I still do not find anything when I search the Koha
OPAC for an "e".
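
A minimal sketch of how a chunk could be checked before it is handed to
zebraidx: it parses the file to confirm that the XML is well-formed and
counts the records that lack the field used for matching. The function
names and the choice of 999$c as the field behind Local-number are
assumptions (that is the usual biblionumber mapping in a MARC21 Koha
setup, but the local index definitions may differ).

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") exports.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"


def has_subfield(record, tag, code):
    """Return True if the record has a non-empty subfield `code` in datafield `tag`."""
    for field in record:
        if field.tag.endswith("datafield") and field.get("tag") == tag:
            for sub in field:
                if sub.get("code") == code and (sub.text or "").strip():
                    return True
    return False


def check_chunk(source, match_tag="999", match_code="c"):
    """Parse a MARCXML chunk (path or binary file object).

    Returns (record_count, records_missing_match_field).
    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    The 999$c default is an assumption about the local Koha configuration.
    """
    total = 0
    missing = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Accept records with or without the MARC21 slim namespace.
        if elem.tag in (MARC_NS + "record", "record"):
            total += 1
            if not has_subfield(elem, match_tag, match_code):
                missing += 1
            elem.clear()  # free the record's children once it has been checked
    return total, missing
```

Calling check_chunk() with the path of one exported chunk would either
raise a ParseError pointing at the broken position or return the two
counts. A non-zero second count would be consistent with the "Record
didn't contain match fields" warning, although it does not prove that
this is the cause.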

Best
- Oliver

On 23.07.2013 10:29, Oliver Goldschmidt wrote:
> Hi Koha community,
>
> I am new to Koha and have spent the last week trying to feed the
> Zebra index with our bibliographic records. This turned out to be pretty
> difficult.
> I have successfully imported our records (about 600,000) into the Koha
> database. Then I tried to use rebuild_zebra.pl to put the records into
> the index. This failed for lack of disk space: I have 100 GB of disk
> space reserved for the Zebra index (mounted on /var/lib/koha) and have
> split this space in the Zebra config into 45 GB for the shadow directory
> and 45 GB for the register directory. This was not sufficient, which I
> find a little bit weird, because I think 600,000 records should not take
> so much space... So, my first question: is that normal? Does Zebra need
> so much disk space for the index? What exactly are the register and
> shadow directories for?
>
> My next try was indexing with rebuild_zebra_sliced.sh. I used the default
> value of 10000 for the chunks. First I got an error, I guess because a
> configuration value was not set properly (the script did not find
> index_mode, so I set it manually to "dom", which I guessed should be the
> correct value for indexing marcxml).
> After fixing that manually, I managed to split my export file into 59
> chunks of 10000 records each. I tried to index the first two chunks, and
> that seemed to work without problems for the first chunk (but it finished
> very fast, which made me wonder whether Koha really did anything; I have
> just realized that the marcxml file was not valid, but why didn't I get
> an error?). For the second chunk, there were two messages (unfortunately
> I cannot recall them). This is the command I used to do that:
>
> zebraidx -c /etc/koha/sites/koha/zebra-biblios.cfg -v none,fatal,warn -g
> marcxml -d biblios update
> /tmp/rebuild/export/biblio/exported_records_1000001
>
> But now, when I search the Koha OPAC for an "e", for example, I still
> get no results. So the index seems to be empty, but there actually
> are files in /var/lib/koha/koha/biblio/shadow. Is there a way to look
> into the Zebra index directly?
> I have no idea where to look next.
>
> Does anybody have any hint about that? Any help would be appreciated.
>
> Best
> -Oliver
>

-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel. 	+49 (0)40 / 428 78 - 32 91
eMail	o.goldschmidt at tuhh.de
--
GPG/PGP key:
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc