Re: [Koha] Problems with Zebra index

23 Jul 2013

I retried indexing the first chunk after correcting the XML. Here are
the warning messages I got from zebraidx:

10:45:10-23/07 zebraidx(14513) [warn] Couldn't open collection.abs [No such file or directory]
11:11:49-23/07 zebraidx(14513) [warn] Record didn't contain match fields in (bib1,Local-number)

The first one appeared when indexing started, the second when indexing
ended. I still get no results when I search the Koha OPAC for "e".
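The second warning suggests that some records lack the field zebraidx uses
as the record ID. A minimal Python sketch for spotting such records in a
chunk, assuming a MARC21 setup in which Koha's Zebra configuration takes
Local-number from the biblionumber in 999$c (the MATCH_TAG/MATCH_CODE
constants encode that assumption; adjust them to your own indexing
definitions):

```python
import sys
import xml.etree.ElementTree as ET

# Assumption: Local-number is indexed from 999$c (Koha biblionumber, MARC21).
MATCH_TAG, MATCH_CODE = "999", "c"


def local(tag):
    """Strip an XML namespace: '{http://www.loc.gov/MARC21/slim}record' -> 'record'."""
    return tag.rsplit("}", 1)[-1]


def records_missing_match_field(source, tag=MATCH_TAG, code=MATCH_CODE):
    """Return 1-based positions of <record> elements lacking a non-empty tag$code."""
    missing = []
    position = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "record":
            continue
        position += 1
        found = any(
            local(df.tag) == "datafield" and df.get("tag") == tag
            and any(
                local(sf.tag) == "subfield" and sf.get("code") == code
                and (sf.text or "").strip()
                for sf in df
            )
            for df in elem
        )
        if not found:
            missing.append(position)
        elem.clear()  # release the record's children to keep memory flat
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    bad = records_missing_match_field(sys.argv[1])
    print(f"{len(bad)} record(s) without {MATCH_TAG}${MATCH_CODE}: {bad[:20]}")
```

Usage, with a hypothetical script name:
python3 check_match_field.py /tmp/rebuild/export/biblio/exported_records_1000001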

Best
- Oliver

On 23.07.2013 10:29, Oliver Goldschmidt wrote:
...
Hi Koha community,
I am new to Koha and have spent the last week trying to load our
bibliographic records into the Zebra index. This turned out to be
pretty difficult.
I have successfully imported our records (about 600,000) into the Koha
database. Then I tried to use rebuild_zebra.pl to put the records into
the index. This failed for lack of disk space: I have 100 GB reserved
for the Zebra index (mounted on /var/lib/koha) and have split this space
in the Zebra config into 45 GB for the shadow directory and 45 GB for
the register directory. This was not sufficient, which seems a little
weird to me, because I would not expect 600,000 records to take up so
much space. So, my first question: is that normal? Does Zebra need that
much disk space for the index? What exactly are the register and shadow
directories for?
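For scale, the figures above (assuming decimal gigabytes) mean that each
of the two directories would need more than this much space per record:

$$\frac{45 \times 10^{9}\ \text{bytes}}{6 \times 10^{5}\ \text{records}} = 7.5 \times 10^{4}\ \text{bytes/record} \approx 75\ \text{kB/record}$$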
My next try was indexing with rebuild_zebra_sliced.sh, using the default
chunk size of 10000 records. First I got an error, I guess because a
configuration value was not set properly (the script did not find
index_mode, so I set it manually to "dom", which I guessed would be the
correct value for indexing MARCXML).
After fixing that manually, I managed to split my export file into 59
chunks of 10000 records each. I tried to index the first two chunks.
The first chunk seemed to work without problems, but it finished very
fast, which made me wonder whether Koha really did anything. I have just
realized that the MARCXML file was not valid, but why didn't I get an
error? For the second chunk there were two messages (unfortunately I
cannot recall them). This is the command I used:
zebraidx -c /etc/koha/sites/koha/zebra-biblios.cfg -v none,fatal,warn \
  -g marcxml -d biblios update \
  /tmp/rebuild/export/biblio/exported_records_1000001
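Since the invalid MARCXML produced no error, a check before calling
zebraidx may help. A minimal Python sketch (standard library only) that
counts <record> elements and reports the first XML error with its line
and column; note that it only tests well-formedness and does not
validate against the MARC21 slim schema:

```python
import sys
import xml.etree.ElementTree as ET


def check_marcxml(source):
    """Return (record_count, error); error is None if the XML is well-formed."""
    count = 0
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            # Compare the local name so namespaced MARCXML also matches.
            if elem.tag.rsplit("}", 1)[-1] == "record":
                count += 1
                elem.clear()  # keep memory low on large chunks
    except ET.ParseError as exc:
        line, column = exc.position
        return count, f"not well-formed at line {line}, column {column}: {exc}"
    return count, None


if __name__ == "__main__" and len(sys.argv) > 1:
    records, error = check_marcxml(sys.argv[1])
    print(f"{records} record(s) parsed")
    if error:
        sys.exit(error)  # non-zero exit so a wrapper script can stop here
```

A 10000-record chunk that parses cleanly should report 10000 records;
a lower count or an error points at the export rather than at Zebra.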
But now, when I search the Koha OPAC for "e", for example, I still get
no results. So the index seems to be empty, yet there are files in
/var/lib/koha/koha/biblio/shadow. Is there a way to look into the Zebra
index directly?
I have no idea where to look next.
Does anybody have any hints? Any help would be appreciated.
Best
-Oliver
-- 
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel. 	+49 (0)40 / 428 78 - 32 91
eMail	o.goldschmidt@tuhh.de
--
GPG/PGP key:
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc