[Koha] Problems with Zebra index

Jared Camins-Esakov jcamins at cpbibliography.com
Tue Jul 23 23:38:22 NZST 2013


Oliver,

I have good news and bad news. The good news is, the fix to your problem is
probably easy. The bad news is that running the zebraidx command manually
more than likely messed up your installation.

It sounds like your first problem can be solved simply by increasing the
space that Zebra will use (it is not uncommon to need in excess of 100GB
for indexes in a large installation). I'm not sure how you increased the
space allotted, so I'm going to provide instructions for the correct way to
do this that you can check your work against. Open the zebra-biblios.cfg
and zebra-biblios-dom.cfg files that Koha installed (in
/etc/koha/sites/koha/ ) and change two lines in each: the lines starting
with register and shadow. Each of those lines ends with 20G or 45G,
depending on whether you changed it. Change those numbers to, say, 80G.
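
If you would rather script the change than edit by hand, here is a rough
sketch. It assumes the register and shadow lines end in a size suffix such
as :20G; set_zebra_space is a made-up helper name, not something Koha
ships. It keeps a .bak copy of each file it touches.

```shell
# Hypothetical helper: set the size suffix on the register and shadow lines.
# Usage: set_zebra_space FILE SIZE   (for example SIZE=80G)
set_zebra_space() {
    file=$1
    size=$2
    # Keep a backup, then rewrite only the trailing ":<number>G" on lines
    # that begin with "register:" or "shadow:". Other lines are untouched.
    cp "$file" "$file.bak"
    sed -E "/^(register|shadow):/ s/:[0-9]+G[[:space:]]*\$/:$size/" \
        "$file.bak" > "$file"
}

# Example (run as root, paths taken from the message above):
#   set_zebra_space /etc/koha/sites/koha/zebra-biblios.cfg 80G
#   set_zebra_space /etc/koha/sites/koha/zebra-biblios-dom.cfg 80G
```

Afterwards, diff each file against its .bak copy to confirm that only the
two sizes changed.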

rebuild_zebra_sliced would not help you in this instance, because your
problem is the amount of disk space required, not a bad record.

Now for the bad news. If you ran zebraidx as any user other than koha-koha,
your permissions are going to be all wrong. You can try changing the owner
recursively on /var/lib/koha/koha to koha-koha. That might fix it (but I am
not sure, since I haven't tried). The zebra_bib_index_mode is easy to fix,
fortunately. Just change zebra_bib_index_mode to grs1, run
rebuild_zebra.pl -r -b -x, and you should be fine. You can worry about
switching to DOM indexing once you have indexing with GRS-1 working:
http://wiki.koha-community.org/wiki/Switching_to_dom_indexing
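
As a rough sketch, the recovery steps could look like the following. The
location of rebuild_zebra.pl and the KOHA_CONF and PERL5LIB values are
assumptions about a package install, so adjust them to your system;
recover_zebra and run are made-up helper names. By default the script only
prints the commands; set DRY_RUN=0 (as root) to actually run them.

```shell
# Sketch only. Prints each command by default; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_zebra() {
    instance=$1                 # e.g. "koha"
    owner="$instance-koha"      # the koha-koha user mentioned above
    # Assumed paths for a package install; override via the environment.
    rebuild=${REBUILD_ZEBRA:-/usr/share/koha/bin/migration_tools/rebuild_zebra.pl}
    conf=${KOHA_CONF:-/etc/koha/sites/$instance/koha-conf.xml}
    lib=${PERL5LIB:-/usr/share/koha/lib}

    # 1. Hand the Zebra data directory back to the Koha user.
    run chown -R "$owner" "/var/lib/koha/$instance"

    # 2. With zebra_bib_index_mode already set to grs1, reset (-r) and
    #    rebuild the bibliographic index (-b) from an XML export (-x).
    run sudo -u "$owner" env KOHA_CONF="$conf" PERL5LIB="$lib" \
        "$rebuild" -r -b -x
}

# Example: recover_zebra koha
```

Check the printed commands against your installation before setting
DRY_RUN=0.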

Regards,
Jared



On Tue, Jul 23, 2013 at 4:29 AM, Oliver Goldschmidt
<o.goldschmidt at tuhh.de> wrote:

> Hi Koha community,
>
> I am new to Koha and have spent the last week with trying to feed the
> Zebra index with our bibliographic records. This turned out to be pretty
> difficult.
> I have successfully imported our records (about 600.000) to the Koha
> database. Then I tried to use rebuild_zebra.pl to put the records into
> the index. This failed due to disk space reasons: I have 100 GB disk
> space reserved for the Zebra index (mounted on /var/lib/koha) and have
> split this space in zebra config into 45 GB for the shadow directory and
> 45 GB for the register directory. This was not sufficient, which I think
> is a little bit weird, because I think 600.000 records should not take
> so much space... So, my first question: is that normal? Does Zebra need
> so much disk space for the index? What are the directories register and
> shadow exactly for?
>
> Next try was indexing with rebuild_zebra_sliced.sh. I used the default
> value of 10000 for the chunks. First I got an error, I guess because a
> configuration value was not set properly (the script did not find
> index_mode; so I set it manually to "dom", which I guessed should be the
> correct value for indexing marcxml).
> After fixing that manually, I succeeded in splitting my export file into
> 59 10000-record chunks. I tried to index the first two chunks and that
> seemed to work without problems for the first chunk (but it finished
> very fast, which made me wonder if Koha really did something - I just
> realized, that the marcxml file was not valid - but why didn't I get an
> error?). For the second chunk, there were two messages (unfortunately I
> cannot recall them). This is the command I used to do that:
>
> zebraidx -c /etc/koha/sites/koha/zebra-biblios.cfg -v none,fatal,warn -g
> marcxml -d biblios update
> /tmp/rebuild/export/biblio/exported_records_1000001
>
> But now, when I search in the Koha opac for an "e" for example, I still
> get no results. The index seems to be empty, although there actually
> are files in /var/lib/koha/koha/biblio/shadow. Is there a way to look
> into the Zebra index directly?
> I have no idea where to look next.
>
> Does anybody have any hint about that? Any help would be appreciated.
>
> Best
> -Oliver
>
> --
> Oliver Goldschmidt
> TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
> Denickestr. 22
> 21071 Hamburg - Harburg
> Tel.    +49 (0)40 / 428 78 - 32 91
> eMail   o.goldschmidt at tuhh.de
> --
> GPG/PGP-Schlüssel:
> http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
>
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha
>



-- 
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins at cpbibliography.com
(web) http://www.cpbibliography.com/

