Re: [Koha] Problems with Zebra index

23 Jul 2013

Jared,

thank you very much for your reply!

In fact I forgot to run zebraidx as user koha-koha, so you were
right: I had bad permissions on the index files in shadow. I tried to
fix that by changing the ownership and restarting Zebra, but that had no
effect. So I guess you are also right that I messed up my index by
running zebraidx manually (which is not too bad; I can still remove the
index and try again as koha-koha). I hope nothing else was broken by
that mistake and the database is still fine?!

As for increasing the disk space, that's exactly what I did: I changed
the value in zebra-biblios.cfg. But can you explain what the two
directories are used for? After indexing has finished, will I have data
in both directories, or could I configure my 100 GB disk so that both
directories can take up to 80 GB? I will try that and see...

I still have a problem with rebuild_zebra.pl: it ignores the -s
parameter. If I understood it correctly, rebuild_zebra.pl should use an
existing exported_records file if I pass both -s and -d. But it
doesn't. Every time I start rebuild_zebra.pl, the script exports my
database again (this takes quite a long time, and I wanted to skip it).
Is this a bug, or am I missing something?
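
For reference, this is roughly the call I would expect to reuse an
earlier export. It is only a sketch of my understanding, not a confirmed
recipe: the helper name reindex_from_export and the DRY_RUN switch are
made up for illustration, and I am assuming that -d names the directory
whose biblio subdirectory holds the exported records and that -s then
skips the export step.

```shell
#!/bin/sh
# Hypothetical helper: reindex biblios from an already exported directory.
# Assumption: "rebuild_zebra.pl -s -d <dir>" reuses the files in <dir>/biblio
# instead of exporting the database again.
reindex_from_export() {
    dir=$1
    # Stop early if there is no previous export to reuse.
    if [ ! -d "$dir/biblio" ]; then
        echo "no export found in $dir/biblio" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only show the command that would be run.
        echo rebuild_zebra.pl -b -r -x -s -d "$dir"
    else
        rebuild_zebra.pl -b -r -x -s -d "$dir"
    fi
}
```

With DRY_RUN=1 the function only prints the command, so the arguments
can be checked before the index is touched, for example with
DRY_RUN=1 reindex_from_export /tmp/rebuild/export (the path is an
assumption based on my earlier export location).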

Best
- Oliver

On 23.07.2013 13:38, Jared Camins-Esakov wrote:
...
Oliver,
I have good news and bad news. The good news is, the fix to your
problem is probably easy. The bad news is that running the zebraidx
command manually more than likely messed up your installation.
It sounds like your first problem can be solved simply by increasing
the space that Zebra will use (it is not uncommon to need in excess
of 100 GB for indexes in a large installation). I'm not sure how you
increased the space allotted, so I'm going to provide instructions for
the correct way to do this that you can check your work against. If
you open up the zebra-biblios.cfg and zebra-biblios-dom.cfg files that
Koha installed (in /etc/koha/sites/koha/ ), you'll need to change two
lines, the lines starting with register and shadow. At the end of each
line it says 20G or 45G, depending on whether you changed that. Change
those numbers to, say, 80G.
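A minimal sketch of that edit, in case it helps to compare against what
you did. The function name set_zebra_space is made up, and I'm assuming
the size is the last colon-separated field on the register and shadow
lines (something like "register: /var/lib/koha/koha/biblios/register:45G");
check your own files before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: set the size limit at the end of the "register:" and
# "shadow:" lines of a Zebra config file. A backup is kept as <file>.bak.
# Assumption: each of those lines ends in ":<number><unit>", e.g. ":45G".
set_zebra_space() {
    file=$1
    size=$2
    sed -E -i.bak \
        's/^((register|shadow):.*:)[0-9]+[A-Za-z]*[[:space:]]*$/\1'"$size"'/' \
        "$file"
}

# Example calls on the server (paths as named above):
#   set_zebra_space /etc/koha/sites/koha/zebra-biblios.cfg 80G
#   set_zebra_space /etc/koha/sites/koha/zebra-biblios-dom.cfg 80G
```

Afterwards, grep -E '^(register|shadow):' on the file shows whether both
lines now end in the new value.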
rebuild_zebra_sliced would not help you in this instance, because your
problem is the amount of disk space required, not a bad record.
Now for the bad news. If you ran zebraidx as any user other than
koha-koha, your permissions are going to be all wrong. You can try
changing the owner recursively on /var/lib/koha/koha to koha-koha.
That might fix it (but I am not sure, since I haven't tried). The
zebra_bib_index_mode is easy to fix, fortunately. Just change
zebra_bib_index_mode to grs1, run rebuild_zebra.pl -r -b -x and you
should be fine. You can worry about switching to DOM indexing once you
have indexing with GRS-1 working:
http://wiki.koha-community.org/wiki/Switching_to_dom_indexing
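As a rough sketch of the ownership check and fix described above (the
function name wrongly_owned is made up, and the commented commands assume
you run them as root and that the Koha environment is set up for the
koha-koha user; I haven't tested this on your setup):

```shell
#!/bin/sh
# Hypothetical helper: list everything under a directory that is NOT owned
# by the expected user, to see how far the wrong permissions reach.
wrongly_owned() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}

# On the server (assumed to be run as root):
#   wrongly_owned /var/lib/koha/koha koha-koha    # anything listed is suspect
#   chown -R koha-koha /var/lib/koha/koha         # the recursive owner change
#   sudo -u koha-koha rebuild_zebra.pl -r -b -x   # full rebuild as koha-koha
```

If wrongly_owned prints nothing after the chown, ownership at least is
consistent again; whether the index contents survived is a separate
question, which is why the full rebuild follows.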
Regards,
Jared
On Tue, Jul 23, 2013 at 4:29 AM, Oliver Goldschmidt
<o.goldschmidt@tuhh.de> wrote:
Hi Koha community,
I am new to Koha and have spent the last week trying to feed the Zebra
index with our bibliographic records. This turned out to be pretty
difficult.
I have successfully imported our records (about 600,000) into the Koha
database. Then I tried to use rebuild_zebra.pl to put the records into
the index. This failed for lack of disk space: I have 100 GB of disk
space reserved for the Zebra index (mounted on /var/lib/koha) and have
split this space in the Zebra config into 45 GB for the shadow directory
and 45 GB for the register directory. This was not sufficient, which I
think is a little bit weird, because 600,000 records should not take
so much space... So, my first question: is that normal? Does Zebra need
so much disk space for the index? What exactly are the register and
shadow directories for?
My next try was indexing with rebuild_zebra_sliced.sh. I used the default
value of 10000 for the chunks. First I got an error, I guess because a
configuration value was not set properly (the script did not find
index_mode, so I set it manually to "dom", which I guessed should be the
correct value for indexing marcxml).
After fixing that manually, I succeeded in splitting my export file into
59 chunks of 10000 records each. I tried to index the first two chunks,
and that seemed to work without problems for the first chunk (but it
finished very fast, which made me wonder whether Koha really did
anything - I have just realized that the marcxml file was not valid - but
why didn't I get an error?). For the second chunk, there were two
messages (unfortunately I cannot recall them). This is the command I
used:
    zebraidx -c /etc/koha/sites/koha/zebra-biblios.cfg -v none,fatal,warn \
        -g marcxml -d biblios update \
        /tmp/rebuild/export/biblio/exported_records_1000001
But now, when I search in the Koha OPAC for an "e", for example, I still
get no results. So the index seems to be empty, although there actually
are files in /var/lib/koha/koha/biblio/shadow. Is there a way to look
into the Zebra index directly?
I have no idea where to look next.
Does anybody have a hint? Any help would be appreciated.
Best
    -Oliver
--
    Oliver Goldschmidt
    TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
    Denickestr. 22
    21071 Hamburg - Harburg
    Tel.    +49 (0)40 / 428 78 - 32 91
    eMail   o.goldschmidt@tuhh.de
    --
    GPG/PGP-Schlüssel:
    http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
-- 
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/