[Koha] Problems with Zebra index
Oliver Goldschmidt
o.goldschmidt at tuhh.de
Wed Jul 24 00:30:47 NZST 2013
Jared,
thank you very much for your reply!
In fact I forgot to run zebraidx as user koha-koha, so you were right:
I had bad permissions on the index files in shadow. I tried to fix that
by changing the ownership and restarting Zebra, but that had no effect.
So I guess you are also right that I messed up my index by doing that
(which is not too bad; I can still remove the index and try again as
koha-koha). I hope that mistake did not break anything else and the
database is still fine?!
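
For reference, a minimal sketch of how the ownership of the index files
could be checked after such a mistake. The function name
list_foreign_owned is made up for illustration; the directory and user
in the usage comment are the ones mentioned in this thread.

```shell
#!/bin/sh
# List every file under a directory that is NOT owned by the expected
# user. Empty output means the ownership is consistent.
# Usage: list_foreign_owned DIR USER
# (list_foreign_owned is a hypothetical helper, not part of Koha or Zebra.)
list_foreign_owned() {
    dir=$1
    user=$2
    find "$dir" ! -user "$user" -print
}

# Example, using the path and user named in this thread:
#   list_foreign_owned /var/lib/koha/koha koha-koha
```

Anything it prints is a candidate for the recursive change of owner
suggested in the quoted reply; whether that alone repairs the index is
not confirmed here.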
To increase the disk space, that is exactly what I did: I changed the
value in zebra-biblios.cfg. But can you explain what the directories
are used for? After indexing has finished, will I have data in both
directories, or could I configure my 100 GB disk so that both
directories can take 80 GB each? I will try that and see...
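
To make the arithmetic behind that question concrete, here is a small
sketch that reads the register and shadow limits from a config file and
adds them up. It assumes lines of the form "register: /some/path:80G"
with sizes given in G only; the function name zebra_dir_limits is
hypothetical.

```shell
#!/bin/sh
# Print the size limit configured for the register and shadow
# directories in a Zebra config file, followed by their sum, so the
# total can be compared with the capacity of the disk.
# Assumption: lines look like "register: /some/path:80G" (G suffix only).
# Usage: zebra_dir_limits CONFIG_FILE
zebra_dir_limits() {
    awk -F: '
        /^(register|shadow):/ {
            size = $NF              # last colon-separated field, e.g. "80G"
            sub(/G$/, "", size)     # strip the unit
            total += size
            print $1, size "G"
        }
        END { print "total", (total + 0) "G" }
    ' "$1"
}

# Example:
#   zebra_dir_limits /etc/koha/sites/koha/zebra-biblios.cfg
```

With 80G on both lines the total comes to 160G, which is more than the
100 GB disk. Whether Zebra treats these values as upper limits or needs
the space for both directories at once is exactly the open question
above.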
I still have a problem with rebuild_zebra.pl: it ignores the -s
parameter. If I understood it correctly, rebuild_zebra.pl should use an
existing exported_records file if I pass both -s and -d. But it does
not. Every time I start rebuild_zebra.pl, the script exports my
database again (this takes quite a long time, and I wanted to bypass
it). Is this a bug, or am I missing something?
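
A sketch of the invocation meant here, written as a dry run that only
prints the command. The helper name reuse_export_cmd is hypothetical,
and the location DIR/biblio/exported_records is an assumption based on
the export path quoted further down in this thread.

```shell
#!/bin/sh
# Print (do not run) a rebuild_zebra.pl command line that is meant to
# reuse an earlier export instead of exporting the database again.
# Assumption: the earlier export lives at DIR/biblio/exported_records.
# Usage: reuse_export_cmd DIR
# (reuse_export_cmd is a hypothetical helper, not part of Koha.)
reuse_export_cmd() {
    dir=$1
    if [ ! -f "$dir/biblio/exported_records" ]; then
        echo "no export found in $dir/biblio" >&2
        return 1
    fi
    # -b biblios, -x MARCXML, -s skip export, -d export directory
    echo "rebuild_zebra.pl -b -x -s -d $dir"
}

# Example:
#   reuse_export_cmd /tmp/rebuild/export
```

The check for the file comes first because -s only makes sense if the
previous run left its export directory in place.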
Best
- Oliver
On 23.07.2013 13:38, Jared Camins-Esakov wrote:
> Oliver,
>
> I have good news and bad news. The good news is, the fix to your
> problem is probably easy. The bad news is that running the zebraidx
> command manually more than likely messed up your installation.
>
> It sounds like your first problem can be solved simply by increasing
> the space that Zebra will use (it is not uncommon to need in excess
> of 100GB for indexes in a large installation). I'm not sure how you
> increased the space allotted, so I'm going to provide instructions for
> the correct way to do this that you can check your work against. If
> you open up the zebra-biblios.cfg and zebra-biblios-dom.cfg files that
> Koha installed (in /etc/koha/sites/koha/ ), you'll need to change two
> lines, the lines starting with register and shadow. At the end of the
> line it says 20G or 45G, depending on whether you changed that. Change
> those numbers to, say, 80G.
>
> rebuild_zebra_sliced would not help you in this instance, because your
> problem is the amount of disk space required, not a bad record.
>
> Now for the bad news. If you ran zebraidx as any user other than
> koha-koha, your permissions are going to be all wrong. You can try
> changing the owner recursively on /var/lib/koha/koha to koha-koha.
> That might fix it (but I am not sure, since I haven't tried). The
> zebra_bib_index_mode is easy to fix, fortunately. Just change
> zebra_bib_index_mode to grs1, run rebuild_zebra.pl -r -b -x and you
> should be fine. You can worry about switching to DOM indexing once you
> have indexing with GRS-1 working:
> http://wiki.koha-community.org/wiki/Switching_to_dom_indexing
>
> Regards,
> Jared
>
>
>
> On Tue, Jul 23, 2013 at 4:29 AM, Oliver Goldschmidt
> <o.goldschmidt at tuhh.de> wrote:
>
> Hi Koha community,
>
> I am new to Koha and have spent the last week trying to feed the
> Zebra index with our bibliographic records. This turned out to be
> pretty difficult.
> I have successfully imported our records (about 600,000) into the Koha
> database. Then I tried to use rebuild_zebra.pl to put the records into
> the index. This failed for lack of disk space: I have 100 GB of disk
> space reserved for the Zebra index (mounted on /var/lib/koha) and have
> split this space in the Zebra config into 45 GB for the shadow
> directory and 45 GB for the register directory. This was not
> sufficient, which I think is a little bit weird, because 600,000
> records should not take so much space... So, my first question: is
> that normal? Does Zebra need so much disk space for the index? What
> exactly are the directories register and shadow for?
>
> My next try was indexing with rebuild_zebra_sliced.sh. I used the
> default value of 10000 for the chunks. First I got an error, I guess
> because a configuration value was not set properly (the script did not
> find index_mode, so I set it manually to "dom", which I guessed should
> be the correct value for indexing marcxml).
> After fixing that manually, I succeeded in splitting my export file
> into 59 10000-record chunks. I tried to index the first two chunks,
> and that seemed to work without problems for the first chunk (but it
> finished very fast, which made me wonder whether Koha really did
> something; I just realized that the marcxml file was not valid, but
> why didn't I get an error?). For the second chunk, there were two
> messages (unfortunately I cannot recall them). This is the command I
> used to do that:
>
> zebraidx -c /etc/koha/sites/koha/zebra-biblios.cfg \
>   -v none,fatal,warn -g marcxml -d biblios \
>   update /tmp/rebuild/export/biblio/exported_records_1000001
>
> But now, when I search in the Koha OPAC for an "e" for example, I
> still get no results. So the index seems to be empty, although there
> are actually files in /var/lib/koha/koha/biblio/shadow. Is there a way
> to look into the Zebra index directly?
> I have no idea where to look next.
>
> Does anybody have any hint about that? Any help would be appreciated.
>
> Best
> -Oliver
>
> --
> Oliver Goldschmidt
> TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
> Denickestr. 22
> 21071 Hamburg - Harburg
> Tel. +49 (0)40 / 428 78 - 32 91
> eMail o.goldschmidt at tuhh.de
> --
> GPG/PGP-Schlüssel:
> http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc
>
> _______________________________________________
> Koha mailing list http://koha-community.org
> Koha at lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha
>
>
>
>
> --
> Jared Camins-Esakov
> Bibliographer, C & P Bibliography Services, LLC
> (phone) +1 (917) 727-3445
> (e-mail) jcamins at cpbibliography.com
> (web) http://www.cpbibliography.com/
--
Oliver Goldschmidt
TU Hamburg-Harburg / Universitätsbibliothek / Digitale Dienste
Denickestr. 22
21071 Hamburg - Harburg
Tel. +49 (0)40 / 428 78 - 32 91
eMail o.goldschmidt at tuhh.de
--
GPG/PGP-Schlüssel:
http://www.tub.tu-harburg.de/keys/Oliver_Marahrens_pub.asc