[Koha] best practice for indexing and re-indexing

Sat Sep 14 05:55:22 NZST 2024

Hi Eric,

Searching with Zebra and Elasticsearch only works when your records
include a unique identifier that links the record in the index with the
record in your database. Koha achieves this by adding the biblionumber
to the MARC record automatically. For MARC21, field 999 is used. These
fields and mappings should not be changed.

If you import records, the biblionumber will be added automatically. If
you want to carry over an identifier from your old system to Koha, in
MARC21 you could use 035$a with a prefix, or 001/003.
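
To illustrate the "035$a with a prefix" idea, here is a minimal Python
sketch. It assumes the common MARC21 convention of putting a source code
in parentheses in front of the old number. The function name and the
example values ("OldILS", "000123456") are made up for illustration; how
you actually write the value into your records depends on your own
tooling.

```python
def legacy_control_number(source_code: str, old_id: str) -> str:
    """Build a value for MARC21 035$a: source code in parentheses, then the old ID.

    Both arguments are placeholders chosen by you; nothing here is Koha-specific.
    """
    source_code = source_code.strip()
    old_id = old_id.strip()
    if not source_code or not old_id:
        raise ValueError("both a source code and an old identifier are required")
    return f"({source_code}){old_id}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(legacy_control_number("OldILS", "000123456"))
```

The prefix keeps the old identifier distinguishable from the biblionumber
that Koha stores in 999, so the two never get confused.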

You can't speed up the indexing process by adding anything to your MARC
data.

In general, indexing using Elasticsearch will be much quicker than using
Zebra for this number of records.

You can always run another full reindex without deleting anything. But
if you load new, improved records, they will need to be indexed again.
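
As a rough sketch of what a full reindex without deleting could look like
on a Debian package installation, the Python snippet below only builds
the command line and prints it; it does not run anything. The instance
name "library" is a placeholder, and the flags are the ones I would
expect from the koha-rebuild-zebra and koha-elasticsearch scripts (-f
full, -b biblios only, -v verbose, --rebuild), so please check them
against --help on your own server first.

```python
import shlex


def reindex_command(instance: str, engine: str = "zebra") -> list[str]:
    """Return the command for a full bibliographic reindex of one Koha instance.

    Assumption: a package install where these wrapper scripts are available.
    """
    if engine == "zebra":
        # -f: full rebuild, -b: bibliographic records only, -v: verbose
        return ["sudo", "koha-rebuild-zebra", "-f", "-b", "-v", instance]
    if engine == "elasticsearch":
        # --rebuild: reindex, -b: bibliographic records only, -v: verbose
        return ["sudo", "koha-elasticsearch", "--rebuild", "-b", "-v", instance]
    raise ValueError(f"unknown search engine: {engine!r}")


if __name__ == "__main__":
    # "library" is a hypothetical instance name.
    print(shlex.join(reindex_command("library")))
    print(shlex.join(reindex_command("library", engine="elasticsearch")))
```

Neither command touches the records in the database; they only rebuild
the search index from what is already stored there.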

Hope that helps,

Katrin

On 09.09.24 20:11, Eric Lease Morgan wrote:
> What are some of the best practices for Zebra indexing and re-indexing of MARC records; ought my MARC records include unique identifiers in some 9xx field?
>
> I am in the process of curating about .7 million MARC records, putting them into Koha, and providing access to them via both the traditional catalogue as well as the Search-Retrieve Via URL (SRU) interfaces. I am in a constant process of improving the records in one way or another. Adding date values. Adding subject headings. Adding content notes. Removing duplicates. Etc.
>
> After creating an improved set of records, I have been zealously deleting bibliographic records using the command line, but this process also deletes things I don't want to be deleted. See: https://bit.ly/3XkMeKV
>
> I know I can use bulkmarcimport.pl to delete records, but the process is very slow, especially when I want to delete hundreds of thousands of items.
>
> A few days ago I learned about the koha-rebuild-zebra command, and I believe I saw something about Zebra identifiers in 9xx fields flashing by on the screen. Maybe, if I put identifiers in 9xx fields, I can re-index things more quickly? If so, then how?
>
> Maybe, if my records have magic 9xx fields, then, when I use bulkmarcimport.pl to import things, Zebra will really overwrite my existing records? That would be nice.
>
> After I create a new set of improved MARC records, how can I efficiently reindex them sans deleting them from the MySQL database?
>
> --
> Eric Morgan