[Koha] best practice for indexing and re-indexing

Tue Sep 10 06:11:54 NZST 2024

What are some of the best practices for Zebra indexing and re-indexing of MARC records? Ought my MARC records include unique identifiers in some 9xx field?

I am in the process of curating about 0.7 million MARC records, putting them into Koha, and providing access to them via both the traditional catalogue and the Search/Retrieve via URL (SRU) interface. I am in a constant process of improving the records in one way or another. Adding date values. Adding subject headings. Adding content notes. Removing duplicates. Etc.

After creating an improved set of records, I have been zealously deleting bibliographic records using the command line, but this process also deletes things I don't want to be deleted. See: https://bit.ly/3XkMeKV

I know I can use bulkmarcimport.pl to delete records, but the process is very slow, especially when I want to delete hundreds of thousands of items.

A few days ago I learned about the koha-rebuild-zebra command, and I believe I saw something about Zebra identifiers in 9xx fields flashing by on the screen. Maybe, if I put identifiers in a 9xx field, I can re-index things more quickly? If so, then how?

Maybe, if my records have magic 9xx fields, then, when I use bulkmarcimport.pl to import things, Zebra will really overwrite my existing records? That would be nice.

After I create a new set of improved MARC records, how can I efficiently reindex them sans deleting them from the MySQL database?

--
Eric Morgan