Actually rebuild_zebra.pl -b -r -v does the trick. I had broken my rebuild_zebra with some debugging logic while tracking down this problem, so zebraidx was not being called... We look good now on this issue.

Solution:
1. Use 'checkNonIndexedBiblios.pl -c' to get a report of missing biblios.
2. Visit http://mylibrary.org/cgi-bin/koha/catalogue/detail.pl?biblionumber=14769 (a missing biblio), choose Edit, and fix the control number to be unique (repeat for each biblio).
3. Run rebuild_zebra.pl -b -r -v.

It would be good to build a tool to find duplicate control numbers. I did this by exporting all the biblios, running marcprint (my python utility) | grep "=001" | sort | uniq -c | sort -r | less, and looking for counts greater than 1. A better approach would be to make the control number a unique field and enforce this in the database. Is this possible?

-Doug-

On 1 September 2012 16:11, Doug Kingston <dpk@randomnotes.org> wrote:
So doing some further research, it definitely looks like we have duplicate control numbers (001). This is a data entry mistake and it looks like the cataloger copied the biblios for similar entries. I have gone back and altered the control numbers to be unique, but rebuild_zebra.pl -b -r is not adding the new entries. Any idea what else we might need to do?
-Doug-
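A duplicate-001 check like the one discussed in this thread could be sketched in Python as below. This is only a minimal sketch under stated assumptions: it reads a text dump in the "=TAG value" layout shown in the example record later in this thread, it assumes each record starts with an "=LDR" line, and it assumes the biblionumber sits in 999$c. The function name and the sample values are hypothetical and are not taken from marcprint or Koha.

```python
from collections import defaultdict


def duplicate_control_numbers(dump_lines):
    """Return {001 value: [biblionumbers]} for every 001 used by more than one record.

    Assumes a text dump with one field per line ("=001 value"), where each
    record begins with an "=LDR" line and 999$c holds the biblionumber.
    """
    records = []  # one [control_number, biblionumber] pair per record
    for raw in dump_lines:
        line = raw.strip()
        if line.startswith("=LDR"):
            records.append([None, None])
        elif not records:
            continue  # ignore anything before the first leader
        elif line.startswith("=001"):
            records[-1][0] = line[4:].strip()
        elif line.startswith("=999"):
            # Subfields are introduced by "$"; pick out subfield c.
            for sub in line.split("$")[1:]:
                if sub.startswith("c"):
                    records[-1][1] = sub[1:].strip()

    by_control = defaultdict(list)
    for control, biblionumber in records:
        if control is not None:
            by_control[control].append(biblionumber)
    return {c: b for c, b in by_control.items() if len(b) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not records from the catalogue in this thread.
    sample = [
        "=LDR 00000nam",
        "=001 ctrl-A",
        "=999 \\\\$c101$d101",
        "",
        "=LDR 00000nam",
        "=001 ctrl-A",
        "=999 \\\\$c102$d102",
        "",
        "=LDR 00000nam",
        "=001 ctrl-B",
        "=999 \\\\$c103$d103",
    ]
    for control, biblios in duplicate_control_numbers(sample).items():
        print(control, biblios)
```

Reporting the biblionumbers alongside each repeated 001 gives the list of records to open and edit, rather than just a count.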
On 1 September 2012 15:32, Ian Bays <ian.bays@ptfs-europe.com> wrote:
Hi. The 3.8 upgrade offers the dom indexing by default and if you have taken that option (as seen in $KOHA_CONF) the xsl used instead of record.abs (~/koha-dev/etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl) uses a construct (z:id) for the 001 which uses that (if it exists) as the zebra unique id. This means if you have more than one bib record with the same 001 (as you get if you duplicate a bib for instance) it will only index the last one and it won't complain at all about it.

Not sure if it's a hangover from using the xml used by authorities which stores the auth_id in the 001 or UNIMARC which might use 001 as the bib number. Either way I bet if you remove the 001 or make it unique then it will index OK. The better solution is to fix the xsl to probably not use the z:id for biblios or maybe get it to use the 999$c, but the zebra config scares me.

It took ages to find the cause so I hope this helps someone.

Ian
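The effect described above (only the last record with a given 001 ends up indexed, with no complaint) can be illustrated with a toy model. This is not Zebra's code and says nothing about how Zebra stores records internally; it is a hedged sketch that treats the index as a dictionary keyed by the 001 value, with hypothetical biblionumbers and control numbers.

```python
def index_by_record_id(records):
    """Toy model of an index keyed by the 001 value.

    `records` is a list of (biblionumber, control_number) pairs. A repeated
    control number silently replaces the earlier entry, which mirrors the
    symptom described in the thread.
    """
    index = {}
    for biblionumber, control_number in records:
        index[control_number] = biblionumber
    return index


if __name__ == "__main__":
    # Hypothetical data: biblios 101 and 102 share a control number.
    records = [(101, "ctrl-A"), (102, "ctrl-A"), (103, "ctrl-B")]
    index = index_by_record_id(records)
    indexed = set(index.values())
    missing = [b for b, _ in records if b not in indexed]
    print("indexed:", sorted(indexed))
    print("missing:", missing)
```

In this model, making each 001 unique (or keying on the biblionumber instead) is what stops the earlier record from being dropped.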
On 01/09/2012 18:11, Doug Kingston wrote:
On 1 September 2012 09:46, Jared Camins-Esakov <jcamins@cpbibliography.com> wrote:
Doug,
So environment variables are not the issue. We are carefully managing those.
Make sure when you are using cron jobs that you set the environment variables IN YOUR CRONTAB. Setting environment variables elsewhere is a recipe for confusion and misery down the road. However, this is -- as you say -- not the problem.
I have tried using the new tool checkNonIndexedBiblios.pl (from patch 6566) and it indeed finds a few recent biblios that are not indexed. Using the -z option to mark them for indexing followed by a manual run of rebuild_zebra -b -v -z did not get the biblios indexed. I cranked up the debugging on zebraidx (by modifying rebuild_zebra.pl and using -v -v) and did not see any obvious errors in the output that would suggest why indexing was failing.
Did you change your bibliographic frameworks? It could be a matter of the biblionumber not being stored properly. The other thing to do is to confirm that the non-indexed biblios are *actually* getting added to the zebraqueue by the 6566 script. It's kind of a long shot, but it could be an issue with the zebraqueue table getting corrupted. I've seen this happen when the zebraqueue table got too large, and disk space was low.
So I think this is working as expected. Disk space is ample on the system in question, and the catalogue is small by most standards (about 2500 biblios). I ran rebuild_zebra.pl with the -k flag so it left the exported records and here's the tree I got.
library:/tmp# ls -altR p6tjtKrrK3/
p6tjtKrrK3/:
total 0
drwxrwxrwt 6 root root 1040 Sep 1 17:50 ..
drwx------ 5 koha koha 100 Sep 1 06:36 .
drwxr-xr-x 2 koha koha 60 Sep 1 06:36 upd_biblio
drwxr-xr-x 2 koha koha 60 Sep 1 06:36 del_biblio
drwxr-xr-x 2 koha koha 40 Sep 1 06:36 biblio

p6tjtKrrK3/upd_biblio:
total 16
-rw-r--r-- 1 koha koha 12670 Sep 1 06:36 exported_records
drwxr-xr-x 2 koha koha 60 Sep 1 06:36 .
drwx------ 5 koha koha 100 Sep 1 06:36 ..

p6tjtKrrK3/del_biblio:
total 0
drwx------ 5 koha koha 100 Sep 1 06:36 ..
drwxr-xr-x 2 koha koha 60 Sep 1 06:36 .
-rw-r--r-- 1 koha koha 0 Sep 1 06:36 exported_records

p6tjtKrrK3/biblio:
total 0
drwx------ 5 koha koha 100 Sep 1 06:36 ..
drwxr-xr-x 2 koha koha 40 Sep 1 06:36 .
Using marcprint.py, a small python program built around the pymarc package, I decoded this file and found 13 MARC records, as expected. Example:

=LDR 00871nam a22002417a 4500
=001 201112071555.ls
=003 UkLoVW
=005 20111209110116.0
=008 111207t1982\\\\enkg\\\\r\\\\\001\0\eng\d
=040 \\$aUkLoVW$cUkLoVW
=099 \\$aQS 40
=100 1\$aSheffield, Ken$92330
=245 \0$aTen country dances :$bmainly from Thompson, Wright & Wilson.
=260 \\$aOxford :$b[The Author],$c1982.
=300 \\$a12 p. :$bmusic ;$c30 cm.
=490 1\$aFrom two barns ;$vv. 1
=650 \\$9117$aCountry dances
=650 \\$9127$aDance music
=830 \5$aFrom two barns$92331
=942 \\$2VWML$cBK$hQS 40$n0$6QS_00040
=999 \\$c14879$d14879
=952 \\$w2011-12-07$p10914$r2011-12-07$40$00$6QS_00040$915083$bVWML$10$oQS 40$d2011-12-07$70$cBOX$2VWML$yBK$aVWML
=952 \\$w2011-12-07$p11121$r2011-12-07$40$00$6QS_00040$915084$bVWML$10$oQS 40$d2011-12-07$71$cBOX$2VWML$yBK$aVWML
I have attached an ascii printout of all 13 records in case someone wants to look for a pattern in these records.
The problem is either in the format/contents of those records, or in zebraidx/zebrasrv or their config files. My suspicion is with the latter, since we have already had to fix one problem there for bug 6566.
-Doug-
Regards,
Jared
-- Jared Camins-Esakov Bibliographer, C & P Bibliography Services, LLC (phone) +1 (917) 727-3445 (e-mail) jcamins@cpbibliography.com (web) http://www.cpbibliography.com/
_______________________________________________
Koha mailing list http://koha-community.org
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha
-- Ian Bays Director of Projects, PTFS Europe Limited Content Management and Library Solutions +44 (0) 800 756 6803 (phone) +44 (0) 7774 995297 (mobile) +44 (0) 800 756 6384 (fax) skype: ian.bays email: ian.bays@ptfs-europe.com