[Koha] Zebra not updating biblios automatically in koha 3.8

Doug Kingston dpk at randomnotes.org
Sun Sep 2 11:42:06 NZST 2012


Actually, rebuild_zebra.pl -b -r -v does the trick. I had broken my copy of
rebuild_zebra.pl with some debugging logic while tracking down this problem,
so zebraidx was not being called. We look good now on this issue.

Solution:
1. Use 'checkNonIndexedBiblios.pl -c' to get a report of missing biblios.
2. For each missing biblio, visit its detail page, e.g.
   http://mylibrary.org/cgi-bin/koha/catalogue/detail.pl?biblionumber=14769
   then click Edit and change the control number (001) so that it is unique.
3. Run rebuild_zebra.pl -b -r -v

It would be good to build a tool to find duplicate control numbers. I did
this by exporting all the biblios and running them through marcprint (my
Python utility) | grep "=001" | sort | uniq -c | sort -r | less, then looking
for counts greater than 1.
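The same check can be written as a small script. This is a minimal sketch, assuming the export has already been turned into pymarc-style text (one field per line, in the "=001  value" form shown further down this thread); find_duplicate_001s is a hypothetical name, not part of Koha or of marcprint.

```python
from collections import Counter


def find_duplicate_001s(lines):
    """Return {control_number: count} for every 001 value seen more than once.

    Assumes pymarc-style text lines such as '=001  201112071555.ls'
    (tag, two spaces, value), like the output of Doug's marcprint utility.
    """
    counts = Counter(
        line[6:].strip()              # drop the '=001  ' prefix and newline
        for line in lines
        if line.startswith("=001  ")  # same filter as grep "=001"
    )
    # Equivalent to `uniq -c | sort -r`, keeping only counts greater than 1.
    return {cn: n for cn, n in counts.most_common() if n > 1}
```

For example, saving marcprint's output to a text file (any name will do) and passing the open file object to find_duplicate_001s returns each repeated control number with its count; an empty result means every 001 is unique.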

A better approach would be to make the control number a unique field and
enforce this in the database.  Is this possible?

-Doug-

On 1 September 2012 16:11, Doug Kingston <dpk at randomnotes.org> wrote:

> After some further research, it definitely looks like we have duplicate
> control numbers (001). This is a data entry mistake: it looks like the
> cataloger copied existing biblios to create similar entries. I have gone
> back and changed the control numbers to be unique, but rebuild_zebra.pl -b -r
> is not adding the new entries. Any idea what else we might need to do?
>
> -Doug-
>
>
> On 1 September 2012 15:32, Ian Bays <ian.bays at ptfs-europe.com> wrote:
>
>> Hi.
>> The 3.8 upgrade offers DOM indexing by default. If you have taken that
>> option (as seen in $KOHA_CONF), the XSL file used instead of record.abs
>> (~/koha-dev/etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl)
>> applies a construct (z:id) to the 001, so the 001 (if it exists) becomes the
>> Zebra unique ID. This means that if you have more than one bib record with
>> the same 001 (as you get when you duplicate a bib, for instance), only the
>> last one will be indexed, and Zebra won't complain about it at all.
>> I'm not sure whether this is a hangover from the XML used for authorities,
>> which stores the auth_id in the 001, or from UNIMARC, which might use the
>> 001 as the bib number. Either way, I bet that if you remove the 001 or make
>> it unique, the record will index OK.
>> The better solution is probably to fix the XSL so that it does not use z:id
>> for biblios, or to make it use 999$c instead, but the Zebra config scares me.
>> It took ages to find the cause, so I hope this helps someone.
>> Ian
>>
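To make the failure mode Ian describes concrete, here is a toy Python model of an index that stores one entry per record ID. It is not Zebra's code; it only mimics the "last one wins, no warning" behaviour. The build_index helper and the record values (CN-1, 101, 102) are hypothetical placeholders.

```python
def build_index(records, id_field):
    """Toy index: one entry per unique ID, later records silently overwrite.

    This only models the behaviour described for z:id; it is not how
    Zebra is implemented internally.
    """
    index = {}
    for record in records:
        index[record[id_field]] = record["title"]  # no duplicate check
    return index


# Two biblios made by duplicating a record: same 001, different 999$c.
records = [
    {"001": "CN-1", "999c": "101", "title": "first copy"},
    {"001": "CN-1", "999c": "102", "title": "second copy"},
]

by_001 = build_index(records, "001")    # keyed on 001: one entry survives
by_999c = build_index(records, "999c")  # keyed on 999$c: both are kept
```

Keying on 001 leaves a single entry holding only the second record, while keying on the biblionumber (999$c) keeps both, which is why Ian suggests either unique 001s or an XSL that uses 999$c.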
>> On 01/09/2012 18:11, Doug Kingston wrote:
>>
>>> On 1 September 2012 09:46, Jared Camins-Esakov
>>> <jcamins at cpbibliography.com> wrote:
>>>
>>>> Doug,
>>>>
>>>>> So environment variables are not the issue. We are carefully managing
>>>>> those.
>>>>
>>>> Make sure when you are using cron jobs that you set the environment
>>>> variables IN YOUR CRONTAB. Setting environment variables elsewhere is a
>>>> recipe for confusion and misery down the road. However, this is, as you
>>>> say, not the problem.
>>>>
>>>>
>>>>> I have tried using the new tool checkNonIndexedBiblios.pl (from patch
>>>>> 6566), and it indeed finds a few recent biblios that are not indexed.
>>>>> Using the -z option to mark them for indexing, followed by a manual run
>>>>> of rebuild_zebra -b -v -z, did not get the biblios indexed. I cranked up
>>>>> the debugging on zebraidx (by modifying rebuild_zebra.pl and using -v -v)
>>>>> and did not see any obvious errors in the output that would suggest why
>>>>> indexing was failing.
>>>>>
>>>> Did you change your bibliographic frameworks? It could be a matter of the
>>>> biblionumber not being stored properly. The other thing to do is to
>>>> confirm that the non-indexed biblios are *actually* getting added to the
>>>> zebraqueue by the 6566 script. It's kind of a long shot, but the zebraqueue
>>>> table could be getting corrupted. I've seen this happen when the zebraqueue
>>>> table got too large and disk space was low.
>>>>
>>> So I think this is working as expected. Disk space is ample on the system
>>> in question, and the catalogue is small by most standards (about 2500
>>> biblios). I ran rebuild_zebra.pl with the -k flag so that it kept the
>>> exported records, and here is the tree I got:
>>>
>>> library:/tmp# ls -altR p6tjtKrrK3/
>>> p6tjtKrrK3/:
>>> total 0
>>> drwxrwxrwt 6 root root 1040 Sep  1 17:50 ..
>>> drwx------ 5 koha koha  100 Sep  1 06:36 .
>>> drwxr-xr-x 2 koha koha   60 Sep  1 06:36 upd_biblio
>>> drwxr-xr-x 2 koha koha   60 Sep  1 06:36 del_biblio
>>> drwxr-xr-x 2 koha koha   40 Sep  1 06:36 biblio
>>>
>>> p6tjtKrrK3/upd_biblio:
>>> total 16
>>> -rw-r--r-- 1 koha koha 12670 Sep  1 06:36 exported_records
>>> drwxr-xr-x 2 koha koha    60 Sep  1 06:36 .
>>> drwx------ 5 koha koha   100 Sep  1 06:36 ..
>>>
>>> p6tjtKrrK3/del_biblio:
>>> total 0
>>> drwx------ 5 koha koha 100 Sep  1 06:36 ..
>>> drwxr-xr-x 2 koha koha  60 Sep  1 06:36 .
>>> -rw-r--r-- 1 koha koha   0 Sep  1 06:36 exported_records
>>>
>>> p6tjtKrrK3/biblio:
>>> total 0
>>> drwx------ 5 koha koha 100 Sep  1 06:36 ..
>>> drwxr-xr-x 2 koha koha  40 Sep  1 06:36 .
>>>
>>> Using marcprint.py, a small Python program built around the pymarc
>>> package, I decoded this file and found 13 MARC records, as expected.
>>> Example:
>>> =LDR  00871nam a22002417a 4500
>>> =001  201112071555.ls
>>> =003  UkLoVW
>>> =005  20111209110116.0
>>> =008  111207t1982\\\\enkg\\\\r\\\\\001\0\eng\d
>>> =040  \\$aUkLoVW$cUkLoVW
>>> =099  \\$aQS 40
>>> =100  1\$aSheffield, Ken$92330
>>> =245  \0$aTen country dances :$bmainly from Thompson, Wright & Wilson.
>>> =260  \\$aOxford :$b[The Author],$c1982.
>>> =300  \\$a12 p. :$bmusic ;$c30 cm.
>>> =490  1\$aFrom two barns ;$vv. 1
>>> =650  \\$9117$aCountry dances
>>> =650  \\$9127$aDance music
>>> =830  \5$aFrom two barns$92331
>>> =942  \\$2VWML$cBK$hQS 40$n0$6QS_00040
>>> =999  \\$c14879$d14879
>>> =952  \\$w2011-12-07$p10914$r2011-12-07$40$00$6QS_00040$915083$bVWML$10$oQS 40$d2011-12-07$70$cBOX$2VWML$yBK$aVWML
>>> =952  \\$w2011-12-07$p11121$r2011-12-07$40$00$6QS_00040$915084$bVWML$10$oQS 40$d2011-12-07$71$cBOX$2VWML$yBK$aVWML
>>>
>>> I have attached an ascii printout of all 13 records in case someone wants
>>> to look for a pattern in these records.
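marcprint.py itself is not included in the thread. As a hypothetical stand-in that needs only the standard library (no pymarc), the sketch below reads binary MARC (ISO 2709) and renders each record as pymarc-style "=TAG  value" lines like the example above. It assumes the exported_records file is ISO 2709 with UTF-8 data; the function names are illustrative, and error handling is deliberately minimal.

```python
"""Stdlib-only sketch of an ISO 2709 (binary MARC) reader."""

FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = "\x1f"     # subfield delimiter


def read_records(blob: bytes):
    """Yield each record from a file of concatenated ISO 2709 records."""
    for chunk in blob.split(RECORD_END):
        if len(chunk) >= 24:  # skip the empty tail after the last record
            yield chunk


def record_to_lines(raw: bytes) -> list:
    """Render one record as pymarc-style '=TAG  value' lines."""
    leader = raw[:24]
    base = int(leader[12:17])     # base address of data (leader 12-16)
    directory = raw[24:base - 1]  # 12-byte entries, minus the terminator
    lines = ["=LDR  " + leader.decode("ascii")]
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])  # field length, terminator included
        start = int(entry[7:12])  # offset from the base address
        value = raw[base + start:base + start + length].rstrip(FIELD_END)
        text = value.decode("utf-8")
        if not tag.startswith("00"):
            # Data field: show blank indicators as '\' and subfields as '$'.
            text = text[:2].replace(" ", "\\") + text[2:].replace(SUBFIELD, "$")
        lines.append(f"={tag}  {text}")
    return lines
```

Reading exported_records in binary mode and printing the lines from record_to_lines for each record yielded by read_records should give output shaped like the record above, with fields in directory order.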
>>>
>>> The problem is either in the format/contents of those records, or in
>>> zebraidx/zebrasrv or their config files. My suspicion is the latter,
>>> since we have already had to fix one problem there for bug 6566.
>>>
>>> -Doug-
>>>
>>>> Regards,
>>>> Jared
>>>>
>>>> --
>>>> Jared Camins-Esakov
>>>> Bibliographer, C & P Bibliography Services, LLC
>>>> (phone) +1 (917) 727-3445
>>>> (e-mail) jcamins at cpbibliography.com
>>>> (web) http://www.cpbibliography.com/
>>>>
>>>>
>>>> _______________________________________________
>>>> Koha mailing list  http://koha-community.org
>>>> Koha at lists.katipo.co.nz
>>>> http://lists.katipo.co.nz/mailman/listinfo/koha
>>>>
>>>
>> --
>> Ian Bays
>> Director of Projects, PTFS Europe Limited
>> Content Management and Library Solutions
>> +44 (0) 800 756 6803 (phone)
>> +44 (0) 7774 995297 (mobile)
>> +44 (0) 800 756 6384 (fax)
>> skype: ian.bays
>> email: ian.bays at ptfs-europe.com
>>
>
>

