[Koha] Staging and matching authority records

Joy Nelson joy at bywatersolutions.com
Fri Dec 25 04:14:16 NZDT 2015


Hi Ian-
If you are editing the .xsl file, be sure to use xsltproc to recreate the
indexdefs.xml file also (then reindex your auths, naturally :-) ).

I suspect you are correct that some stripping of spaces is happening, but
not to the extent you are looking for, i.e. Koha is removing one space but
not runs of multiple spaces.  If this is the case, someone with more
expertise would need to weigh in on the behind-the-scenes code.

If you have some Perl proficiency, you can use Perl scripts to pull out and
modify the 010$a (stripping out spaces) in your authority records in Koha,
if that would help make them match the incoming records.  But from a true
cataloger's perspective you want the 010$a in your system to be the 'right
value', spaces or no spaces, so using a script to modify your 010$a's may or
may not be desirable.
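
To make the two clean-up options concrete, here is a minimal sketch in
Python (rather than Perl, purely for illustration).  The function names are
hypothetical, and it only works on plain strings: fetching the 010$a from
Koha and saving it back is deliberately left out, so treat it as a starting
point and not as a finished script.

```python
import re


def collapse_spaces(lccn: str) -> str:
    """Collapse each run of whitespace to a single space and trim the ends."""
    return re.sub(r"\s+", " ", lccn).strip()


def strip_spaces(lccn: str) -> str:
    """Remove every whitespace character from the value."""
    return re.sub(r"\s+", "", lccn)


# Sample 010$a values taken from this thread.
samples = ["n  81032458 ", "n 79038407", "no2005113289"]

for value in samples:
    print(repr(value), "->", repr(collapse_spaces(value)), "/", repr(strip_spaces(value)))
```

Which of the two you want depends on what the incoming records look like;
whichever you pick, the same rule would have to apply to the stored values.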

joy

On Thu, Dec 24, 2015 at 7:52 AM, Ian Bays <ian.bays at ptfs-europe.com> wrote:

> Thanks Cecil and others,
>
> It's almost there but not quite.  I can get a match using index
> LC-card-number in an authority match rule as long as the contents of the
> 010$a in the authority record already in Koha has at most one space between
> parts and has no trailing space.  Just for context and comparison here are
> the vital statistics of the system:
>
> The version of Koha is 3.20.01.000 and we are MARC21 using dom indexing
> and icu chains.
> The system is a dev install on Debian.
>
> I have retried on a similar but new install on version 3.20.03.000.  Same
> result.
>
> I am tempted to conclude that the actual indexes are being loaded with the
> extra spaces (and retrieved) but am struggling to work out how to change
> that behind the scenes.
> The term provided by the staging process from the incoming record does
> seem to strip excess spaces before searching but then does not match what
> is in the indexes.
> I noticed there have been some bugzilla entries that have had to use
> "normalize-space" in the xsl stylesheet (authority-zebra-indexdefs.xsl)
> that I think is used to load the authority indexes but it does not seem to
> be in use for the 010 tag.
> Reading what that procedure does it would seem to be the way to fix the
> problem.  However my attempts to edit the xsl stylesheet have not yet been
> successful.
>
> The correct definitions for the 010$a do seem to allow for and expect
> these extra spaces in the MARC data (
> http://www.loc.gov/marc/authority/ad010.html).  So the 010$a is a bit
> weird in that it is in a subfield but within that the character positions
> are supposed to be fixed.  However I think that stripping excess spaces for
> dom indexing for this tag would solve this matching problem without
> introducing any other problems.
>
> If we can get that resolved I would appreciate hearing from anyone using
> (say) Marcive authority updates to then update the controlled fields in the
> bib records.
>
> Sorry if this is a bit heavy for the holiday season, and thanks again for
> the pointers.
>
> Ian
>
> On 22/12/2015 21:09, Hillyard, Cecil wrote:
>
>> Our authority records have spaces, or not, depending on the type.
>>
>>
>> Agatha Christie: (personal name)
>> 010 ## - LIBRARY OF CONGRESS CONTROL NUMBER
>>    a LC control number n 79038407
>>    z Canceled/invalid LC sh 85025298
>>
>> 1 space after the n.
>>
>> Eagles of Death Metal (Corporate name)
>> 010 ## - LIBRARY OF CONGRESS CONTROL NUMBER
>>    a LC control number no2005113289
>>
>>
>> __________________
>> Cecil Hillyard
>> Washoe County Library
>> Technical Services
>> 775-327-8338
>>
>> -----Original Message-----
>> From: Koha [mailto:koha-bounces at lists.katipo.co.nz] On Behalf Of Joy
>> Nelson
>> Sent: Tuesday, December 22, 2015 11:03 AM
>> To: Ian Bays
>> Cc: Koha
>> Subject: Re: [Koha] Staging and matching authority records
>>
>> Ian-
>> I normally do not see spaces in the LC number in authorities.
>>
>> Thanks
>> Joy
>>
>> On Tue, Dec 22, 2015 at 11:26 AM, Ian Bays <ian.bays at ptfs-europe.com>
>> wrote:
>>
>>> Thanks everyone.
>>>
>>> Encouraged by your successes I was not getting a match on the 010$a
>>> but noticed the contents had at least one space in it.
>>> I set up a test record without any spaces and found it is matching now.
>>>
>>> So it may be the index definitions for the 010$a in authorities that
>>> needs to allow for phrase or it may be that the 010$a should always
>>> have no spaces (or other white space).
>>>
>>> For those that are using the 010$a can anyone say if they know if any
>>> of their 010$a have space(s) in them?  The example I had been using was:
>>>
>>> "n  81032458"
>>> That is two spaces after the "n" and one at the end.
>>>
>>> Many thanks.  It may be we need to change the contents of the 010$a to
>>> exclude spaces and on the import too.
>>>
>>> Thanks again.
>>> Ian
>>> On 22/12/2015 17:19, Hillyard, Cecil wrote:
>>>
>>>> We use this a lot to download batches of authority records from
>>>> SkyRiver
>>>>
>>>> In Record Matching rules:
>>>>
>>>> Match points: Match point 1
>>>>          Search index: LC-card-number
>>>>          Score:  1000
>>>> Matchpoint components
>>>>          Tag: 010
>>>>          Subfields: a
>>>>          Offset: 0
>>>>          Length: 0
>>>>
>>>> You just have to make sure that your current records do have 010s to
>>>> match on.
>>>>
>>>> __________________
>>>> Cecil Hillyard
>>>> Washoe County Library
>>>> Technical Services
>>>> 775-327-8338
>>>>
>>>> -----Original Message-----
>>>> From: Koha [mailto:koha-bounces at lists.katipo.co.nz] On Behalf Of
>>>> Tomas Cohen Arazi
>>>> Sent: Tuesday, December 22, 2015 8:50 AM
>>>> To: Christopher Davis
>>>> Cc: Koha
>>>> Subject: Re: [Koha] Staging and matching authority records
>>>>
>>>> Chris, we had trouble getting a matching rule for authorities. We
>>>> ended up using the 001 field I recall. It makes use of the Zebra
>>>> indexes, so it is important that you put the right one.
>>>>
>>>> 2015-12-22 13:40 GMT-03:00 Christopher Davis <cgdavis at uintah.utah.gov>:
>>>>
>>>>> Ian,
>>>>>
>>>>> Joy's correct: while MARC authority records can have ISBNs, they
>>>>> typically do not. Sorry to lead you astray :-$
>>>>>
>>>>> --
>>>>>
>>>>> Christopher Davis, MLS
>>>>> Systems & E-Services Librarian
>>>>> Uintah County Library
>>>>> cgdavis at uintah.utah.gov
>>>>> (435) 789-0091 ext.261
>>>>> uintahlibrary.org
>>>>> basinlibraries.org
>>>>> facebook.com/uintahcountylibrary
>>>>>
>>>>>
>>>>> On Tue, Dec 22, 2015 at 9:29 AM, Joy Nelson
>>>>> <joy at bywatersolutions.com>
>>>>> wrote:
>>>>>
>>>>>> Ian-
>>>>>> If you have the matching rule set up to match on the 010 subfield a
>>>>>> using the LC-card-number index, then the matching should happen if 1.
>>>>>> your authority records are indexed 2. you have 010$a in your
>>>>>> existing records AND in the incoming records (and matches do exist).
>>>>>>
>>>>>> (I'm assuming here that you've also specified the incoming file is
>>>>>> an authority file, not a bibliographic file.)
>>>>>>
>>>>>> I'm not sure that using a 020 will assist you in matching as that
>>>>>> is a bibliographic tag, not an authority tag.  And I do not believe
>>>>>> the 020 is indexed for authority records.  A tag must be indexed in
>>>>>> order to be used as a match point.
>>>>>>
>>>>>> When troubleshooting, I generally find a record I know should match
>>>>>> and then look at the 010$a in the incoming file and existing record
>>>>>> to see if there is anything that would potentially cause the match
>>>>>> to fail.
>>>>>>
>>>>>> -Joy
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 22, 2015 at 9:22 AM, Christopher Davis
>>>>>> <cgdavis at uintah.utah.gov> wrote:
>>>>>>
>>>>>>> Ian,
>>>>>>>
>>>>>>> Off the top of my head, I thought that you might want to try
>>>>>>> adding MARC authority field 020 (ISBN) as a match point to your
>>>>>>> matching rule? Not every MARC authority record will have an
>>>>>>> LCCN (although most do).
>>>>>>>
>>>>>>> Good luck and Merry Christmas,
>>>>>>>
>>>>>>> Christopher Davis, MLS
>>>>>>> Systems & E-Services Librarian
>>>>>>> Uintah County Library
>>>>>>> cgdavis at uintah.utah.gov
>>>>>>> (435) 789-0091 ext.261
>>>>>>> uintahlibrary.org
>>>>>>> basinlibraries.org
>>>>>>> facebook.com/uintahcountylibrary
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 22, 2015 at 9:06 AM, Ian Bays
>>>>>>> <ian.bays at ptfs-europe.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> We are trying to stage some authority records to match existing
>>>>>>>> authority records to overlay them.
>>>>>>>> We have used the matching for bibliographic records successfully
>>>>>>>> before but are having difficulty with matching authority records.
>>>>>>>> The version of Koha is 3.20.01.000 and we are MARC21 using dom
>>>>>>>> indexing and icu chains.
>>>>>>>>
>>>>>>>> Before getting into the details, can I ask if anyone has any
>>>>>>>> experience of matching authorities (staging and managing) with any
>>>>>>>> success?
>>>>>>>>
>>>>>>>> We are trying to match on the 010$a and looking at the matching
>>>>>>>> rules documentation in:
>>>>>>>>
>>>>>>>> http://manual.koha-community.org/3.20/en/catadmin.html#recordmatchingrules
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>> Ian
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ian Bays
>>>>>>>> Director of Projects, PTFS Europe Limited
>>>>>>>> Content Management and Library Solutions
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Koha mailing list  http://koha-community.org
>>>>>>>> Koha at lists.katipo.co.nz
>>>>>>>> https://lists.katipo.co.nz/mailman/listinfo/koha
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Joy Nelson
>>>>>> Director of Migrations
>>>>>>
>>>>>> ByWater Solutions
>>>>>> Support and Consulting for Open Source Software
>>>>>> Office: Fort Worth, TX
>>>>>> Phone/Fax (888)900-8944
>>>>>> What is Koha?
>>>>>>
>>> --
>>> Ian Bays
>>> Director of Projects, PTFS Europe Limited
>>> Content Management and Library Solutions
>>>
>>
>>
>
> --
> Ian Bays
> Director of Projects, PTFS Europe Limited
> Content Management and Library Solutions
>



-- 
Joy Nelson
Director of Migrations

ByWater Solutions <http://bywatersolutions.com>
Support and Consulting for Open Source Software
Office: Fort Worth, TX
Phone/Fax (888)900-8944
What is Koha? <http://bywatersolutions.com/what-is-koha/>

