Hi All,

I would like BiblioAddsAuthorities to work even for bulk imports. It does not work when records are staged in Koha, nor when bulkmarcimport.pl is used. It does work when one edits and saves a marc record and global preference BiblioAddsAuthorities is set to "on".

I found this same question posted to koha-dev in 2009 (with no response): http://koha.1045719.n5.nabble.com/Bulkmarcimport-and-authorities-tp3065080p3...

I also found an IRC conversation from 2008 on this topic (with no resolution): http://stats.workbuffer.org/irclog/text.pl?channel=koha;date=2008-01-22

What I have been able to gather is that the relevant code is contained in BiblioAddAuthorities in cataloguing/addbiblio.pl.

#
# sub that tries to find authorities linked to the biblio
# the sub :
# - search in the authority DB for the same authid (in $9 of the biblio)
# - search in the authority DB for the same 001 (in $3 of the biblio in UNIMARC)
# - search in the authority DB for the same values (exactly) (in all subfields of the biblio)
# if the authority is found, the biblio is modified accordingly to be connected to the authority.
# if the authority is not found, it's added, and the biblio is then modified to be connected to the authority.
#

This is what I want, but in bulk.

Is there already a script that calls BiblioAddAuthorities and processes all biblio records in bulk?

If not, can someone take a moment and give me any advice about how one would write such a script? I am new to Koha. An example script would be great.

We want to be able to manage our own Authorities for at least Authors in house. Any help would be greatly appreciated.

Thanks, Pete.

Volunteer with The Archives and Collections Society http://aandc.org/

-- View this message in context: http://koha.1045719.n5.nabble.com/bulk-BiblioAddsAuthorities-tp3378000p33780... Sent from the Koha - Discuss mailing list archive at Nabble.com.
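The lookup order in the comment quoted above can be sketched as follows. This is an illustrative Python stand-in, not Koha's Perl: `AuthorityStore` and `link_heading` are hypothetical names, the store is a plain in-memory dictionary rather than the authority database and Zebra, and the authid is assumed to double as the authority's 001.

```python
from __future__ import annotations


class AuthorityStore:
    """Toy in-memory stand-in for the authority database (hypothetical)."""

    def __init__(self) -> None:
        self.by_id: dict[str, dict[str, str]] = {}
        self._next_id = 1

    def add(self, subfields: dict[str, str]) -> str:
        """Store a new authority and return its id."""
        authid = str(self._next_id)
        self._next_id += 1
        self.by_id[authid] = dict(subfields)
        return authid

    def find_by_values(self, subfields: dict[str, str]) -> str | None:
        """Return the id of an authority whose subfields match exactly."""
        for authid, stored in self.by_id.items():
            if stored == subfields:
                return authid
        return None


def link_heading(field: dict[str, str], store: AuthorityStore) -> tuple[str, bool]:
    """Link one biblio heading field in place; return (authid, created)."""
    # The heading itself excludes the linking subfields $9 and $3.
    heading = {c: v for c, v in field.items() if c not in ("9", "3")}
    if field.get("9") in store.by_id:        # 1. same authid in $9
        authid = field["9"]
    elif field.get("3") in store.by_id:      # 2. same 001 in $3 (UNIMARC)
        authid = field["3"]
    else:                                    # 3. exact match on the values
        authid = store.find_by_values(heading)
    created = authid is None
    if created:
        # Not found: add the authority, then link to it.
        authid = store.add(heading)
    field["9"] = authid
    return authid, created
```

Calling `link_heading` twice with the same heading creates the authority once and reuses it the second time, which is the behaviour wanted for a bulk run.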
What format (if any) are your authorities currently in? If you can bulk import your authorities, you will be able to link to them. We had to build ours from scratch before import.

To create our authority records, we stripped the names out of the appropriate fields in the bib records, fed them into a database, de-duplicated them and checked for inconsistencies, adding notes and reference fields as needed. Then we converted them to MARC. This produced a separate authority file that we uploaded, and then we ran link_bibs_to_authorities.pl.

If you already have them in electronic format, you can skip several steps in the process.

This doesn't really answer your question about how to have Koha create new authorities during a bulk import; it is just how we worked around the problem.

Elaine

On Wed, Feb 9, 2011 at 5:55 PM, Peter Huerter <pete.huerter@gmail.com> wrote:
> I would like BiblioAddsAuthorities to work even for bulk imports. It does not work when records are staged in Koha, nor when bulkmarcimport.pl is used. [...]
> _______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 ext 36 -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)
Thanks Elaine. I am in the process of trying your method.

Have you had any problems using link_bibs_to_authorities.pl? It seems to have received mixed reviews, and my initial experiment failed to link the new authorities to existing biblios. E.g. http://koha.1045719.n5.nabble.com/koha-bibs-authorities-problem-tp3050690p30...

"If you want a best result, you need to re-implement link_bibs_to_authorities.pl with a better matching algorithm"

Did you have to re-implement a better matching algorithm?

Thanks, Pete.

(By the way, I tried to write a script to do what I wanted in my original post and found that it is not as easy as I thought. I could not rely on SimpleSearch (and Zebra) to find a recently added authority, and I decided to move on after many failed attempts at fixing this.)
It worked for us after a couple of problems with our data were fixed.

The link lives in subfield 9 (MARC 21) of the appropriate field in the biblio. When the field is linked to the authority record, the number in the authority record's 001 field will appear in $9 (but you will probably only see the number in editing mode). If your linking doesn't work, check the MARC framework to make sure you've got a $9 in that field.

We had a problem with linking on the first couple of tries. The problem was in the 008 field in the authority records: positions 14 and 15 must be set to 'a' for the linking to work.

Also, Koha will automatically create the 001 number in the authority records. It will overwrite anything you have in that field, though my IT wizard (whom I've CC'd just in case he's got any other tricks up his sleeve) managed to force it to accept our last import with the original numbers.

Hope this helps. I'm the librarian half of the team, so the serious IT stuff is occasionally lost on me.

Elaine

On Fri, Feb 18, 2011 at 7:09 PM, Peter Huerter <pete.huerter@gmail.com> wrote:
> Have you had any problems using link_bibs_to_authorities.pl? [...] Did you have to re-implement a better matching algorithm?
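The 008 requirement Elaine describes can be checked before import. A minimal Python sketch, assuming the standard 40-character MARC 21 authority 008 with positions counted from zero; `make_008` and `check_authority_008` are hypothetical helper names, and every position other than the date and the two heading-use bytes is simply filled with `|`.

```python
def make_008(date_entered: str, heading_use: str = "aa") -> str:
    """Build a 40-character authority 008 (fill characters are an assumption)."""
    # 6 date characters + 8 fill + positions 14-15 + 24 fill = 40
    return date_entered + "|" * 8 + heading_use + "|" * 24


def check_authority_008(field_008: str) -> list[str]:
    """Return a list of problems that could stop headings from linking."""
    problems = []
    if len(field_008) != 40:
        problems.append(f"length is {len(field_008)}, expected 40")
    for pos in (14, 15):
        # A field that is too short or holds another value fails the check.
        if len(field_008) <= pos or field_008[pos] != "a":
            problems.append(f"position {pos} is not 'a'")
    return problems
```

An empty list means the field passes. Anything that shifts the string, such as extra characters inserted after the date, moves the two 'a' values away from positions 14 and 15 and is reported.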
Thanks Elaine. I'm still having trouble getting my auth records to link with my biblio records.

Our process is this:

1) Import biblios into Koha (without any authority information).

2) Take a deduplicated list of authors in a CSV file and run it through MarcEdit to create a compiled .mrc file of authorities. (I also add 040 info here and define my own LDR and 008, but nothing else.)

.csv file:
Zumwalt, Elmo R. Jr
Zimmerman, Linda

.mrk file:
=LDR 00000nz a2200000o 4500
=008 110225000000|||a||||aa|||||||||||||||||d||||||
=040 \\$aOPIACS$bENG$cOPIACS
=100 \\$aZumwalt, Elmo R. Jr

=LDR 00000nz a2200000o 4500
=008 110225000000|||a||||aa|||||||||||||||||d||||||
=040 \\$aOPIACS$bENG$cOPIACS
=100 \\$aZimmerman, Linda

3) Import the compiled authority list into Koha using bulkmarcimport, and re-index Zebra.

perl bulkmarcimport.pl -a -file /home/paul/first_authorities2.mrc -match=pn,100a -v 2
perl rebuild_zebra.pl -a -r -v

4) Run the linking script.

perl /usr/share/koha/bin/link_bibs_to_authorities.pl --verbose

The authority records are successfully imported into Koha (I can do an authority search and find them), but each search result reports that 0 biblios are linked with "this" authority.

The linking script appears to do the opposite of what we want: it removes authority links that are already present. (I added an authority before running the linking script. I'm able to add the link by editing a given biblio in Koha; Koha adds it automatically, see my original post on this thread.) Existing links are gone (reporting "0 biblios"), and no new links are added.

I am using the following LDR and 008 fields:

LDR: 00000nz a2200000o 4500
008: 000000|||a||||aa|||||||||||||||||d||||||

(I'm sure there are bibs to be linked with these authorities.)

Any ideas? Do you use MarcEdit? Would you be able to provide a sample MARC record for one of your authorities please?

Thanks again, Pete.
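Step 2 of the process above can also be scripted instead of going through a MarcEdit template. A rough Python sketch under several assumptions: `authority_mrk` is a hypothetical helper, the mnemonic syntax (two spaces after the tag, a backslash for a blank indicator) is my understanding of MarcEdit's format, the leader is written with the blanks a 24-character leader needs, and the 008 follows the layout shown in the thread with the date replacing the six-character placeholder so that the field stays at 40 characters.

```python
LEADER = "00000nz  a2200000o  4500"  # 24 characters; blanks assumed


def authority_mrk(names, agency, date_entered):
    """Return MarcEdit-style mnemonic text, one authority per unique name."""
    # date(6) + 3 fill + 'a' + 4 fill + 'aa' + 17 fill + 'd' + 6 fill = 40
    field_008 = (date_entered + "|||" + "a" + "||||" + "aa"
                 + "|" * 17 + "d" + "|" * 6)
    # De-duplicate while keeping the original order.
    unique = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    records = []
    for name in unique:
        records.append("\n".join([
            f"=LDR  {LEADER}",
            f"=008  {field_008}",
            rf"=040  \\$a{agency}$bENG$c{agency}",
            rf"=100  \\$a{name}",
        ]))
    return "\n\n".join(records) + "\n"
```

For example, `authority_mrk(open("authors.csv").read().splitlines(), "OPIACS", "110225")` would treat each line of a hypothetical `authors.csv` as one heading. The resulting text would still need to be compiled to .mrc before bulkmarcimport.pl can load it.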
I have similar problems (Koha 323 on Debian 6).

In my opinion, Pete, your auth records are good, since they are imported and indexed correctly. The fault seems to be in link_bibs_to_authorities.pl. After debugging it, I think that C4/Heading.pm adds too many query limiters when trying to find an authority entry starting from the values stored in biblio fields. For instance, in the case of authors, the limiter "AND Heading-use-main-or-added-entry=a" introduces the error. So I commented out lines 142-152 of C4/Heading.pm, and every author was correctly linked to its own authority.

So far I don't know if this is a known bug or if Koha 325 solves it.

HTH. Stefano

On Feb 28, 2011, at 16:52 , Peter Huerter wrote:
> Thanks Elaine. I'm still having trouble getting my auth records to link with my biblio records. [...]
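To see why an extra limiter can hide a perfectly good authority, here is a small Python model of the behaviour Stefano describes. It is not the code in C4/Heading.pm: only the two limiter strings that appear in this thread are used, the function names are hypothetical, and the mapping of `Heading-use-main-or-added-entry` to 008 position 14 is an assumption based on the MARC 21 authority format.

```python
def query_limiters(auth_type: str, check_heading_use: bool = True) -> str:
    """Compose the limiter suffix appended to the authority search."""
    limiters = f" AND at='{auth_type}'"
    if check_heading_use:
        limiters += " AND Heading-use-main-or-added-entry=a"
    return limiters


def limiter_accepts(record: dict, auth_type: str,
                    check_heading_use: bool = True) -> bool:
    """Toy evaluation of those limiters against one authority record."""
    if record["auth_type"] != auth_type:
        return False
    # Assumed mapping: the heading-use index reflects 008 position 14.
    if check_heading_use and record["008"][14:15] != "a":
        return False
    return True
```

With the heading-use limiter switched off, which is roughly what commenting out those lines amounts to, a record whose 008 does not carry 'a' at position 14 is accepted again as long as the authority type matches.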
Thanks Stefano.

Unfortunately it's not working for me - yet :)

In your post you describe commenting out lines 142-152 of C4/Heading.pm. In my Heading.pm file this would also comment out the return statement of _query_limiters. Is your _query_limiters effectively reduced to the following?

    sub _query_limiters {
        my $self = shift;
        my $limiters = " AND at='$self->{'auth_type'}'";
        return $limiters;
    }

When I modify _query_limiters as above and rerun link_bibs_to_authorities.pl, the linking still does not occur.

Are you able to show me some raw MARC showing an authority linked with a biblio? I understand basically that _9 subfields are added, but what I am wondering is whether my MARC format is somehow missing some important info (or at least our delta may provide some clues).

E.g. I am trying to link the following biblio with the following authority using link_bibs_to_authorities.pl:

    LDR 00572nam a2200193Ia 4500
    003 OPIACS
    005 20110225150523.0
    008 110224t20001998xx 000 0 und d
    020 _a0964513331
    040 _cOPIACS
    100 1 _aZimmerman, Linda
    245 10 _aGhosts of Rockland County _cZimmerman, Linda
    250 _a2000
    260 _bSpirited Books _c1998 _g2000
    300 _3pamphlet
    650 7 _ashanties _2OPIACS
    942 _2ddc _cAU
    952 _40 _xpamphlet, fine _esw _00 _912234 _bOPIACS _10 _d2011-02-24 _zindian rock, _8shanties _71 _cshanties _g1.00 _yBK _aOPIACS
    999 _c12223 _d12223

    000 - LEADER @ 00229nz##a2200097o##4500
    003 - CONTROL NUMBER IDENTIFIER @ OPIACS
    005 - DATE AND TIME OF LATEST TRANSACTION @ 20110301130115.0
    008 - FIXED-LENGTH DATA ELEMENTS @ 110301000000|ge|dz||aaan|||||||||||||||c|||||d
    040 ## - CATALOGING SOURCE
        a Original cataloging OPIACS
        b Language of catalogi eng
        c Transcribing agency OPIACS
    100 ## - HEADING--PERSONAL NAME
        a Personal name Zimmerman, Linda

Both of these records are in my database and are found in Koha.

A more general question. You mention "debugging Koha". How do you go about debugging Koha? I've tried putting in debugging print statements that print to koha-error_log; however, I run into buffering issues(?) pretty quickly and the output does not appear reliably.

Thanks, Pete.

Volunteer with the ACS http://aandc.org/
The guy the "tired old sys admin" leans on :)
On Mar 1, 2011, at 19:54 , Peter Huerter wrote:

> In your post you describe commenting out lines 142-152 of C4/Heading.pm. In my Heading.pm file this would also comment out the return statement of _query_limiters. Is your _query_limiters effectively reduced to the following?
>
> sub _query_limiters { my $self = shift; my $limiters = " AND at='$self->{'auth_type'}'"; return $limiters; }

Yes.

> When I modify _query_limiters as above and rerun link_bibs_to_authorities.pl the linking still does not occur.
> [...]
> E.g. I am trying to link the following biblio with the following authority using link_bibs_to_authorities.pl:
>
> LDR 00572nam a2200193Ia 4500
> 003 OPIACS
> 005 20110225150523.0
> 008 110224t20001998xx 000 0 und d
> 020 _a0964513331
> 040 _cOPIACS
> 100 1 _aZimmerman, Linda

link_bibs should add $9 to this tag, copying the 001 tag of the auth rec.

> [...]
> A more general question. You mention "debugging Koha". How do you go about debugging Koha?

Well, I can only debug one Perl script at a time. In this case I ran perl -d on link_bibs_to_authorities.pl and followed its work step by step down to C4::Heading::_query_limiters, where the query string is built.

Sorry: I don't understand why your attempt with the patched _query_limiters doesn't work... Of course I assume you can find the auth rec by searching for Perso_name Zimmerman, Linda, and that the result is 1.

HTH anyway. Stefano
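Stefano's two remarks, that the authority search should return exactly one result and that the authority's 001 is then copied into $9, can be put together in a short Python sketch. The record layout (plain dictionaries) and the name `link_field` are hypothetical, and the authority number used in the usage note is only an example value.

```python
def link_field(biblio_field: dict, authorities: list) -> str:
    """Copy the matching authority's 001 into $9; return what happened."""
    heading = biblio_field["subfields"].get("a")
    matches = [a for a in authorities if a["heading"] == heading]
    if not matches:
        return "no match"
    if len(matches) > 1:
        # An ambiguous search result should not be linked automatically.
        return "ambiguous"
    control_number = matches[0].get("001")
    if not control_number:
        # Without a 001 there is nothing to put into $9.
        return "authority has no 001"
    biblio_field["subfields"]["9"] = control_number
    return "linked"
```

For instance, `link_field({"tag": "100", "subfields": {"a": "Zimmerman, Linda"}}, [{"001": "8", "heading": "Zimmerman, Linda"}])` adds `"9": "8"` to the subfields, while an authority dictionary without a "001" key leaves the field untouched.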
Grazie Stefano.

[Off list you wrote: "Peter, please send me the marcxml field of your auth_header table relative to Zimmerman, Linda. It can be more useful than the Koha display you attached at the bottom. Thanks. Stefano"]

Here it is. I think Koha adds the 003, and 942 fields. I add the rest in my mapping.

| 8 | PERSO_NAME | 2011-03-01 | NULL | NULL | NULL | 00229nz##a2200097o##4500003000700000005001700007008004700024040002400071100002100095942001500116 OPIACS20110301130115.0110301000000|ge|dz||aaan|||||||||||||||c|||||d aOPIACSbengcOPIACS aZimmerman, Linda aPERSO_NAME | NULL | <?xml version="1.0" encoding="UTF-8"?> 00229nz##a2200097o##4500 OPIACS 20110301130115.0 110301000000|ge|dz||aaan|||||||||||||||c|||||d OPIACS eng OPIACS Zimmerman, Linda PERSO_NAME

Thanks. I'll keep trying.

Cheers, Pete.
The XML in the last post is being automatically processed away, so here it is in a text file attachment: http://koha.1045719.n5.nabble.com/file/n3408083/mymarc.txt mymarc.txt

Cheers, Pete.
The main difference between your marcxml auth record and my records is the absence of the 001 controlfield. In my opinion, this makes it impossible to link your auth record with biblio records.

Here I compare one of my records with yours (I used a trick to avoid the stripping of XML tags: ≤...≥ delimiters). My record was constructed by a tool I prepared and then imported with bulkmarcimport.pl into a Koha 323.

To explain the absence of the 001, maybe we need to know which Koha version you are working on.

Regards. Stefano

≤?xml version="1.0" encoding="UTF-8"?≥
≤record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim"≥
  ≤leader≥00243nz a2200121n 4500≤/leader≥
  ≤controlfield tag="001"≥65536≤/controlfield≥
  ≤controlfield tag="003"≥USC≤/controlfield≥
  ≤controlfield tag="005"≥20101109120000.0≤/controlfield≥
  ≤controlfield tag="008"≥101109|| aca||aabn | a|a d≤/controlfield≥
  ≤datafield tag="040" ind1=" " ind2=" "≥
    ≤subfield code="a"≥USC≤/subfield≥
  ≤/datafield≥
  ≤datafield tag="100" ind1="1" ind2=" "≥
    ≤subfield code="a"≥Sacchetta, Sergio.≤/subfield≥
  ≤/datafield≥
  ≤datafield tag="942" ind1=" " ind2=" "≥
    ≤subfield code="a"≥PERSO_NAME≤/subfield≥
  ≤/datafield≥
  ≤datafield tag="999" ind1=" " ind2=" "≥
    ≤subfield code="c"≥≤/subfield≥
    ≤subfield code="d"≥≤/subfield≥
  ≤/datafield≥
≤/record≥

≤record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd" xmlns="http://www.loc.gov/MARC21/slim"≥
  ≤leader≥00229nz##a2200097o##4500≤/leader≥
  ≤controlfield tag="003"≥OPIACS≤/controlfield≥
  ≤controlfield tag="005"≥20110301130115.0≤/controlfield≥
  ≤controlfield tag="008"≥110301000000|ge|dz||aaan|||||||||||||||c|||||d≤/controlfield≥
  ≤datafield tag="040" ind1=" " ind2=" "≥
    ≤subfield code="a"≥OPIACS≤/subfield≥
    ≤subfield code="b"≥eng≤/subfield≥
    ≤subfield code="c"≥OPIACS≤/subfield≥
  ≤/datafield≥
  ≤datafield tag="100" ind1=" " ind2=" "≥
    ≤subfield code="a"≥Zimmerman, Linda≤/subfield≥
  ≤/datafield≥
  ≤datafield tag="942" ind1=" " ind2=" "≥
    ≤subfield code="a"≥PERSO_NAME≤/subfield≥
  ≤/datafield≥
≤/record≥

On Mar 3, 2011, at 16:17 , Peter Huerter wrote:
> The XML in the last post is being automatically processed away, so here it is in a text file attachment:
> http://koha.1045719.n5.nabble.com/file/n3408083/mymarc.txt mymarc.txt
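The check Stefano did by eye, whether a marcxml record carries a 001 controlfield, is easy to automate. A minimal Python sketch using only the standard library, assuming the records use real angle brackets and the MARC21 slim namespace shown in his example; `control_number` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def control_number(marcxml: str):
    """Return the text of the 001 controlfield, or None if it is absent."""
    root = ET.fromstring(marcxml)
    node = root.find(".//marc:controlfield[@tag='001']", MARC_NS)
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()
```

Running this over every marcxml value exported from auth_header and listing the rows where it returns `None` would show which authorities cannot be referenced from a $9.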
Hello Pete,

I don't know if you have solved your problem yet, but I had a similar problem this weekend which I solved somehow. This is my solution.

I wanted to import both authorities and biblios and then link them. Note that I was importing from a system which already had the link in the form of an id stored in 200@6.

First I imported the authorities. I wrote a script which parsed the MARC for the authorities and stored the old id in a field (900@6) before importing.

Then, after the import, I wrote another script which parsed the MARC for the biblios and connected to the db to get the new id (something like: "select authid newid, marcxml from auth_header where extractvalue(marcxml,'//datafield[\@tag=900]/subfield[\@code=6]') = '$aid'"; actually I created a temp table to store the association for speed). I stored the new authid from Koha in the 9 subfield of the auth-linked field. Upon import of the biblios the link was present and working.

If you don't have the id, you can probably still perform a per-name search.

Regards, Len
www.len.ro

On 03/03/2011 05:17 PM, Peter Huerter wrote:
> The XML in the last post is being automatically processed away, so here it is in a text file attachment:
> http://koha.1045719.n5.nabble.com/file/n3408083/mymarc.txt mymarc.txt
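Len's association table can be sketched in Python as a dictionary from the old system's id to the new Koha authid. This is an illustration under assumptions, not his script: the rows are taken to be `(authid, marcxml)` pairs already fetched from auth_header, the marcxml is assumed to use the MARC21 slim namespace, and `old_to_new_ids` and `relink` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/MARC21/slim"}


def old_to_new_ids(rows) -> dict:
    """Map the old id kept in 900$6 to the authid Koha assigned."""
    mapping = {}
    for authid, marcxml in rows:
        root = ET.fromstring(marcxml)
        node = root.find(".//m:datafield[@tag='900']/m:subfield[@code='6']", NS)
        if node is not None and node.text:
            mapping[node.text] = authid
    return mapping


def relink(subfields: dict, mapping: dict, old_code: str = "6") -> bool:
    """Write the new authid into $9 when the old id is known."""
    old_id = subfields.get(old_code)
    if old_id in mapping:
        subfields["9"] = str(mapping[old_id])
        return True
    return False
```

Building the dictionary once and reusing it for every biblio plays the same role as the temporary table mentioned above: it avoids one XPath query against auth_header per heading.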
Dear Koha developers,

I would like to bump my original post (previous title "bulk BiblioAddsAuthorities").

addbiblio.pl appears to be called when one chooses "Edit Record" from the Koha admin biblio details page. This code appears to call BiblioAddAuthorities, and I *really like* the result. Authorities are generated for authors, subjects, and secondary authors. Even better, the string "machine generated" is used in the added authorities to mark them as being created automatically. This is great! I would like to do this in batch for all of our biblios, since we manage our own authorities in house and the feature works so well in the singleton case. However, I have run into problems writing a Perl script to do this.

I have a script working that iterates through a list of biblios calling BiblioAddAuthorities; however, I have one problem related to bulk processing: a temporal issue, or perhaps a caching issue(?).

BiblioAddAuthorities relies on SimpleSearch to search for existing authorities. However, when I am processing in batch, SimpleSearch fails for *recently added authorities*. I think SimpleSearch relies on Zebra.

I've tried adding dbh->commits and re-indexing Zebra before processing each biblio, but this does not seem to solve my problem. When my script exits, SimpleSearch finds the recently added authorities. It is almost as if there is a lazy write or a caching middle layer getting in the way (but probably working very well for its intended purpose, of course).

It seems that recently added authorities are stuck in some sort of cache somewhere? At the SQL level? At the Zebra layer?

How do I make sure that Zebra and SQL are written to, so that SimpleSearch has the latest up-to-date data to base its search on?

Any pointers would be greatly appreciated.

Pete. (Volunteer with the ACS - http://aandc.org/)
On 07/03/2011 21:55, Peter Huerter wrote:

> [...]
> How do I make sure that Zebra, and SQL are written to so that SimpleSearch has the latest up-to-date data to base it's search on?

SimpleSearch relies on Zebra. To make sure that everything is indexed, you should run the indexing before your bulk edit starts. Hope that helps.

-- Henri-Damien LAURENT
Just following up ...

I finally got a script to work that loops through all biblios calling BiblioAddAuthorities. It is very slow, however, since it calls rebuild_zebra.pl every time a new authority is added (it took 10 hours to link ~12000 biblios). But for our purposes it does what we want really well. It extends some existing, fantastic Koha code to a batch job for a task (adding authorities) that is otherwise non-trivial, as far as I can tell. We manage our own authorities in-house.

To make the script work there were 3 technical issues:

1) Call rebuild_zebra.pl every time a new authority is added. This is required for SimpleSearch to be able to find a recently added authority. I used Perl fork/exec/waitpid for this.

2) Re-indexing Zebra was not enough to make SimpleSearch "see" recently added authorities. The connection to Zebra had to be reset. For this I used the set_context and restore_context interface. It appears that when you have an active Context, even if you run rebuild_zebra, that context encapsulates a stale Zebra index. So you need to create a new Context in which to do your SimpleSearch, and then recycle that one (so you don't run out of system resources).

3) For a long-running script it appears that set_context and restore_context do not clean up all open file handles, so I had to work around a system error ("no files left" or something like that). To do that I used a shameless hack: I wrote a .csh script to call my Perl script so that it processes only 100 biblios at a time. That way, each time Perl exits, the system resources (stale file handles, etc.) are recycled. Of course it is possible that there is a problem with my script, but the hack worked around it and I was in a hurry.

I'd be happy to share the script with anyone who is interested. I'm not sure what the policy is on refactoring Koha code and posting it. It only works for MARC21 format; I'm not sure what, if anything, would need to be done for UNIMARC.

Thanks for all of your help, Pete.

By the way, this is one of the first Perl scripts I have ever written, so it is not slick, if you know what I mean :)
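The wrapper idea from issue 3, running the worker as a fresh process for every 100 biblios so that leaked handles are released on exit, might look like this in Python. It is a sketch only, not the .csh script from the thread: `batch_ranges`, `run_in_batches` and `placeholder_command` are hypothetical names, and the placeholder command merely prints the range, because the real script's command-line arguments are not given here.

```python
import subprocess
import sys


def batch_ranges(total: int, size: int = 100) -> list:
    """Split 0..total into half-open (start, end) ranges of at most `size`."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


def placeholder_command(start: int, end: int) -> list:
    """Stand-in for the real worker; replace with the actual invocation."""
    message = f"would process biblios {start}..{end - 1}"
    return [sys.executable, "-c", f"print({message!r})"]


def run_in_batches(total: int, build_command, size: int = 100) -> None:
    """Run one child process per batch and wait for each to finish."""
    for start, end in batch_ranges(total, size):
        # Each child exits before the next starts, so file handles and
        # other per-process resources are released between batches.
        subprocess.run(build_command(start, end), check=True)
```

`run_in_batches(12000, placeholder_command)` would start 120 short-lived processes one after another; `check=True` stops the run at the first batch that fails, so it can be resumed from that range.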
Here is the code.

http://koha.1045719.n5.nabble.com/file/n3791805/acsauthlink.pl acsauthlink.pl
http://koha.1045719.n5.nabble.com/file/n3791805/callacsauthlink callacsauthlink

I hope this helps someone. Comments welcome.

Pete.
participants (5)
- Elaine Bradtke
- LAURENT Henri-Damien
- Marilen Corciovei
- Peter Huerter
- Stefano Bargioni