Hello, I'm still having some trouble getting our data migrated from an old Winnebago system to Koha 2.0.0pre2. I am using a MICROLIF (USMARC) file and attempting to use the bulkmarcimport.pl script like so:

/usr/local/koha/scripts/misc/bulkmarcimport.pl -file MICROLIF.001

It appears to import fine. Here is a piece of the script output as it is importing the data:

==>LDR 01088cam 2200265 i 4500
001 79001533 //r86
008 790514s1979 nyu b 00110 eng
020 _a0835211533 : _c$14.95
043 _ae-uk--- _an-us---
050 0 _aPR830.F3 _bT9
082 _a016.823/0876
100 10 _aTymn, Marshall B., _d1937-
245 10 _aFantasy literature : _ba core collection and reference guide / _cMarshall B. Tymn, Kenneth J. Zahorski, and Robert H. Boyer.
260 0 _aNew York : _bR. R. Bowker Co., _c1979.
300 _axiii, 273 p. ; _c24 cm.
504 _a"Directory of publishers": p. 232-239.
500 _aIncludes index.
650 0 _aFantastic fiction, English _xHistory and criticism _xAddresses, essays, lectures.
650 0 _aFantastic fiction, American _xHistory and criticism _xAddresses, essays, lectures.
650 0 _aFantastic fiction, English _xBibliography.
650 0 _aFantastic fiction, American _xBibliography.
700 10 _aZahorski, Kenneth J., _d1939- _ejoint author.
700 10 _aBoyer, Robert H., _d1937- _ejoint author.
852 1 _9p14.95 _h016.823 T976f _p10022
961 _t6 at /usr/local/koha/intranet/scripts/misc/bulkmarcimport.pl line 79.
biblio 22 : items found ADDED biblio NB 45 in DB

==>LDR 00739nam 2200193 i 4500
001 77622252
008 770803s1976 meu b s00010 eng
020 _a0913764086
043 _an-us--- _an-us-me
050 0 _aZ1238 _b.C48 _aE263.M4
082 _a016.9733/09741
100 10 _aChurchill, Edwin A.
245 10 _aMaine communities and the War for Independence : _ba guide for the study of local Maine history as related to the American Revolution / _cby Edwin A. Churchill.
260 0 _a[Augusta] : _bMaine State Museum, _c1976.
300 _avi, 110 p. ; _c22 cm.
651 0 _aMaine _xHistory _yRevolution, 1775-1783 _xBibliography.
651 0 _aUnited States _xHistory _yRevolution, 1775-1783 _xBibliography.
852 1 _h016.9733 C475m _p10023
961 _t6 at /usr/local/koha/intranet/scripts/misc/bulkmarcimport.pl line 79.
biblio 23 : items found ADDED biblio NB 46 in DB

The issue I am still having is that it is creating biblios and biblioitems in the database, but no "items". This seems to be a problem of correctly mapping the fields in the MARC record to Koha's database; specifically, the item information seems to be in MARC field 852. I haven't yet been able to map things correctly, though. I have a few questions:

When installing, I am selecting MARC21 parameters instead of UNIMARC. Is this correct? What's the difference?

Is there an already defined option to extract the "item" data from the MARC 852 field? (I believe from some responses that I have received that this is a fairly standard location for the "item" data.) Any help or additional information would be appreciated. Perhaps there is a way for me to rewrite my MARC data so that that information will be read properly?

I'm giving a demo next week to a bunch of school technologists on the basics of installing and configuring Koha as part of an open source seminar we are putting on. I should be able to pull that off using the sample data if necessary, but it would be really nice if I could show them our actual library data. :-)

Thanks in advance for any help or suggestions,

Derek

--
Derek Dresser
http://network.gouldacademy.org/
Gould Academy
Bethel, ME 04217
(207)824-7700

"Nothing endures but change" --Heraclitus

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/
Derek Dresser <Derek.Dresser@gouldacademy.org> writes:

> Is there an already defined option to extract the "item" data from
> the MARC 852 field (I believe from some responses that I have
> received that this is a fairly standard location for the "item"
> data). Any help or additional information would be appreciated.
> Perhaps there is a way for me to rewrite my MARC data so that that
> information will be read properly?

Hi Derek,

No, you have to define the mappings from the MARC fields to the items table yourself. This is done from the Koha intranet interface: go to the Parameters link, then to the "Links Koha-MARC DB" link, and then to the items tab from the drop-down menu. Then you start mapping the Koha items table column names to MARC subfields.

Note that _all_ of these mappings have to be in the subfields of the same MARC field, so if you are using the 852 field (as our library is), then all the needed items table columns will need to be mapped to 852 subfields.

The mapping that I am trying at the moment is:

items.itemnumber      -> 852 f (arbitrary mapping - no data in this subfield in the original MARC record of our old system)
items.multivolumepart -> 852 m
items.barcode         -> 852 p
items.dateaccessioned -> 852 8
items.homebranch      -> 852 a (arbitrary mapping)
items.price           -> 852 9
items.itemnotes       -> 852 z
items.holdingbranch   -> 852 b (arbitrary mapping)

Your mappings might have to be slightly different, depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries; it is not proving easy to catch every error. The rest of the columns in the items table are left unmapped for now.

In addition, you have to go to the "MARC tag structure" link under Parameters and change the tab value of all the subfields used in the items table to "items(10)", except for the tab value of items.itemnumber, which should be set to "-1(ignore)": this value will be autoincremented by the bulkmarcimport script and should not read any values that might inadvertently be in the MARC records you are trying to import.

Then go back to the main Parameters page and click the "MARC Check" link. This will check all your MARC-Koha mappings to see if they are valid. If there are errors, it will point them out, and you need to fix them before trying to use the bulkmarcimport script. Once you get an error-free message from "MARC Check", you should be good to use bulkmarcimport. Remember to use the "-d" option to delete the previous entries.

Once you get the biblio data into the items tables, the circulation data from your old system can be extracted and added to the table if you want. You will probably have to write your own script to automate this, since each library system seems to store its circulation data in a slightly different fashion.

I find the bulkmarcimport script to be interminably slow, entering only about 6000 items per hour on our 950 MHz server running Mandrake 9.0. It is not much slower on the 266 MHz home machine that I am using as a test machine. Because I have to enter all 15,000 items from our collection to really test for errors in my MARC "munging", it means another 2 1/2 hour wait to load up all the items again after fixing a mistake. I might get frustrated enough to try my hand at writing a faster upload script...

--
Larry Stamm, Chair
McBride and District Public Library
McBride, BC V0J 2E0 Canada
http://www.mcbridebc.org/library
Quoting Larry Stamm <larry@larrystamm.com>:
<snip>
Larry, this is ENORMOUSLY helpful. Thank you. I am getting closer, but I still have one error when I run MARC Check:

  ALL items fields MUST be mapped to the same tag, and they must all be in the 10 (items) tab

Here is my mapping:

items.itemnumber    -> 852f (arbitrary)
items.barcode       -> 852p
items.homebranch    -> 852a (arbitrary)
items.itemnotes     -> 852z (arbitrary)
items.holdingbranch -> 852b (arbitrary)

I did set all the items fields to "items(10)", except for items.itemnumber, which is set to "-1(ignore)". The only fields that actually seem to be used in my data are:

852h - call number
852p - bar code

Is there anything obvious that I'm missing? Also, where is the call number supposed to end up? By call number, I think I mean the Dewey number and local call number. It looks like this in my data: _h016.9733 C475m

Thanks again. With your help, I'm getting closer.

-Derek
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
Quoting Derek Dresser <Derek.Dresser@gouldacademy.org>:
<snip>
Hi again,

Is it true that the 016.9733 part of the 852h field (example above) is supposed to end up in biblioitems.classification, and the C475m part in biblioitems.subclass?

When mapping the MARC subfield structure, is the "tab" parameter used to extract multiple values from the same subfield? For example, from the record above, would the 016.9733 be subfield 852h tab 0, and C475m be subfield 852h tab 1? Or am I misinterpreting the "tab" parameter?

I imported a small subset of my data using the mapping above (even though MARC Check still reports the one error shown above), and I now get data in my items table in the database. The problem I am having now is that the biblioitems.classification field is NULL for all my records, even though it is mapped to 852h. I believe (correct me if I'm wrong) that I want the 016.9733 to end up in biblioitems.classification and the C475m to end up in biblioitems.subclass.

Thanks in advance for any help. Getting closer all the time :-)

-Derek
Derek Dresser <Derek.Dresser@gouldacademy.org> writes:

> Is it true that the 016.9733 part of the 852h field (example
> above) is supposed to end up in biblioitems.classification and the
> C475m part supposed to end up in biblioitems.subclass?

As Steven Baljkas already replied, yes. I am mapping 082a to the biblioitems.dewey field, and that seems to arrive in the database intact too.

> When mapping the MARC subfield structure, is the "tab" parameter
> used to extract multiple values from the same subfield? For
> example from above would the 016.9733 be subfield 852h tab0? and
> C475m be subfield 852h tab1? or am I misinterpreting the "tab"
> parameter?

I think the tab parameter is more of a precedence-value setter. My experiments seem to show that any MARC subfield set to -1(ignore) will be ignored by bulkmarcimport.pl when parsing the import MARC file. If you want to map multiple MARC subfields to the same mysql table column, you can do so by selecting the same Koha field from the drop-down menu and setting all the tab values to the same (positive) value; see the default settings for the 100 author fields for an example. I presume you can do multiple mappings with one MARC field taking precedence over the others by giving it a higher tab value, but I haven't tested this.

So I expect you will have to split your 852h data using methods other than those provided by Koha. It is easy enough to extract just this field using the MARC RTP program and then split it using awk or cut or any number of other tools, but I have no idea how to write it back into a standard MARC file. Probably perl is the tool of choice for this.

> I imported a small subset of my data using the mapping above (even
> though MARC Check still reports the one error shown above) I now
> get data in my items table in the database. The problem I am
> having now is that the biblioitems.classification field is NULL
> for all my records even though it is mapped to 852h.

Check that the tab value for your biblioitems.classification field is not -1(ignore). Since this is the 852h subfield, this might be generating the error message if it is not set to 10.

In addition, the tables bibliosubject, bibliosubtitle, additionalauthors, itemtypes, possibly biblioanalysis, and probably a few more need to be populated with MARC data for the Koha search and display functions to perform properly on an imported collection. Bulkmarcimport.pl won't do this, AFAIK; you have to extract the data and insert it into the mysql tables yourself, using the tools of your choice.

Hope this helps. I think it would be valuable if somebody who has all this figured out were to map each form field of the Koha HTML interface to the Koha mysql table field from which the data is drawn. Then it would be easier to figure out how to map the MARC fields to Koha.

--
Larry Stamm, Chair
McBride and District Public Library
McBride, BC V0J 2E0 Canada
http://www.mcbridebc.org/library
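[Editor's note: the splitting step Larry describes can be sketched quickly. Perl would be the era-appropriate tool, as he says; the hypothetical helper below uses Python purely as an illustration, and assumes the classification and cutter are separated by the first run of whitespace, as in Derek's example _h016.9733 C475m.]

```python
def split_callnumber(h):
    """Split an 852$h value like '016.9733 C475m' into the Dewey
    classification and the cutter/subclass on the first whitespace run."""
    parts = h.split(None, 1)  # split at most once, on any whitespace
    classification = parts[0] if parts else ""
    subclass = parts[1] if len(parts) > 1 else ""
    return classification, subclass

print(split_callnumber("016.9733 C475m"))  # ('016.9733', 'C475m')
print(split_callnumber("016.823"))         # ('016.823', '')
```

Writing the split values back into a standard MARC file remains the harder half of the job, as Larry notes.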
Larry Stamm wrote:
<snip>
I find the bulkmarcimport script to be interminably slow, entering only about 6000 items per hour on our 950 MHz CPU server running Mandrake 9.0. It is not much slower on my 266 MHZ home machine that I am using as a test machine. Because I have to enter all 15,000 items from our collection to really test for errors in my MARC "munging", it means another 2 1/2 hr wait to load up all the items again after fixing a mistake before testing. I might get frustrated enough to try my hand at writing a faster upload script...
6000 items per hour means less than 2 items/second. I think the problem comes from your IDE drive and/or your mysql configuration. I agree that the bulkmarcimport script is slow, because it does a LOT of things (in Biblio.pm). But the main reason for the lack of speed is in fact the mySQL inserting (a lot of it). If anyone has a magic SQL statement to improve mySQL inserts (maybe chapter 10 of the mySQL doc could be helpful), feel free to suggest it... I suggest two ideas:

* drop indexes (except the PK) at the beginning & rebuild them at the end
* lock the tables at the beginning & unlock them at the end (LOCK TABLES XXX WRITE;)

The affected tables could be:

* marc_biblio, marc_subfield_table, marc_word
* biblio, biblioitems, items

Could anyone give those ideas a try?

--
Paul POULAIN
Consultant indépendant en logiciels libres
responsable francophone de koha (SIGB libre http://www.koha-fr.org)
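[Editor's note: Paul's two ideas, written out as the SQL wrapper one might put around the bulk load. This is only a sketch: ALTER TABLE ... DISABLE KEYS requires MySQL 4.0 and MyISAM tables (on 3.23 one would drop and re-create the non-unique indexes instead), and LOCK TABLES only holds for the connection that issued it, so the inserts must run in that same connection (i.e., inside the import script).]

```sql
-- Sketch of a wrapper around the bulk inserts (assumes MyISAM tables).
LOCK TABLES marc_biblio WRITE, marc_subfield_table WRITE, marc_word WRITE,
            biblio WRITE, biblioitems WRITE, items WRITE;
ALTER TABLE marc_word DISABLE KEYS;            -- defer non-unique index updates
ALTER TABLE marc_subfield_table DISABLE KEYS;

-- ... run the bulk inserts here, in this same connection ...

ALTER TABLE marc_word ENABLE KEYS;             -- rebuild the indexes in one pass
ALTER TABLE marc_subfield_table ENABLE KEYS;
UNLOCK TABLES;
```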
On Tue, 29 Jul 2003, paul POULAIN wrote:
If anyone has a magic sql statement to improve mySQL insert (maybe chapter 10 of mySQL doc could be helpfull) feel free to suggest... I suggest two ideas : * drop indexes (except pk) at beginning & rebuild them at end * lock table at beginning & unlock at end (LOCK TABLES XXX WRITE;)
I'd be interested to see how these actually work, but we do a lot of stuff with MySQL without doing either of those things and we don't have these sorts of problems. Every day we use Perl to transform several gigabytes of SQL into gigabytes of other SQL, and MySQL has rarely been the bottleneck. There are a few things we do habitually that might make a difference to the original poster and for future development:

+ Bulk inserts should be done as INSERT DELAYED to reduce the amount of I/O contention. I believe (but I'm too lazy to confirm) that "insert delayed" works less hard on maintaining the indexes as well. But whether it does now or whether that's just a potential feature, bulk operations should be flagged as such so MySQL can try to do the right thing. This makes a huge difference - a similar effect to what dropping indexes did for Oracle. (Of course, I haven't touched Oracle since Oracle7 came out... tragedy. :)

+ Configure MySQL properly. I get a bit miffed about this one, because people on certain other mailing lists often flame MySQL for underperforming on the cheap, fast hardware easily available these days. It's pretty simple to take one of the sample my.cnf files that comes with the RPM distribution and pick one that sounds similar to your machine. It can make a huge difference to let the caches grow. Since even good memory is pretty cheap these days, the last three database servers I've built have all had 4G of RAM so MySQL could be opened up full bore.

But to return to Koha, a pointer to the appropriate MySQL documentation and the mod_perl guide would probably be a good start on a performance section.

--
</chris>

The death of democracy is not likely to be an assassination from ambush. It will be a slow extinction from apathy, indifference, and undernourishment. -Robert Maynard Hutchins, educator (1899-1977)
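[Editor's note: Chris's first point in concrete form. The column names come from Larry's mapping earlier in the thread; the values and branch code are made up for illustration. INSERT DELAYED is MyISAM-only: the server queues the row in memory and the client returns immediately, instead of waiting for the row and its index entries to hit disk.]

```sql
-- Hypothetical example of flagging a bulk insert as DELAYED.
INSERT DELAYED INTO items (barcode, homebranch, holdingbranch, itemnotes)
VALUES ('10022', 'MAIN', 'MAIN', '');
```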
Larry Stamm wrote:
No, you have to define the mappings from the marc fields to the items
<snip>
Remember to use the "-d" option to delete the previous entries.
Kudos to Larry. I'm happy to see that I'm not the only man in this world who understands how it works :-) (Note that NPL understands too, I think. So we are 3...)

One question for everybody: would it be convenient to have, during the install stage, various MARC21 parameter sets for the various item-data locations (852 for most of you, but it seems that NPL has other flavours)? The idea being to have a working Koha quickly for the American folks.

--
Paul POULAIN
Consultant indépendant en logiciels libres
responsable francophone de koha (SIGB libre http://www.koha-fr.org)
participants (4): Christopher Hicks, Derek Dresser, Larry Stamm, paul POULAIN