Re: [Koha] data import question
Tuesday, July 22, 2003 18:20 CDT

Hi, Derek,

You were dead-on right in your interpretation:
> I believe (correct me if I'm wrong) that I want the 016.9733 to end up in biblioitems.classification and the C475m to end up in biblioitems.subclass.
Whatever trouble the gods of LC may have caused by splitting up the constituent parts of a call number, the numbers map as you suggested: 852 $h holds the classification number (the sequence of alphanumerics shared by all items on a given subject/topic), and 852 $i holds the completion of the call number (the item number, i.e. the sequence of alphanumerics yielding a unique address/identifier for that particular object).

Some libraries seem to ignore the formal definition of the 852 $h and enter the WHOLE call number in that field (classification part and item-specific designator); this certainly makes things easier, although it is not, strictly speaking, correct. You could make a similar pitch to use the 852 $c that way, although again, not strictly speaking correct.

Just a quick addendum re: Larry Stamm's response -- at least, I think it was Larry's part infra, but I wasn't quite clear who was writing, so my apologies to Larry if this is a misattribution -- viz. what is arbitrary in the '852'.

Of course, all of the structure of tags and subfields is arbitrary (in the sense of having no real reason for having been designated as they are), but the mapping of the 852 is a little more defined now. Remember to check LC's MARC documentation to see which fields, and which subfields within fields, are defined and how they are used. The concise MARC info online is kept up to date, and it does contain examples to help clarify proper usage. For the 852, see <http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb852>
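The $h/$i split described above can be sketched in a few lines of Python. This is only a heuristic illustration (the function name is mine, and it simply treats the first whitespace-separated token as the classification part, which works for simple Dewey call numbers like the example in this thread but not for every scheme):

```python
def split_call_number(call_number):
    """Split a call number like '016.9733 C475m' into its
    classification part (the 852 $h portion) and its item part
    (the 852 $i portion). Heuristic sketch only: the first
    whitespace-separated token is taken as the classification;
    everything after it is the item-specific designator."""
    parts = call_number.split()
    if not parts:
        return "", ""
    classification = parts[0]
    item_part = " ".join(parts[1:])
    return classification, item_part

print(split_call_number("016.9733 C475m"))
# -> ('016.9733', 'C475m')
```

A record whose 852 $h already contains the whole call number could be run through something like this to recover the two parts separately.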
> The mapping that I am trying at the moment is:
>
> items.itemnumber -> 852 f (arbitrary mapping - no data in this subfield in the original MARC record of our old system)
> items.multivolumepart -> 852 m
> items.barcode -> 852 p
> items.dateaccessioned -> 852 8
> items.homebranch -> 852 a (arbitrary mapping)
> items.price -> 852 9
> items.itemnotes -> 852 z
> items.holdingbranch -> 852 b (arbitrary mapping)
>
> Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. This is not proving to be an easy job to catch every error.
>
> The rest of the columns in the items table are left unmapped for now.
From the prescribed definitions and usage examples given, you will note that:

852 $8 is designated for Link and Sequence Number
852 $9 is not defined [numbers tend to be avoided, though]
852 $a is prescribed: it must be a valid Organization Code
852 $b is prescribed: Sublocation or Collection
852 $f is prescribed: Coded Location Qualifier, to distinguish subparts or specific issues of an item that are located apart from the main holdings of the same item

Mapping for 852a, 852b, and 852f should not be treated arbitrarily. Your 852b will doubtless be correct. For the 852a, you may have a valid location code assigned to your library: check the MARC list for organizations, or ask your librarian for the Library Symbol, WHO code, etc., or whatever is normal for your library.

Your usage of 852m and 852p seems fine (I've certainly seen many examples of them used as you propose). For the rest, I'd suggest an alternate mapping:

items.price --> 852 $x Non-public note [LC ex. gives accession no.]
items.itemnumber --> 852 $w [unused; unlikely to be used in future]
items.dateaccessioned --> 852 $d [unused; unlikely to be used in future: if possible, you could use a second 852x, since LC allows this as a repeatable subfield]

If you employ $w and $d, you'd avoid potential overwrite problems in the future when importing records or making global changes.
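The suggested move from the arbitrary subfields to the "safe" ones can be sketched as a simple remapping pass. This is my own illustration, not Koha or MARCEdit code; the record is represented as a plain dict of subfield code to value, and it assumes (as the message does) that the target subfields $x, $w, and $d are currently unused:

```python
# Old -> new subfield codes, following the alternate mapping
# suggested above. Assumes the target subfields are empty, so
# nothing gets overwritten during the move.
REMAP = {
    "9": "x",  # price: $9 (undefined) -> $x (non-public note)
    "f": "w",  # itemnumber: $f (prescribed) -> $w (unused)
    "8": "d",  # dateaccessioned: $8 (link/seq. no.) -> $d (unused)
}

def remap_852(subfields):
    """Return a copy of an 852 field's subfields (code -> value)
    with the arbitrary codes moved to the suggested safe ones."""
    out = {}
    for code, value in subfields.items():
        out[REMAP.get(code, code)] = value
    return out

field_852 = {"8": "2003-07-22", "9": "12.50", "p": "31234000123456"}
print(remap_852(field_852))
# price and accession date now live under $x and $d; $p untouched
```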
> Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. This is not proving to be an easy job to catch every error.
You might be able to use a tool like MARCEdit (available off the LC MARC tools page) to do simple global edits. You would need to copy the data from the fields you know it is currently stored in into a temporary dummy field (a 9xx), where you can set up the data as you need. Once you have all the data in standardized places, you could use a global-change delete to remove the 852s that are there, and then reconstruct them as you need them from the correct locations in the dummy fields you set up. Just a thought, anyway.

Kudos to Larry for an incredibly well-thought-out and complete explanation. Good luck with your presentation, Derek.

Cheers,

Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB CANADA

original discussion follows below
----------------------------------

Derek Dresser <Derek.Dresser@gouldacademy.org> wrote:
Quoting Derek Dresser <Derek.Dresser@gouldacademy.org>:
Quoting Larry Stamm <larry@larrystamm.com>:
Derek Dresser <Derek.Dresser@gouldacademy.org> writes:
Is there an already defined option to extract the "item" data from the MARC 852 field? (I believe, from some responses that I have received, that this is a fairly standard location for the "item" data.) Any help or additional information would be appreciated. Perhaps there is a way for me to rewrite my MARC data so that that information will be read properly?
Hi Derek,
No, you have to define the mappings from the MARC fields to the items table yourself. This is done from the Koha intranet interface, by going to the Parameters link, then to the "Links Koha-MARC DB" link, and then to the items tab from the drop-down menu. Then you start mapping the Koha items table column names to MARC subfields.
Note that _all_ of these mappings have to be in the subfields of the same MARC field, so that if you are using the 852 field (as our library is) then all the needed items table columns will need to be mapped to 852 subfields.
The mapping that I am trying at the moment is:
items.itemnumber -> 852 f (arbitrary mapping - no data in this subfield in the original MARC record of our old system)
items.multivolumepart -> 852 m
items.barcode -> 852 p
items.dateaccessioned -> 852 8
items.homebranch -> 852 a (arbitrary mapping)
items.price -> 852 9
items.itemnotes -> 852 z
items.holdingbranch -> 852 b (arbitrary mapping)
Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. Catching every error is not proving to be an easy job.
The rest of the columns in the items tables are left unmapped for now.
In addition, you have to go to the "MARC tag structure" link under Parameters and change the tab value of all the subfields used in the items table to "items(10)". The exception is items.itemnumber, whose tab value should be set to "-1 (ignore)": this value will be autoincremented by the bulkmarcimport script, and it should not read any values that might inadvertently be in the MARC records you are trying to import.
Then go back to the main Parameters page, and click the "MARC Check" link. This will check all your MARC-Koha mappings to see if they are valid. If there are errors, it will point them out and you need to fix them before trying to use the bulkmarcimport script.
Once you get an error-free message from "MARC Check", you should be good to use bulkmarcimport. Remember to use the "-d" option to delete the previous entries.
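The single most important rule MARC Check enforces here (all items columns mapped to subfields of one and the same tag) can be sketched as a small validator. The mapping shape, function name, and error message below are my own illustration for clarity, not Koha's actual code or schema:

```python
def check_items_mapping(mapping):
    """Sketch of the 'all items fields must be mapped to the
    same tag' rule. `mapping` maps a Koha items column name to a
    (tag, subfield) pair; raises ValueError if more than one
    MARC tag is used across the items mappings."""
    tags = {tag for tag, _subfield in mapping.values()}
    if len(tags) > 1:
        raise ValueError("items columns mapped to multiple tags: %s"
                         % sorted(tags))
    return True

# All subfields belong to tag 852, so this passes.
mapping = {
    "items.barcode":       ("852", "p"),
    "items.homebranch":    ("852", "a"),
    "items.holdingbranch": ("852", "b"),
    "items.itemnotes":     ("852", "z"),
}
print(check_items_mapping(mapping))  # -> True
```

Running a check like this over a proposed mapping before a 2.5-hour bulkmarcimport run is exactly the kind of error the built-in MARC Check is there to catch.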
Once you get the biblio data into the items tables, the circulation data from your old system can be extracted and added to the table if you want. You will probably have to write your own script to automate this, since each library system seems to store its circulation data in a slightly different fashion.
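As a starting point for such a script, here is a minimal sketch of reading a circulation export. The column names (`barcode`, `borrower`, `date_due`) and the CSV format are purely hypothetical, since, as noted above, every system exports circulation data differently:

```python
import csv
import io

def read_circulation(fh):
    """Read a hypothetical CSV circulation export and return
    (barcode, borrower, due date) tuples, ready to be matched
    against the barcodes already loaded into Koha's items table."""
    return [(row["barcode"], row["borrower"], row["date_due"])
            for row in csv.DictReader(fh)]

# Stand-in for a real export file from the old system.
sample_export = io.StringIO(
    "barcode,borrower,date_due\n"
    "31234000123456,smithj,2003-08-05\n"
)
print(read_circulation(sample_export))
```

The real work is in mapping the old system's borrower identifiers onto Koha's, which this sketch deliberately leaves out.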
I find the bulkmarcimport script to be interminably slow, entering only about 6000 items per hour on our 950 MHz server running Mandrake 9.0. It is not much slower on the 266 MHz home machine that I am using as a test machine. Because I have to enter all 15,000 items from our collection to really test for errors in my MARC "munging", it means another 2 1/2 hr wait to load up all the items again after fixing a mistake before testing. I might get frustrated enough to try my hand at writing a faster upload script...
Larry, this is ENORMOUSLY helpful. Thank you. I am getting closer. I still have one error when I run MARC Check.
ALL items fields MUST :
be mapped to the same tag, and they must all be in the 10 (items) tab
Here is my mapping:
items.itemnumber -> 852f (arbitrary)
items.barcode -> 852p
items.homebranch -> 852a (arbitrary)
items.itemnotes -> 852z (arbitrary)
items.holdingbranch -> 852b (arbitrary)
I did set all the items fields to "item(10)" except for items.itemnumber which is set to -1 "ignore"
The only actual fields that seem to be used in my data are:

852h - call number
852p - bar code
Anything obvious that I'm missing? Also, where is the call number supposed to end up? By call number, I think I mean the Dewey number and the local call number. It looks like this in my data: _h016.9733 C475m
Thanks again. With your help, I'm getting closer.
Hi again,
Is it true that the 016.9733 part of the 852h field (example above) is supposed to end up in biblioitems.classification, and the C475m part in biblioitems.subclass?
When mapping the MARC subfield structure, is the "tab" parameter used to extract multiple values from the same subfield? For example, from the above, would 016.9733 be subfield 852h tab0, and C475m be subfield 852h tab1? Or am I misinterpreting the "tab" parameter?
I imported a small subset of my data using the mapping above (even though MARC Check still reports the one error shown above), and I now get data in my items table in the database. The problem I am having now is that the biblioitems.classification field is NULL for all my records, even though it is mapped to 852h.
baljkas <baljkas@mb.sympatico.ca> writes:

> Just a quick addendum re: Larry Stamm's response -- at least, I think it was Larry's part infra, but I wasn't quite clear who was writing, so my apologies to Larry if this is a misattribution -- viz. what is arbitrary in the '852'.
>
> Of course, all of the structure of tags and subfields is arbitrary (in the sense of having no real reason for having been designated as they are), but the mapping of the 852 is a little more defined now. Remember to check LC's MARC documentation to see what fields and subfields within fields are defined and how they are used. The concise MARC info online is kept up-to-date and it does contain examples to help clarify proper usage. For 852 surf to URL <http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb852>

Thanks for the clarification and link! MARC structure is a bit hard to get a grasp of for those of us not trained as librarians.

> From the prescribed definition and usage examples given, you will note that
>
> 852 $8 is designated for Link and Sequence Number
> 852 $9 is not defined [numbers tend to be avoided though]
> 852 $a is prescribed: it must be a valid Organization Code
> 852 $b is prescribed: as Sublocation or Collection
> 852 $f is prescribed: Coded Location Qualifier, to distinguish subparts or specific issues of an item that are located apart from the main holdings of the same item
>
> Mapping for 852a, 852b, and 852f should not be treated arbitrarily.

My understanding is that the 852 field is all for "local" data, and is mainly for use within an organization. Is that true? It seems that our library staff has manually added all the 852 data since we automated, and this is where the majority of irregularities occur in our MARC records. The reason I chose the arbitrary mappings I did was that our current MARC records have no data in those subfields, so there would be no conflict in importation into Koha.
Your suggestions for the location fields make more sense and are helpful.

> For the rest, I'd suggest an alternate mapping:
>
> items.price --> 852 $x Non-public note [LC ex. gives accession no.]
> items.itemnumber --> 852 $w [unused; unlikely to be used in future]
> items.dateaccessioned --> 852 $d [unused; unlikely to be used in future: if possible, you could use a second 852x, since LC allows this as a repeatable subfield]

The manual for our current software (Sagebrush's Athena) suggests this mapping:

852 6 --> Format (e.g., book, paperback)
852 7 --> Acquisition fund
852 8 --> Acquisition date
852 9 --> Acquisition price

And consequently this is where this data lies in our MARC records. Given the discrepancy between this usage and that laid out on the Library of Congress website, I'm wondering just how standardized actual MARC implementations are across all libraries?

> You might be able to use a tool like MARCEdit (available off the LC MARC tools page) to do simple global edits. You would need to copy data from fields you know it is currently stored in, transfer it to a temporary dummy field (a 9xx) where you can set up the data as you need. Once you have all the data in standardised places, you could use a global change delete to remove the 852's that are there and then reconstruct them as you need them from the correct locations in the dummy fields you set up. Just a thought anyway.

I went ahead and made some small changes, like changing "fic", "Fic", and "FiC" all to "FIC", and limiting further entries into the Athena database to "FIC" for fictional books. That amounted to a criminal offence (almost), according to the staff. :) From a management point of view, I think it will be easier to just get the data correctly imported into Koha's database tables and do the correction and standardization in that database.
Then limiting manual biblio entries to only authorized standard values and/or formats can be hidden within the general chaos of implementing new software.

--
Larry Stamm, Chair
McBride and District Public Library
McBride, BC V0J 2E0 Canada
http://www.mcbridebc.org/library
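The sort of global cleanup Larry describes (collapsing "fic", "Fic", "FiC" to a single authorized "FIC") is the easiest kind of standardization to script once the data is in a database. A minimal sketch, with a function name of my own choosing:

```python
def normalize_fiction(value):
    """Collapse case variants like 'fic', 'Fic', 'FiC' to the
    single authorized form 'FIC'; any other classification
    value passes through unchanged."""
    return "FIC" if value.strip().upper() == "FIC" else value

print([normalize_fiction(v) for v in ["fic", "Fic", "FiC", "016.9733"]])
# -> ['FIC', 'FIC', 'FIC', '016.9733']
```

The same pattern (normalize, compare, update) extends to any other field where authorized values drifted during years of hand entry.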