Re: [Koha] data import question
Tuesday, July 22, 2003 18:20 CDT

Hi, Derek,

You were dead-on right in your interpretation:
> I believe (correct me if I'm wrong) that I want the 016.9733 to end up in biblioitems.classification and the C475m to end up in biblioitems.subclass.
Whatever trouble the gods of LC may have caused by splitting up the constituent parts of a call number, the numbers map as you suggested: 852 $h holds the classification number (the sequence of alphanumerics shared by all items on a given subject/topic), and 852 $i holds the completion of the call number (the item number, i.e. the sequence of alphanumerics yielding a unique address/identifier for that particular object).

Some libraries seem to ignore the formal definition of the 852 $h and enter the WHOLE call number in that field (classification part and item-specific designator); this certainly makes things easier, although it is not, strictly speaking, correct. You could make a similar pitch to use the 852 $c that way, although again, not strictly speaking correct.

Just a quick addendum re: Larry Stamm's response -- at least, I think it was Larry's part infra, but I wasn't quite clear who was writing, so my apologies to Larry if this is a misattribution -- viz. what is arbitrary in the '852'.

Of course, all of the structure of tags and subfields is arbitrary (in the sense of having no real reason for having been designated as they are), but the mapping of the 852 is a little more defined now. Remember to check LC's MARC documentation to see which fields, and which subfields within fields, are defined and how they are used. The concise MARC info online is kept up to date, and it does contain examples to help clarify proper usage. For the 852, see <http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb852>
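The $h/$i split described above can be sketched in a few lines of Python. This is only a heuristic illustration (the function name is mine, and it simply treats the first whitespace-separated token as the classification part, which works for simple Dewey call numbers like the example in this thread but not for every scheme):

```python
def split_call_number(call_number):
    """Split a call number like '016.9733 C475m' into its
    classification part (the 852 $h portion) and its item part
    (the 852 $i portion). Heuristic sketch only: the first
    whitespace-separated token is taken as the classification;
    everything after it is the item-specific designator."""
    parts = call_number.split()
    if not parts:
        return "", ""
    classification = parts[0]
    item_part = " ".join(parts[1:])
    return classification, item_part

print(split_call_number("016.9733 C475m"))
# -> ('016.9733', 'C475m')
```

A record whose 852 $h already contains the whole call number could be run through something like this to recover the two parts separately.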
> The mapping that I am trying at the moment is:
>
> items.itemnumber -> 852 f (arbitrary mapping - no data in this subfield in the original MARC record of our old system)
> items.multivolumepart -> 852 m
> items.barcode -> 852 p
> items.dateaccessioned -> 852 8
> items.homebranch -> 852 a (arbitrary mapping)
> items.price -> 852 9
> items.itemnotes -> 852 z
> items.holdingbranch -> 852 b (arbitrary mapping)
>
> Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. This is not proving to be an easy job to catch every error.
>
> The rest of the columns in the items table are left unmapped for now.
From the prescribed definitions and usage examples given, you will note that:

852 $8 is designated for Link and Sequence Number
852 $9 is not defined [numbers tend to be avoided, though]
852 $a is prescribed: it must be a valid Organization Code
852 $b is prescribed: Sublocation or Collection
852 $f is prescribed: Coded Location Qualifier, to distinguish subparts or specific issues of an item that are located apart from the main holdings of the same item

Mapping for 852a, 852b, and 852f should not be treated arbitrarily. Your 852b will doubtless be correct. For the 852a, you may have a valid location code assigned to your library: check the MARC list for organizations, or ask your librarian for the Library Symbol, WHO code, etc., or whatever is normal for your library.

Your usage of 852m and 852p seems fine (I've certainly seen many examples of them used as you propose). For the rest, I'd suggest an alternate mapping:

items.price --> 852 $x Non-public note [LC ex. gives accession no.]
items.itemnumber --> 852 $w [unused; unlikely to be used in future]
items.dateaccessioned --> 852 $d [unused; unlikely to be used in future: if possible, you could use a second 852x, since LC allows this as a repeatable subfield]

If you employ $w and $d, you'd avoid potential overwrite problems in the future when importing records or making global changes.
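The suggested move from the arbitrary subfields to the "safe" ones can be sketched as a simple remapping pass. This is my own illustration, not Koha or MARCEdit code; the record is represented as a plain dict of subfield code to value, and it assumes (as the message does) that the target subfields $x, $w, and $d are currently unused:

```python
# Old -> new subfield codes, following the alternate mapping
# suggested above. Assumes the target subfields are empty, so
# nothing gets overwritten during the move.
REMAP = {
    "9": "x",  # price: $9 (undefined) -> $x (non-public note)
    "f": "w",  # itemnumber: $f (prescribed) -> $w (unused)
    "8": "d",  # dateaccessioned: $8 (link/seq. no.) -> $d (unused)
}

def remap_852(subfields):
    """Return a copy of an 852 field's subfields (code -> value)
    with the arbitrary codes moved to the suggested safe ones."""
    out = {}
    for code, value in subfields.items():
        out[REMAP.get(code, code)] = value
    return out

field_852 = {"8": "2003-07-22", "9": "12.50", "p": "31234000123456"}
print(remap_852(field_852))
# price and accession date now live under $x and $d; $p untouched
```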
> Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. This is not proving to be an easy job to catch every error.
You might be able to use a tool like MARCEdit (available off the LC MARC tools page) to do simple global edits. You would need to copy the data from the fields you know it is currently stored in into a temporary dummy field (a 9xx), where you can set up the data as you need. Once you have all the data in standardized places, you could use a global-change delete to remove the 852s that are there, and then reconstruct them as you need them from the correct locations in the dummy fields you set up. Just a thought, anyway.

Kudos to Larry for an incredibly well-thought-out and complete explanation. Good luck with your presentation, Derek.

Cheers,

Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB CANADA

original discussion follows below
----------------------------------

Derek Dresser <Derek.Dresser@gouldacademy.org> wrote:
Quoting Derek Dresser <Derek.Dresser@gouldacademy.org>:
Quoting Larry Stamm <larry@larrystamm.com>:
Derek Dresser <Derek.Dresser@gouldacademy.org> writes:
Is there an already defined option to extract the "item" data from the MARC 852 field? (I believe, from some responses that I have received, that this is a fairly standard location for the "item" data.) Any help or additional information would be appreciated. Perhaps there is a way for me to rewrite my MARC data so that that information will be read properly?
Hi Derek,
No, you have to define the mappings from the MARC fields to the items table yourself. This is done from the Koha intranet interface, by going to the Parameters link, then to the "Links Koha-MARC DB" link, and then to the items tab from the drop-down menu. Then you start mapping the Koha items table column names to MARC subfields.
Note that _all_ of these mappings have to be in the subfields of the same MARC field, so that if you are using the 852 field (as our library is) then all the needed items table columns will need to be mapped to 852 subfields.
The mapping that I am trying at the moment is:
items.itemnumber -> 852 f (arbitrary mapping - no data in this subfield in the original MARC record of our old system)
items.multivolumepart -> 852 m
items.barcode -> 852 p
items.dateaccessioned -> 852 8
items.homebranch -> 852 a (arbitrary mapping)
items.price -> 852 9
items.itemnotes -> 852 z
items.holdingbranch -> 852 b (arbitrary mapping)
Your mappings might have to be slightly different depending on where your current MARC records are storing data. Since many of our library's records were entered by hand, I am having to "fix and reconstitute" the MARC database to standardize the entries. Catching every error is not proving to be an easy job.
The rest of the columns in the items tables are left unmapped for now.
In addition, you have to go to the "MARC tag structure" link under Parameters and change the tab value of all the subfields used in the items table to "items(10)". The exception is items.itemnumber, whose tab value should be set to "-1 (ignore)": this value will be autoincremented by the bulkmarcimport script, and it should not read any values that might inadvertently be in the MARC records you are trying to import.
Then go back to the main Parameters page, and click the "MARC Check" link. This will check all your MARC-Koha mappings to see if they are valid. If there are errors, it will point them out and you need to fix them before trying to use the bulkmarcimport script.
Once you get an error-free message from "MARC Check", you should be good to use bulkmarcimport. Remember to use the "-d" option to delete the previous entries.
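The single most important rule MARC Check enforces here (all items columns mapped to subfields of one and the same tag) can be sketched as a small validator. The mapping shape, function name, and error message below are my own illustration for clarity, not Koha's actual code or schema:

```python
def check_items_mapping(mapping):
    """Sketch of the 'all items fields must be mapped to the
    same tag' rule. `mapping` maps a Koha items column name to a
    (tag, subfield) pair; raises ValueError if more than one
    MARC tag is used across the items mappings."""
    tags = {tag for tag, _subfield in mapping.values()}
    if len(tags) > 1:
        raise ValueError("items columns mapped to multiple tags: %s"
                         % sorted(tags))
    return True

# All subfields belong to tag 852, so this passes.
mapping = {
    "items.barcode":       ("852", "p"),
    "items.homebranch":    ("852", "a"),
    "items.holdingbranch": ("852", "b"),
    "items.itemnotes":     ("852", "z"),
}
print(check_items_mapping(mapping))  # -> True
```

Running a check like this over a proposed mapping before a 2.5-hour bulkmarcimport run is exactly the kind of error the built-in MARC Check is there to catch.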
Once you get the biblio data into the items tables, the circulation data from your old system can be extracted and added to the table if you want. You will probably have to write your own script to automate this, since each library system seems to store its circulation data in a slightly different fashion.
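As a starting point for such a script, here is a minimal sketch of reading a circulation export. The column names (`barcode`, `borrower`, `date_due`) and the CSV format are purely hypothetical, since, as noted above, every system exports circulation data differently:

```python
import csv
import io

def read_circulation(fh):
    """Read a hypothetical CSV circulation export and return
    (barcode, borrower, due date) tuples, ready to be matched
    against the barcodes already loaded into Koha's items table."""
    return [(row["barcode"], row["borrower"], row["date_due"])
            for row in csv.DictReader(fh)]

# Stand-in for a real export file from the old system.
sample_export = io.StringIO(
    "barcode,borrower,date_due\n"
    "31234000123456,smithj,2003-08-05\n"
)
print(read_circulation(sample_export))
```

The real work is in mapping the old system's borrower identifiers onto Koha's, which this sketch deliberately leaves out.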
I find the bulkmarcimport script to be interminably slow, entering only about 6000 items per hour on our 950 MHz server running Mandrake 9.0. It is not much slower on the 266 MHz home machine that I am using as a test machine. Because I have to enter all 15,000 items from our collection to really test for errors in my MARC "munging", it means another 2 1/2 hr wait to load up all the items again after fixing a mistake before testing. I might get frustrated enough to try my hand at writing a faster upload script...
Larry, this is ENORMOUSLY helpful. Thank you. I am getting closer. I still have one error when I run MARC Check.
ALL items fields MUST :
be mapped to the same tag, and they must all be in the 10 (items) tab
Here is my mapping:
items.itemnumber -> 852f (arbitrary)
items.barcode -> 852p
items.homebranch -> 852a (arbitrary)
items.itemnotes -> 852z (arbitrary)
items.holdingbranch -> 852b (arbitrary)
I did set all the items fields to "item(10)" except for items.itemnumber which is set to -1 "ignore"
The only actual fields that seem to be used in my data are:

852h - call number
852p - bar code
Anything obvious that I'm missing? Also, where is the call number supposed to end up? By call number, I think I mean the Dewey number and the local call number. It looks like this in my data: _h016.9733 C475m
Thanks again. With your help, I'm getting closer.
Hi again,
Is it true that the 016.9733 part of the 852h field (example above) is supposed to end up in biblioitems.classification, and the C475m part in biblioitems.subclass?
When mapping the MARC subfield structure, is the "tab" parameter used to extract multiple values from the same subfield? For example, from the above, would 016.9733 be subfield 852h tab0, and C475m be subfield 852h tab1? Or am I misinterpreting the "tab" parameter?
I imported a small subset of my data using the mapping above (even though MARC Check still reports the one error shown above), and I now get data in my items table in the database. The problem I am having now is that the biblioitems.classification field is NULL for all my records, even though it is mapped to 852h.
baljkas <baljkas@mb.sympatico.ca> writes:

> Just a quick addendum re: Larry Stamm's response -- at least, I think it was Larry's part infra, but I wasn't quite clear who was writing, so my apologies to Larry if this is a misattribution -- viz. what is arbitrary in the '852'.
>
> Of course, all of the structure of tags and subfields is arbitrary (in the sense of having no real reason for having been designated as they are), but the mapping of the 852 is a little more defined now. Remember to check LC's MARC documentation to see what fields and subfields within fields are defined and how they are used. The concise MARC info online is kept up-to-date and it does contain examples to help clarify proper usage. For 852 surf to URL <http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb852>

Thanks for the clarification and link! MARC structure is a bit hard to get a grasp of for those of us not trained as librarians.

> From the prescribed definition and usage examples given, you will note that
>
> 852 $8 is designated for Link and Sequence Number
> 852 $9 is not defined [numbers tend to be avoided though]
> 852 $a is prescribed: it must be a valid Organization Code
> 852 $b is prescribed: as Sublocation or Collection
> 852 $f is prescribed: Coded Location Qualifier, to distinguish subparts or specific issues of an item that are located apart from the main holdings of the same item
>
> Mapping for 852a, 852b, and 852f should not be treated arbitrarily.

My understanding is that the 852 field is all for "local" data, and is mainly for use within an organization. Is that true? It seems that our library staff has manually added all the 852 data since we automated, and this is where the majority of irregularities occur in our MARC records. The reason I chose the arbitrary mappings I did was that our current MARC records have no data in those subfields, so there would be no conflict in importation into Koha.
Your suggestions for the location fields make more sense and are helpful.

> For the rest, I'd suggest an alternate mapping:
>
> items.price --> 852 $x Non-public note [LC ex. gives accession no.]
> items.itemnumber --> 852 $w [unused; unlikely to be used in future]
> items.dateaccessioned --> 852 $d [unused; unlikely to be used in future: if possible, you could use a second 852x, since LC allows this as a repeatable subfield]

The manual for our current software (Sagebrush's Athena) suggests this mapping:

852 6 --> Format (e.g., book, paperback)
852 7 --> Acquisition fund
852 8 --> Acquisition date
852 9 --> Acquisition price

And consequently this is where this data lies in our MARC records. Given the discrepancy between this usage and that laid out on the Library of Congress website, I'm wondering just how standardized actual MARC implementations are across all libraries?

> You might be able to use a tool like MARCEdit (available off the LC MARC tools page) to do simple global edits. You would need to copy data from fields you know it is currently stored in, transfer it to a temporary dummy field (a 9xx) where you can set up the data as you need. Once you have all the data in standardised places, you could use a global change delete to remove the 852's that are there and then reconstruct them as you need them from the correct locations in the dummy fields you set up. Just a thought anyway.

I went ahead and made some small changes, like changing "fic", "Fic", and "FiC" all to "FIC", and limiting further entries into the Athena database to "FIC" for fictional books. That amounted to a criminal offence (almost), according to the staff. :) From a management point of view, I think it will be easier to just get the data correctly imported into Koha's database tables and do the correction and standardization in that database.
Then limiting manual biblio entries to only authorized standard values and/or formats can be hidden within the general chaos of implementing new software.

--
Larry Stamm, Chair
McBride and District Public Library
McBride, BC V0J 2E0 Canada
http://www.mcbridebc.org/library
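The sort of global cleanup Larry describes (collapsing "fic", "Fic", "FiC" to a single authorized "FIC") is the easiest kind of standardization to script once the data is in a database. A minimal sketch, with a function name of my own choosing:

```python
def normalize_fiction(value):
    """Collapse case variants like 'fic', 'Fic', 'FiC' to the
    single authorized form 'FIC'; any other classification
    value passes through unchanged."""
    return "FIC" if value.strip().upper() == "FIC" else value

print([normalize_fiction(v) for v in ["fic", "Fic", "FiC", "016.9733"]])
# -> ['FIC', 'FIC', 'FIC', '016.9733']
```

The same pattern (normalize, compare, update) extends to any other field where authorized values drifted during years of hand entry.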