[Koha] data import question

Wed Jul 30 01:47:35 NZST 2003

Larry Stamm wrote:

>Derek Dresser <Derek.Dresser at gouldacademy.org> writes:
>
>   
>    > Is there an already defined option to extract the "item" data from
>    > the MARC 852 field (I believe from some responses that I have
>    > received that this is a fairly standard location for the "item"
>    > data).  Any help or additional information would be appreciated.
>    > Perhaps there is a way for me to rewrite my MARC data so that that
>    > information will be read properly?
>
>Hi Derek,
>
>No, you have to define the mappings from the MARC fields to the items
>table yourself.  This is done from the Koha intranet interface: go
>to the Parameters link, then to the "Links Koha-MARC DB" link, and
>then select the items tab from the drop-down menu.  Then you start
>mapping the Koha items table column names to MARC subfields.
>
>Note that _all_ of these mappings have to be in the subfields of the
>same MARC field, so that if you are using the 852 field (as our library
>is) then all the needed items table columns will need to be mapped to
>852 subfields.
>
>The mapping that I am trying at the moment is:
>
>items.itemnumber -> 852 f (arbitrary mapping - no data in this subfield
>in the original MARC record of our old system )
>
>items.multivolumepart -> 852 m
>items.barcode         -> 852 p
>items.dateaccessioned -> 852 8
>items.homebranch      -> 852 a (arbitrary mapping)
>items.price           -> 852 9
>items.itemnotes       -> 852 z
>items.holdingbranch   -> 852 b (arbitrary mapping)
>
>Your mappings might have to be slightly different depending on where
>your current MARC records are storing data.  Since many of our library's
>records were entered by hand, I am having to "fix and reconstitute" the
>MARC database to standardize the entries.  Catching every error is not
>proving to be an easy job.
>
>The rest of the columns in the items table are left unmapped for now.
>In addition, you have to go to the "MARC tag structure" link under
>Parameters and change the tab value of all the subfields used in the
>items table to "items(10)", except for the tab value of items.itemnumber
>which should be set to "-1(ignore)" since this value will be
>autoincremented by the bulkmarcimport script and should not read any
>values that might inadvertently be in your MARC records that you are
>trying to import.
>
>Then go back to the main Parameters page, and click the "MARC Check"
>link.  This will check all your MARC-Koha mappings to see if they are
>valid. If there are errors, it will point them out and you need to fix
>them before trying to use the bulkmarcimport script.  Once you get an
>error-free message from "MARC Check", you should be good to use
>bulkmarcimport.  Remember to use the "-d" option to delete the previous
>entries.
>
>Once you get the biblio data into the items tables, the circulation data
>from your old system can be extracted and added to the table if you
>want.  You will probably have to write your own script to automate this,
>since each library system seems to store its circulation data in a
>slightly different fashion.
>
>I find the bulkmarcimport script to be interminably slow, entering only
>about 6000 items per hour on our 950 MHz CPU server running Mandrake
>9.0.  It is not much slower on my 266 MHz home machine that I am using
>as a test machine.  Because I have to enter all 15,000 items from our
>collection to really test for errors in my MARC "munging", it means
>another 2 1/2 hr wait to load up all the items again after fixing a
>mistake before testing.  I might get frustrated enough to try my hand at
>writing a faster upload script...
>  
>
6000 items per hour means fewer than 2 items per second.
I think the problem comes from your IDE drive and/or your MySQL
configuration.
I agree that the bulkmarcimport script is slow because it does a LOT of
things (in Biblio.pm). But the main reason for the lack of speed is the
number of MySQL inserts (a lot, in fact).
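
As a quick sanity check of these figures, here is a small Python sketch that uses only the numbers quoted in this thread (the variable names are mine):

```python
# Figures quoted in this thread.
items_per_hour = 6000     # observed bulkmarcimport rate
collection_size = 15000   # items in Larry's collection

items_per_second = items_per_hour / 3600
full_reload_hours = collection_size / items_per_hour

print(f"{items_per_second:.2f} items/second")             # 1.67 items/second
print(f"{full_reload_hours:.1f} hours per full reload")   # 2.5 hours per full reload
```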

If anyone has a magic SQL statement to speed up MySQL inserts (chapter 10
of the MySQL doc may be helpful), feel free to suggest it.
I suggest two ideas:
* drop indexes (except the primary key) at the beginning & rebuild them at the end
* lock the tables at the beginning & unlock them at the end (LOCK TABLES XXX WRITE;)

The affected tables could be:
* marc_biblio, marc_subfield_table, marc_word
* biblio, biblioitems, items
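
A rough, untested sketch of what the two ideas could look like, written as a small Python helper that only builds the SQL text (the helper name bulk_load_statements is made up for illustration). Note one substitution: instead of dropping and recreating each index by name, it uses ALTER TABLE ... DISABLE KEYS / ENABLE KEYS, which assumes MyISAM tables and only affects non-unique indexes, so the primary keys stay in place. The keys are toggled outside the locked region because running ALTER TABLE on a locked table can release the lock.

```python
def bulk_load_statements(tables):
    """Return (before, after) lists of SQL statements to wrap a bulk import.

    Assumption: MyISAM tables. DISABLE KEYS only suspends non-unique
    indexes, so primary keys remain active during the load.
    """
    before = [f"ALTER TABLE {t} DISABLE KEYS;" for t in tables]
    # A single LOCK TABLES statement has to name every table the import uses.
    before.append("LOCK TABLES " + ", ".join(f"{t} WRITE" for t in tables) + ";")
    after = ["UNLOCK TABLES;"]
    # Re-enabling the keys rebuilds the suspended indexes in one pass.
    after += [f"ALTER TABLE {t} ENABLE KEYS;" for t in tables]
    return before, after


# Table names as listed above.
TABLES = ["marc_biblio", "marc_subfield_table", "marc_word",
          "biblio", "biblioitems", "items"]

before, after = bulk_load_statements(TABLES)
print("\n".join(before))
print("-- import inserts run here, on the same connection --")
print("\n".join(after))
```

Table locks belong to the connection that takes them, so these statements would have to be issued by the import script itself on its own database handle, not from a separate mysql client. A connection holding LOCK TABLES can also only use the tables it has locked, so any other table the import touches would need to be added to the list.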

Could anyone give those ideas a try?

-- 
Paul POULAIN
Independent free software consultant
French-language coordinator for Koha (free ILS http://www.koha-fr.org)