[Koha] problem importing marc records

Joshua Ferraro jmf at liblime.com
Thu Apr 17 23:00:11 NZST 2008


On Tue, Apr 15, 2008 at 7:00 PM, Rick Welykochy <rick at praxis.com.au> wrote:
> Joshua Ferraro wrote:
>
> > On Mon, Apr 14, 2008 at 8:46 PM, Rick Welykochy <rick at praxis.com.au> wrote:
> > >
> > > Joshua Ferraro wrote:
> > > >
> > > > >  On our 2 GHz Intel dual-core Linux/Debian install, it imports about
> > > > >  250 MARC records per minute, FWIW.
> > > >
> > > > You must have meant per second, right?
> > > >
> > >  Nope. That figure was a back-of-the-envelope guess from memory. Here is the
> > >  correct figure. The server was doing nothing else at the time,
> > >  on a Sunday arvo.
> > >
> > >  It took 101 minutes for 32605 records = 322 records per minute.
> > >
> > Hmmm, that seems unusually slow to me, an order of magnitude or
> > so. Can you run the following commands to try to figure out what
> > the bottleneck is:
> >
> > $ perl -d:DProf -I /path/to/koha/modules bulkmarcimport.pl -file /path/to/file.mrc
> >   (that run writes its profile to tmon.out in the current directory)
> > $ dprofpp -v > dprof.txt
> >
> > Then share the output of dprof.txt with us?
> >
>
>  Too late for that server. It is now in production.
>
>  I might try the same thing on our test box when time permits, with perhaps
>  1000 records and get a profile that way.
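
Side note: a 1000-record sample for that kind of profiling run can be sliced
off with MARC::Batch along these lines (file names here are just placeholders):

  #!/usr/bin/perl
  # Copy the first 1000 records from a large MARC file into a small test file.
  use strict;
  use warnings;
  use MARC::Batch;

  my $batch = MARC::Batch->new('USMARC', 'full_export.mrc');
  open my $out, '>', 'sample_1000.mrc' or die $!;
  my $count = 0;
  while (my $record = $batch->next) {
      print {$out} $record->as_usmarc();
      last if ++$count >= 1000;
  }
  close $out;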
>
>  It does seem to be taking a very long time. But consider that the import
>  process is parsing all the records and also deconstructing them and
>  shoveling them word by word into MySQL. While I was monitoring the
>  processes, MySQL seemed to be the chief task running. I would imagine that
>  the storage of words into marc_word is done one row at a time. This can
>  slow things down; aggregating the writes to the database would be more
>  efficient. Another indicator of a single task dominating is that the
>  second CPU on the box was basically idle.
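
FWIW, a rough sketch of the row-at-a-time versus aggregated-insert difference
with DBI is below. The marc_word columns and the connection details are just
placeholders for illustration, not the real schema:

  #!/usr/bin/perl
  # Illustrative only: batching word inserts instead of one execute() per row.
  use strict;
  use warnings;
  use DBI;

  my $dbh = DBI->connect('DBI:mysql:database=koha', 'kohaadmin', 'secret',
                         { RaiseError => 1, AutoCommit => 1 });

  # One row at a time: one round trip and index update per word.
  sub insert_words_slow {
      my ($bibid, @words) = @_;
      my $sth = $dbh->prepare('INSERT INTO marc_word (bibid, word) VALUES (?, ?)');
      $sth->execute($bibid, $_) for @words;
  }

  # Aggregated: one multi-row INSERT per chunk of 500 words.
  sub insert_words_batched {
      my ($bibid, @words) = @_;
      while (my @chunk = splice(@words, 0, 500)) {
          my $placeholders = join ', ', ('(?, ?)') x @chunk;
          my $sth = $dbh->prepare(
              "INSERT INTO marc_word (bibid, word) VALUES $placeholders");
          $sth->execute(map { ($bibid, $_) } @chunk);
      }
  }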
>
>  We have a script that pre-processes the MARC data, and it also
>  uses the MARC::* classes. The preprocessing involves reading in biblio
>  records (sans items), grabbing the items from a MySQL staging table,
>  and adding them to the MARC records, then outputting a new set of MARC
>  records.
>
>  That task processed all 32605 records in 24 seconds (!)
>
>  Conclusion: there is something seriously inefficient in the bulk MARC
>  importer script.
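
For anyone curious what that kind of pre-processing pass looks like, a rough
sketch with MARC::Batch and DBI follows. The staging-table layout, the 999$c
biblionumber lookup and the 952 item mapping are assumptions for illustration,
not Rick's actual script:

  #!/usr/bin/perl
  # Sketch: merge items from a staging table into biblio MARC records.
  use strict;
  use warnings;
  use DBI;
  use MARC::Batch;
  use MARC::Field;

  my $dbh = DBI->connect('DBI:mysql:database=staging', 'user', 'pass',
                         { RaiseError => 1 });
  my $items_sth = $dbh->prepare(
      'SELECT barcode, branchcode, callnumber
         FROM staged_items WHERE biblionumber = ?');

  my $batch = MARC::Batch->new('USMARC', 'biblios.mrc');
  open my $out, '>', 'biblios_with_items.mrc' or die $!;

  while (my $record = $batch->next) {
      # Assumes the biblionumber is carried in 999$c.
      my $biblionumber = $record->subfield('999', 'c') or next;

      $items_sth->execute($biblionumber);
      while (my $item = $items_sth->fetchrow_hashref) {
          $record->append_fields(
              MARC::Field->new('952', ' ', ' ',
                  a => $item->{branchcode},   # home branch
                  b => $item->{branchcode},   # holding branch
                  o => $item->{callnumber},   # call number
                  p => $item->{barcode},      # barcode
              )
          );
      }
      print {$out} $record->as_usmarc();
  }
  close $out;
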
I think you must be running version 2.2.x. The import script in 3.0 is very
fast, and if you're experiencing slowness like that in 3.0, we need to know
about it and find out what's happening. If it's 2.2.x, then we already know
why :-)
Please let us know.

Cheers,

-- 
Joshua Ferraro, CEO, LibLime
SUPPORT FOR OPEN-SOURCE SOFTWARE: migration, training, maintenance, support
Featuring Koha Open-Source ILS
jmf at liblime.com | Full Demos at http://liblime.com/koha | 1(888)KohaILS

