[Koha] problem importing marc records

Rick Welykochy rick at praxis.com.au
Wed Apr 16 11:00:06 NZST 2008


Joshua Ferraro wrote:

> On Mon, Apr 14, 2008 at 8:46 PM, Rick Welykochy <rick at praxis.com.au> wrote:
>> Joshua Ferraro wrote:
>>
>>
>>>>  On our 2 GHZ Intel duo-core Linux/debian install, it imports about
>>>>  250 MARC records per minute, FWIW.
>>>>
>>
>>> You must have meant per second, right?
>>>
>>  Nope. That figure was back of envelope from memory. Here is the
>>  correct figure. The server was doing nothing else at the time,
>>  on a Sunday arvo.
>>
>>  It took 101 minutes for 32605 records = 322 records per minute.
> Hmmm, that seems unusually slow to me, an order of magnitude or
> so. Can you run the following commands to try to figure out what
> the bottleneck is:
> 
> $ perl -d:DProf -I /path/to/koha/modules bulkmarcimport.pl -file
> /path/to/file.mrc
> $ dprofpp -v tmon.out > dprof.txt
> 
> Then share the output of dprof.txt with us?

Too late for that server. It is now in production.

I might try the same thing on our test box when time permits, with perhaps
1000 records and get a profile that way.
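
For cutting a 1000-record sample out of a larger file, something like the
following might do. It is only a sketch in Python (not part of Koha), and it
assumes the file is plain ISO 2709 MARC, where the first five bytes of each
record give that record's total length in ASCII digits. The function name and
the file names in the usage note are made up for illustration.

```python
import io


def first_n_records(stream, n):
    """Yield up to n raw MARC (ISO 2709) records from a binary stream.

    Assumes each record starts with a 5-digit ASCII length field that
    counts the whole record, including those 5 bytes.
    """
    for _ in range(n):
        length_field = stream.read(5)
        if len(length_field) < 5:
            return  # end of file
        record_length = int(length_field.decode("ascii"))
        rest = stream.read(record_length - 5)
        if len(rest) < record_length - 5:
            raise ValueError("truncated record at end of stream")
        yield length_field + rest
```

Usage would be along the lines of opening full.mrc with mode "rb", opening
sample.mrc with mode "wb", and writing each record yielded by
first_n_records(infile, 1000) to the output file.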

It does seem to be taking a very long time. But consider that the import
process is parsing all the records and also deconstructing them and
shoveling them word by word into MySQL. While I was monitoring the
processes, MySQL seemed to be the chief task running. I would imagine that
the storage of words to marc_word was done one row at a time. This
can slow things down; aggregating the writes to the database would be more
efficient. Another indicator of a single task dominating is that the
second CPU on the box was basically idle.
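
To illustrate what I mean by aggregating writes, here is a rough sketch. It is
not the importer's code: it is Python against an in-memory SQLite database
standing in for Perl and MySQL, and the marc_word columns (bibid, tag, word)
are assumed for the example, not taken from the Koha schema. The first
function commits after every row, which is my guess at what the importer
effectively does; the second sends rows in batches with one commit per batch.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a stand-in marc_word table."""
    conn = sqlite3.connect(":memory:")
    # Column names are assumptions made for this sketch.
    conn.execute("CREATE TABLE marc_word (bibid INTEGER, tag TEXT, word TEXT)")
    return conn


def insert_row_at_a_time(conn, rows):
    """One INSERT and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO marc_word VALUES (?, ?, ?)", row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Insert rows in batches, with a single commit per batch."""
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        conn.executemany("INSERT INTO marc_word VALUES (?, ?, ?)", batch)
        conn.commit()


def count_rows(conn):
    """Return the number of rows currently in marc_word."""
    return conn.execute("SELECT COUNT(*) FROM marc_word").fetchone()[0]
```

Both functions store the same rows; the difference is only in how many round
trips and commits the database has to handle. How much that matters on a real
MySQL server would have to be measured.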

We have a script that pre-processes the MARC data, and it also
uses the MARC::* classes. The preprocessing involves reading in biblio
records (sans items), grabbing the items from a MySQL staging table,
and adding them to the MARC records, then outputting a new set of MARC
records.

That task processed all 32605 records in 24 seconds (!)

Conclusion: there is something seriously inefficient in the bulk MARC
importer script.


cheers
rickw


-- 
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor

We like to think of ourselves as the Microsoft of the energy world.
      -- Kenneth Lay, former CEO of Enron

