[Koha] problem importing marc records
Rick Welykochy
rick at praxis.com.au
Wed Apr 16 11:00:06 NZST 2008
Joshua Ferraro wrote:
> On Mon, Apr 14, 2008 at 8:46 PM, Rick Welykochy <rick at praxis.com.au> wrote:
>> Joshua Ferraro wrote:
>>
>>
>>>> On our 2 GHZ Intel duo-core Linux/debian install, it imports about
>>>> 250 MARC records per minute, FWIW.
>>>>
>>
>>> You must have meant per second, right?
>>>
>> Nope. That figure was back of the envelope, from memory. Here is the
>> correct figure. The server was doing nothing else at the time,
>> on a Sunday arvo.
>>
>> It took 101 minutes for 32605 records = 322 records per minute.
> Hmmm, that seems unusually slow to me, an order of magnitude or
> so. Can you run the following commands to try to figure out what
> the bottleneck is:
>
> $ perl -d:DProf -I /path/to/koha/modules bulkmarcimport.pl -file
> /path/to/file.mrc
> $ dprofpp -v tmon.out > dprof.txt
>
> Then share the output of dprof.txt with us?
Too late for that server. It is now in production.
I might try the same thing on our test box when time permits, with perhaps
1000 records and get a profile that way.
It does seem to be taking a very long time. But consider that the import
process is parsing all the records and also deconstructing them and
shoveling them word by word into MySQL. While I was monitoring the
processes, MySQL seemed to be the chief task running. I would imagine that
the storage of words to marc_word was done one row at a time. This
can slow things down; aggregating the writes to the database would be more
efficient. Another indicator of a single task dominating is that the
second CPU on the box was basically idle.
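To illustrate what aggregating the writes could look like, here is a minimal sketch. It is not the importer's actual code: it uses Python with an in-memory SQLite database as a stand-in for MySQL, and the two-column marc_word table, the function names and the batch size are all assumptions made for the example.

```python
import sqlite3

INSERT_SQL = "INSERT INTO marc_word (bibid, word) VALUES (?, ?)"


def make_db():
    """Create an in-memory database with a simplified, hypothetical marc_word table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marc_word (bibid INTEGER, word TEXT)")
    return conn


def insert_row_at_a_time(conn, rows):
    """One INSERT and one commit per word: the pattern suspected above."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_batched(conn, rows, batch_size=1000):
    """Hand the rows over in batches and commit once per batch."""
    for start in range(0, len(rows), batch_size):
        conn.executemany(INSERT_SQL, rows[start:start + batch_size])
        conn.commit()


def count_rows(conn):
    """Return the number of rows currently in marc_word."""
    return conn.execute("SELECT COUNT(*) FROM marc_word").fetchone()[0]


# Hypothetical data: 100 records with three indexed words each.
rows = [(bibid, word)
        for bibid in range(1, 101)
        for word in ("koha", "marc", "import")]

slow = make_db()
insert_row_at_a_time(slow, rows)

fast = make_db()
insert_batched(fast, rows)

print(count_rows(slow), count_rows(fast))
```

Both variants store the same rows. The batched one makes far fewer round trips and commits, which is usually where a row-at-a-time load loses its time. Whether that is really the bottleneck here would still need the profile to confirm.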
We have a script that pre-processes the MARC data, and it also
uses the MARC::* classes. The preprocessing involves reading in biblio
records (sans items), grabbing the items from a MySQL staging table,
and adding them to the MARC records, then outputting a new set of MARC
records.
That task processed all 32605 records in 24 seconds (!)
Conclusion: there is something seriously inefficient in the bulk MARC
importer script.
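For scale, the two timings quoted in this message can be put side by side. This is only arithmetic on those figures (32605 records, 101 minutes for the import, 24 seconds for the preprocessing); the variable names are made up for the example.

```python
# Figures quoted in this message.
records = 32605
import_seconds = 101 * 60      # bulk import took 101 minutes
preprocess_seconds = 24        # preprocessing script took 24 seconds

# Throughput of each task in records per second.
import_rate = records / import_seconds
preprocess_rate = records / preprocess_seconds

# How many times longer the import took than the preprocessing.
slowdown = import_seconds / preprocess_seconds

print(f"import:     {import_rate:.1f} records/sec")
print(f"preprocess: {preprocess_rate:.1f} records/sec")
print(f"ratio:      {slowdown:.1f}x")
```

The two tasks do different work, so the ratio is not a fair benchmark, but it gives a feel for how much time is going somewhere other than MARC parsing.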
cheers
rickw
--
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor
We like to think of ourselves as the Microsoft of the energy world.
-- Kenneth Lay, former CEO of Enron