problem importing marc records
from staging to managing the staged records to completing the import... my server running nothing but Koha 3.0 (only a 2GB partition; it's a test server, so that's all that was available atm) appeared to be running out of space. In /var/log/mysql I noticed:

-rw-rw---- 1 mysql adm   1611014 2008-04-07 12:01 mysql-bin.000006
-rw-rw---- 1 mysql adm 105149302 2008-04-07 12:01 mysql-bin.000005
-rw-rw---- 1 mysql adm       672 2008-04-07 12:01 mysql-bin.index
-rw-rw---- 1 mysql adm 104955729 2008-04-07 11:57 mysql-bin.000004
-rw-rw---- 1 mysql adm 104889615 2008-04-07 11:52 mysql-bin.000003
-rw-rw---- 1 mysql adm 104987514 2008-04-07 11:47 mysql-bin.000002
-rw-rw---- 1 mysql adm 104906029 2008-04-07 11:42 mysql-bin.000001

Honestly I have no clue what they do or are used for. I assumed (yes, we know what that means :) that these were sort of temp/log files of what mysql-bin was doing, in essence recording every single transaction or something, and the one with the highest number kept incrementing and hit its size limit, it seems, every 5 minutes. So right now I'm monitoring and deleting each of the ones below the highest-numbered (by filename), attempting to stave off the 'out of disk space' that was causing this process to hang on Friday. Any thoughts? --Huck
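While chasing this, one way to keep an eye on the partition and the log growth without deleting anything (the mount point and log directory here are assumptions about this particular server):

```shell
# Show free space on the partition and the current binlog sizes;
# rerun this under watch(1) or from cron while the import is going.
df -h /var
ls -lh /var/log/mysql
```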
Hi, On Mon, Apr 7, 2008 at 2:04 PM, Huck <dhuckaby@hvja.org> wrote:
Honestly I have no clue what they do or are used for. I assumed (yes, we know what that means :) that these were sort of temp/log files of what mysql-bin was doing, in essence recording every single transaction or something, and the one with the highest number kept incrementing and hit its size limit, it seems, every 5 minutes.
These are in fact DB log files that MySQL uses to record all transactions, and are meant to be used for backup and recovery. Collectively they're called the MySQL binary log. See http://dev.mysql.com/doc/refman/5.0/en/binary-log.html for the full details.
So right now I'm monitoring and deleting each of the ones below the highest-numbered (by filename), attempting to stave off the 'out of disk space' that was causing this process to hang on Friday.
The canonical way to delete them is to do a 'reset master' from the mysql prompt. You can also change settings in my.cnf such as log_bin and binlog_ignore_db to turn off these logs while you do the MARC imports. Note that turning off the binary log on a production server should not be done lightly, as it is an important mechanism for database recovery.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
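As a concrete sketch of the cleanup Galen describes (the credentials, my.cnf path, log names, and database name are assumptions for this server; RESET MASTER discards every binary log, so only use it when you don't need them for recovery):

```shell
# Drop all binary logs from the MySQL prompt instead of rm-ing the files:
mysql -u root -p -e "RESET MASTER"

# Or keep the newest log and purge only the older ones:
mysql -u root -p -e "PURGE MASTER LOGS TO 'mysql-bin.000006'"

# To turn the binary log off for the duration of a big import, comment
# out log_bin (or skip just the Koha database) in /etc/mysql/my.cnf:
#   [mysqld]
#   # log_bin          = /var/log/mysql/mysql-bin.log
#   binlog_ignore_db   = koha
# then restart the server:
/etc/init.d/mysql restart
```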
One other thing... in 'manage staged records' there is an option for filtering records based on matching some rules, but I haven't found where I can set up rules, i.e. no duplicates or some such thing; the only option is: do not look for matching records.

And thanks Galen! I'm relieved to hear my assumptions were correct (even if I am going about deleting them via a non-canonical method) ;)

--Huck
Hi, On Mon, Apr 7, 2008 at 2:38 PM, Huck <dhuckaby@hvja.org> wrote:
One other thing... in 'manage staged records' there is an option for filtering records based on matching some rules, but I haven't found where I can set up rules, i.e. no duplicates or some such thing; the only option is: do not look for matching records.
The matching rule does not filter records from the import file per se; it controls whether an incoming record replaces the bib portion of a matching record found in the database or is just added. The matching criteria are set up on the record matching rules page, off of Administration.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
Given the context this may be helpful: opac.liblime.com is running with 1.5 million bib records and 2 million items: that's about 3 gigs of binary MARC data, about 10 gigs of MySQL space, and about 31 gigs of Zebra index. The server it's running on has 2 gigs of RAM. For a production system of this size, I'd at least double the RAM, and have separate physical disks for MySQL, the Zebra registers, the Zebra shadow files, and the filesystem itself.

Cheers,
Josh
_______________________________________________
Koha mailing list
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                         migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
Still ongoing problems... the importation seems to stall at 27%. Running 'ps aux' to check processes:

www-data  6861 96.5  3.5 22204 18100 ?  R  11:16  0:01 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/background-job-progress.pl

This is the only thing running, and it seems to die and restart, die and restart... it has eaten up over 700 process ids since I initially clicked the 'complete import' button. When I initially clicked 'complete import' there was another /usr/bin/perl /usr/share/koha/intranet/cgi-bin/some-import-process-here.pl running concurrently with the process pasted above, which is no longer running.

Anything I can do to debug this, or perhaps run something manually via the command line? --Huck
a little play-by-play as I attempt to complete import yet again:

www-data  7443 11.6  6.7 41792 34668 ?  S  11:44  7:22 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl
www-data  8101 21.6  4.5 28700 23236 ?  R  12:47  0:02 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl
www-data  8103 35.0  2.4 16568 12468 ?  R  12:48  0:01 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/background-job-progress.pl

So it looks like after 1 hour it launches another 'manage-marc-import.pl', which then goes away in the next 10 minutes:

www-data  7443 11.6  6.7 41948 34744 ?  S  11:44  7:35 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl

At this point the background process disappears from the 'ps aux' output, then starts up again, in some sort of recurring loop. But there is no more progress on the Koha page, and the disk-space usage according to 'df' has not changed.
Hey Huck,

First off, which version of Koha are you running (can you check the version syspref, or the About section of your staff client)?

Thanks,
Josh
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                         migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
Koha version: 3.00.00.061

It seems to stall at exactly record #3100 every time...

--Huck
Could you provide a human-readable copy of record #3100 somehow? I know ISO2709 files don't make it easy, but it'd help.

--
Jesse
#3100 is the last successfully 'imported' record according to the Manage Staged MARC Records page; consequently it is on page 124 of some 449 pages (how they are broken up).

Is there not a command-line method of completing the import, without backgrounding the process? --Huck
On Mon, Apr 14, 2008 at 01:50:20PM -0700, Huck wrote:
#3100 is the last successfully 'imported' record according to the Manage Staged MARC Records page... consequently it is page 124 of some 449 pages(how they are broken up).
is there not a command-line method of completing the import...without backgrounding the process?
Or maybe you want to import your records completely from the command line? Use the script bulkmarcimport.pl; you can find it under the misc/migration_tools directory. Run it like ./bulkmarcimport.pl -h to see all options.

And also, you may really have problems with your MARC (ISO2709) file (Jesse mentioned this). For looking into your MARC file you can use the marcdump tool (coming with one of the Perl modules needed by Koha). It should be in your path; just run something like:

marcdump your_marc_file > dump

and you will get a file named "dump" with all your data dumped in text with tags. If you want to check your MARC file for errors, marclint will do the job:

marclint your_marc_file

Marijana

---
Marijana Glavica
Faculty of Humanities and Social Sciences Libraries
I. Lucica 3, 10000 Zagreb, Croatia
http://www.knjiznice.ffzg.hr
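Since Jesse asked for a human-readable copy of the record where the import stalls, here is one way to slice a single record out of the batch file (the record number and file name are assumptions), relying on the fact that each ISO 2709 record ends with the record terminator byte 0x1D:

```shell
# Cut the 3100th record out of the file: each ISO 2709 record ends with
# the record terminator 0x1D (octal 035), so awk can treat that byte as
# the record separator and re-append it when writing the record out.
awk 'BEGIN{RS="\035"} NR==3100{printf "%s\035", $0; exit}' your_marc_file > rec3100.mrc

# Then inspect just that record with the tools above:
marcdump rec3100.mrc
marclint rec3100.mrc
```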
Marijana Glavica wrote:
Use the script bulkmarcimport.pl; you can find it under the misc/migration_tools directory. Run it like ./bulkmarcimport.pl -h to see all options.

We just imported 33,000 MARC records using this wonderful script. It works a treat. It also has an option to dump the MARC records in text format, which is invaluable during your development phase. Many thanks for this wonderful tool.

On our 2 GHz Intel dual-core Linux/Debian install, it imports about 250 MARC records per minute, FWIW.

cheers
rick
--
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor

We like to think of ourselves as the Microsoft of the energy world.
     -- Kenneth Lay, former CEO of Enron
On Mon, Apr 14, 2008 at 7:59 PM, Rick Welykochy <rick@praxis.com.au> wrote:
On our 2 GHz Intel dual-core Linux/Debian install, it imports about 250 MARC records per minute, FWIW.

You must have meant per second, right?

Cheers,
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                         migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
Joshua Ferraro wrote:
On our 2 GHz Intel dual-core Linux/Debian install, it imports about 250 MARC records per minute, FWIW.

You must have meant per second, right?

Nope. That figure was back-of-envelope from memory. Here is the correct figure. The server was doing nothing else at the time, on a Sunday arvo.

It took 101 minutes for 32605 records = 322 records per minute.

cheers
rick
--
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor

We like to think of ourselves as the Microsoft of the energy world.
     -- Kenneth Lay, former CEO of Enron
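For what it's worth, the arithmetic checks out (shell integer division, using the figures from the message above):

```shell
records=32605
minutes=101
echo $(( records / minutes ))   # prints 322 (records per minute, truncated)
```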
On Mon, Apr 14, 2008 at 8:46 PM, Rick Welykochy <rick@praxis.com.au> wrote:

Nope. That figure was back-of-envelope from memory. Here is the correct figure. The server was doing nothing else at the time, on a Sunday arvo. It took 101 minutes for 32605 records = 322 records per minute.

Hmmm, that seems unusually slow to me, an order of magnitude or so. Can you run the following commands to try to figure out what the bottleneck is:

$ perl -I /path/to/koha/modules -d:DProf bulkmarcimport.pl -file /path/to/file.mrc
$ dprofpp -v > dprof.txt

(the first command writes its profile to tmon.out in the current directory, which dprofpp reads by default)

Then share the output of dprof.txt with us?

Thanks!
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                         migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
Joshua Ferraro wrote:

Hmmm, that seems unusually slow to me, an order of magnitude or so. Can you run the following commands to try to figure out what the bottleneck is:

$ perl -I /path/to/koha/modules -d:DProf bulkmarcimport.pl -file /path/to/file.mrc
$ dprofpp -v > dprof.txt

Then share the output of dprof.txt with us?
Too late for that server. It is now in production. I might try the same thing on our test box when time permits, with perhaps 1000 records, and get a profile that way.

It does seem to be taking a very long time. But consider that the import process is parsing all the records and also deconstructing them and shovelling them word by word into MySQL. While I was monitoring the processes, MySQL seemed to be the chief task running. I would imagine that the writes to marc_word were done one row at a time. This can slow things down; aggregating the writes to the database would be more efficient. Another indicator of a single task dominating is that the second CPU on the box was basically idle.

We have a script that pre-processes the MARC data, and it also uses the MARC::* classes. The preprocessing involves reading in biblio records (sans items), grabbing the items from a MySQL staging table, adding them to the MARC records, then outputting a new set of MARC records. That task processed all 32605 records in 24 seconds (!)

Conclusion: there is something seriously inefficient in the bulk MARC importer script.

cheers
rickw
--
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor

We like to think of ourselves as the Microsoft of the energy world.
     -- Kenneth Lay, former CEO of Enron
On Tue, Apr 15, 2008 at 7:00 PM, Rick Welykochy <rick@praxis.com.au> wrote:
Conclusion: there is something seriously inefficient in the bulk MARC importer script.

I think you must be running version 2.2.x. The import script in 3.0 is very fast, and if you're experiencing slowness like that in 3.0, we need to know about it and find out what's happening. If it's 2.2.x, then we already know why :-) Please let us know.

Cheers,
--
Joshua Ferraro                       SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                         migration, training, maintenance, support
LibLime                                Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
On our 2 GHz Intel dual-core Linux/Debian install, it imports about 250 MARC records per minute, FWIW.
It took 101 minutes for 32605 records = 322 records per minute.
I think you must be running version 2.2.x. The import script in 3.0 is very fast, and if you're experiencing slowness like that in 3.0, we need to know about it and find out what's happening. If it's 2.2.x, then we already know why :-) Please let us know.
Hi All,

Just my 2 cents... I just ran bulkmarcimport.pl and rebuild_zebra.pl on 12,000 bibs on a Koha 3, and it took 8 minutes, which is around 1500 records a minute. And that was in a Debian VM on my lappy :)

Mason.
Joshua Ferraro wrote:
Conclusion: there is something seriously inefficient in the bulk MARC importer script.
I think you must be running version 2.2.x. The import script in 3.0 is very fast, and if you're experiencing slowness like that in 3.0, we need to know about it and find out what's happening. If it's 2.2.x, then we already know why :-)
Sorry, I shoulda mentioned that this is on Koha 2.2.9. It would have saved a lot of communication! Glad to know it is a known issue and has been fixed. It did not really present a problem for us; we just ran it overnight :)

cheers
rickw
--
________________________________________________________________
Rick Welykochy || Praxis Services || Internet Driving Instructor

We like to think of ourselves as the Microsoft of the energy world.
     -- Kenneth Lay, former CEO of Enron
Is there some way I could empty the 'import/staged' tables (a command-line SQL statement would be fine) and attempt from scratch? I can't find anything 'human readable' other than when I do a cataloging search.

I have all the time in the world, as this is just me doing a favor to upgrade a school for their volunteer librarian, who is working on version 2.2.9.

--Huck
Hi, On Mon, Apr 14, 2008 at 3:59 PM, Huck <dhuckaby@hvja.org> wrote:
Is there some way I could empty the 'import/staged' tables (a command-line SQL statement would be fine) and attempt from scratch? I can't find anything 'human readable' other than when I do a cataloging search.
A "delete from import_batches where import_batch_id = ?" would do it, then as Joshua suggested, upgrading and running the import again. You can use the stage_biblios_file.pl and commit_biblios_file.pl command-line jobs as an alternative to the web version.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
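A sketch of that cleanup session (the batch id, database name, and credentials here are hypothetical; check on your own schema whether the related import_records rows are removed along with the batch):

```shell
# List the staged batches to find the one to throw away:
mysql -u kohaadmin -p koha -e "SELECT import_batch_id, file_name FROM import_batches"

# Delete the chosen batch (id 1 is a placeholder):
mysql -u kohaadmin -p koha -e "DELETE FROM import_batches WHERE import_batch_id = 1"
```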
On Mon, Apr 14, 2008 at 4:35 PM, Huck <dhuckaby@hvja.org> wrote:
Koha version: 3.00.00.061
it seems to stall at exactly record #3100 every time...

Please upgrade to 3.00.00.069, there were some important fixes to encoding problems in MARC::Charset.
Josh
--Huck
Joshua Ferraro wrote:
Hey Huck,
First off, which version of Koha are you running (can you check the version syspref, or the About section of your staff client)?
Thanks,
Josh
On Mon, Apr 14, 2008 at 3:59 PM, Huck <dhuckaby@hvja.org> wrote:
a little play-by-play as I attempt to complete import yet again:
www-data  7443 11.6  6.7 41792 34668 ?  S 11:44  7:22 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl
www-data  8101 21.6  4.5 28700 23236 ?  R 12:47  0:02 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl
www-data  8103 35.0  2.4 16568 12468 ?  R 12:48  0:01 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/background-job-progress.pl
so it looks like after 1 hour...it launches another 'manage-marc-import.pl'...then it goes away in the next 10 minutes..
www-data 7443 11.6 6.7 41948 34744 ? S 11:44 7:35 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/manage-marc-import.pl
at this point in time...the background process disappears from the 'ps aux' output...then it starts up again...in some sort of a loop...recurring, but there is no more progress on the koha page...and the disk-space usage according to 'df' has not changed...
Still ongoing problems... the import seems to stall at 27%... running 'ps aux' to check processes...
www-data 6861 96.5 3.5 22204 18100 ? R 11:16 0:01 /usr/bin/perl /usr/share/koha/intranet/cgi-bin/tools/background-job-progress.pl
this is the only thing running... and seems to die and restart, die and restart......it has eaten up over 700 process IDs since I initially clicked the 'complete import' button.
when I initially clicked 'complete import' there was another /usr/bin/perl /usr/share/koha/intranet/cgi-bin/some-import-process-here.pl that was running concurrently with the above pasted process, which is no longer running.
anything I can do to debug this...or perhaps run something manually via the command-line?
--Huck
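The ad-hoc monitoring described above can be scripted in a couple of lines; a sketch, with grep patterns taken from the ps output in this thread and paths assuming a standard Debian-style install:

```shell
# One-shot snapshot of the Koha import workers and disk usage
# (wrap in `watch` or a loop to track it over time).
ps aux | grep '[m]anage-marc-import\|[b]ackground-job-progress' \
  || echo "no import processes running"
df -h /var
```

The bracketed first letter in each grep pattern is a common trick to keep the grep process itself out of the match.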
Galen Charlton wrote:
Hi,
On Mon, Apr 7, 2008 at 2:04 PM, Huck <dhuckaby@hvja.org> wrote:
honestly having no clue what they do/did were used for... I assumed(yes we know what that means :) that these were sort of temp/log files of what mysql-bin was doing ...so in essence recording every single transaction or something... and the one with the highest number kept incrementing...and would get to it's size limit it seems every 5 min...
These are in fact DB log files that MySQL uses to record all transactions, and are meant to be used for backup and recovery. Collectively they're called the MySQL binary log. See http://dev.mysql.com/doc/refman/5.0/en/binary-log.html for the full details.
Huck wrote:
so right now I'm monitoring and deleting each of the ones below the highest numbered (in filename)...attempting to stave off the 'out of disk space' which was causing this process to 'hang' on Friday.
The canonical way to delete them is to do a 'reset master' from the mysql prompt. You can also change settings in my.cnf such as log_bin and binlog_ignore_db to turn off these logs while you do the MARC imports. Note that turning off the binary log on a production server should not be done lightly, as it is an important mechanism to use for database recovery.
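Concretely, the two approaches Galen mentions look like this (a config/CLI sketch; back up first, and think twice before disabling the binary log on a production master):

```shell
# Purge all binary logs the supported way, instead of rm-ing the files:
mysql -u root -p -e 'RESET MASTER;'

# Or disable binary logging for the duration of the import: comment out
# the log_bin line in the [mysqld] section of my.cnf and restart mysqld.
#   [mysqld]
#   # log_bin = /var/log/mysql/mysql-bin.log
```

Deleting the files by hand, as above, leaves mysql-bin.index out of sync with what is actually on disk, which is why RESET MASTER is preferred.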
Regards,
Galen
-- Joshua Ferraro SUPPORT FOR OPEN-SOURCE SOFTWARE CEO migration, training, maintenance, support LibLime Featuring Koha Open-Source ILS jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS
am attempting to do this with git... as when I DL'd from the link on Koha.org... it was still 3.0.0.tar.gz ... I'm not really sure how to 'UPGRADE' using git... 'really sure? no... I have absolutely no clue actually'... I have done a:

git clone git://git.koha.org/pub/scm/koha.git kohaclone
cd kohaclone
git checkout -b HVJA origin

.... have no idea what to do next ;)... will begin poking around in the kohaclone directory and see if something pops up =) Joshua Ferraro wrote:
On Mon, Apr 14, 2008 at 4:35 PM, Huck <dhuckaby@hvja.org> wrote:
Koha version: 3.00.00.061
it seems to stall at exactly record #3100 every time...
Please upgrade to 3.00.00.069, there were some important fixes to encoding problems in MARC::Charset.
Josh
--Huck
n/m ;) fiddled around in the kohaclone... and figured it was just the same as the unpacking of the tar.gz file.. :) groovy! git is kewlio... so the upgrade went golden... staged records again... staging was fast... now attempting to complete again... 9 minutes in, it shows a pace of 2% completed... this is all via the web interface fwiw... Huck wrote:
am attempting to do this with GIT...as when I DL'd from the link on Koha.org...it was still 3.0.0.tar.gz ...
I'm not really sure how to 'UPGRADE' using git...'really sure? no...I have absolutely no clue actually'...
I have done a:
git clone git://git.koha.org/pub/scm/koha.git kohaclone
cd kohaclone
git checkout -b HVJA origin
....
have no idea what to do next ;)...will begin poking around in the kohaclone directory and see if something pops up =)
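For the record, once a clone like the one above exists, picking up newer upstream code is just a fetch and merge; a sketch (the branch name is an assumption — match it to whatever branch the clone tracks):

```shell
cd kohaclone
# Bring the local checkout up to date with upstream.
git fetch origin
git merge origin/master   # or equivalently: git pull origin master
# Then re-run the installer/upgrade steps, same as with an unpacked tarball.
```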
participants (7)
- Galen Charlton
- Huck
- Jesse
- Joshua Ferraro
- Marijana Glavica
- Mason James
- Rick Welykochy