Re: [Koha] losing data during import
Wednesday, August 4, 2004 23:15 CDT

Hi, Scott,

When I check what you had posted at http://toykeeper.net/tmp/koha/dna.mrc, what I see isn't a valid MARC record. And it's not just the last 5 bytes that are the problem.

Usually (test this out yourself so you'll know you can believe me) if you open a real MARC record in, say, WordPad, what you'll see is a long string of numbers, followed by numbers and letters (words), at the end of which are one or more boxes. That long string of numbers actually has a name: it is the Leader, followed by the numbers that constitute the Directory, which construct the matrix of the 'table' of the MARC record. I don't really know what ASCII code the boxes are; I'm sure you could figure it out. But having used and/or reviewed about a dozen ILS, I can promise you that in a real MARC record, that's what's there. You don't have that.

I think what may be happening is that you tried to insert HTML code into the MARC format; it may not like that. I am going to try to decompile and recompile what you have with MARCBreaker, MARCEditor and MARCMaker to see what happens.

Anon.

Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB, Canada
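For reference, the "boxes" a text editor shows in a binary MARC file are the control characters of the ISO 2709 layout: 0x1E ends the Directory and each field, 0x1F introduces a subfield, and 0x1D ends the record. The Leader is 24 bytes, with the record length in positions 0-4 and the base address of the data in positions 12-16; each Directory entry is 12 bytes (3-digit tag, 4-digit field length, 5-digit start offset). The following is a minimal sketch of a structural check built on those facts. The function names `check_marc_record` and `build_record` are hypothetical helpers written for this illustration; they are not part of MARC::Record, Koha, or any other tool mentioned in this thread.

```python
from typing import List, Tuple

FIELD_TERMINATOR = b"\x1e"   # ends the directory and every field
RECORD_TERMINATOR = b"\x1d"  # ends the whole record
LEADER_LENGTH = 24
ENTRY_LENGTH = 12            # tag (3) + field length (4) + start offset (5)


def check_marc_record(raw: bytes) -> List[str]:
    """Return a list of structural problems found in one binary MARC record."""
    if len(raw) < LEADER_LENGTH + 2:
        return ["record is too short to hold a leader and terminators"]
    leader = raw[:LEADER_LENGTH]
    if not leader[0:5].isdigit() or not leader[12:17].isdigit():
        return ["leader does not contain a numeric length and base address"]

    problems = []
    record_length = int(leader[0:5])
    base_address = int(leader[12:17])
    if record_length != len(raw):
        problems.append(
            f"leader says {record_length} bytes, record has {len(raw)}")
    if not raw.endswith(RECORD_TERMINATOR):
        problems.append("record does not end with 0x1D")
    if raw[base_address - 1:base_address] != FIELD_TERMINATOR:
        problems.append("directory does not end with 0x1E")

    directory = raw[LEADER_LENGTH:base_address - 1]
    if len(directory) % ENTRY_LENGTH != 0:
        problems.append("directory length is not a multiple of 12")
    usable = len(directory) - len(directory) % ENTRY_LENGTH
    for offset in range(0, usable, ENTRY_LENGTH):
        entry = directory[offset:offset + ENTRY_LENGTH]
        tag = entry[0:3].decode("ascii", "replace")
        if not entry[3:12].isdigit():
            problems.append(f"directory entry for {tag} is not numeric")
            continue
        end = base_address + int(entry[7:12]) + int(entry[3:7])
        # The last byte of every field must be the field terminator.
        if raw[end - 1:end] != FIELD_TERMINATOR:
            problems.append(f"field {tag} does not end with 0x1E")
    return problems


def build_record(fields: List[Tuple[str, bytes]]) -> bytes:
    """Assemble a small record from (tag, body) pairs, for testing only."""
    directory = b""
    data = b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = LEADER_LENGTH + len(directory) + 1
    total = base + len(data) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + data + RECORD_TERMINATOR


sample = build_record([("245", b"10\x1faThe Hitchhiker's Guide to the Galaxy")])
print(check_marc_record(sample))        # a well-formed record: no problems
print(check_marc_record(sample[:-5]))   # the same record with 5 bytes cut off
```

Note that the lengths and offsets in the Directory count bytes, not characters, which matters as soon as a field contains anything outside plain ASCII.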
From: Scott Scriven <koha-main@toykeeper.net>
Date: 2004/08/04 Wed PM 08:41:18 CDT
To: koha@lists.katipo.co.nz
Subject: [Koha] losing data during import
Hello.
I'm having some difficulty keeping data intact when I import with the bulkmarcimport.pl script. Specifically, it seems that fields are getting the last 5 bytes chopped off. It seems to be related to character encodings, but I'm not really sure what to do about it. Converting from UTF-8 to ISO-8859-1 changes the results, but does not correct the problem. Manually replacing all non-ASCII characters with safer equivalents seems to cure the problem, but is not feasible for the amount of data I have.
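One mechanism that would produce exactly this kind of symptom, offered only as a possibility and not as a diagnosis of bulkmarcimport.pl or MARC::Record, is a field length that is counted in characters while the data is stored as multi-byte UTF-8. Each non-ASCII character then takes more bytes than the recorded length allows for, and a reader that trusts the length drops the tail of the field. The sketch below illustrates the effect with a made-up string; `truncate_by_char_count` is a hypothetical name used for this illustration.

```python
def truncate_by_char_count(text: str, encoding: str = "utf-8") -> bytes:
    """Simulate a reader that is given a length in characters
    but slices the encoded bytes with it."""
    encoded = text.encode(encoding)
    recorded_length = len(text)      # characters, not bytes (the mistake)
    return encoded[:recorded_length]


def bytes_lost(text: str, encoding: str = "utf-8") -> int:
    """Number of trailing bytes such a reader would drop."""
    return len(text.encode(encoding)) - len(truncate_by_char_count(text, encoding))


# Hypothetical field content: five accented letters, each 2 bytes in UTF-8.
field = "ééééé and some trailing text"
print(bytes_lost(field))                  # 5
print(bytes_lost("plain ASCII text"))     # 0
print(bytes_lost(field, "iso8859-1"))     # 0
```

If something like this is at work, it would be consistent with ASCII-only replacements curing the problem, since ASCII text has the same length in characters and in bytes.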
I have a data sample which exhibits this problem; it is a collection of 15 Douglas Adams books:
http://toykeeper.net/tmp/koha/dna.mrc
It was generated from:
http://toykeeper.net/tmp/koha/dna.marcxml
http://toykeeper.net/tmp/koha/dna.mods
My conversion process goes from custom data to MODS, then from MODS to MARC (XML) using the LoC stylesheets for doing so. It then converts to binary MARC using Perl's MARC::Record and MARC::File::XML. Somewhere in the bulkmarcimport.pl script, data is getting lost. Either MARC::Record is failing to read its own files, or the loss is somewhere in Koha's code, but I don't know where.
Any hints? I'm hoping I can simply sidestep the conversion to/from binary MARC to avoid the problem; I'll let people know if this is effective.
-- Scott
_______________________________________________
Koha mailing list
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha