[Koha] losing data during import

Baljkas Family baljkas at mts.net
Thu Aug 5 16:23:24 NZST 2004


Wednesday, August 4, 2004   23:15 CDT

Hi, Scott,

When I check what you had posted at

     http://toykeeper.net/tmp/koha/dna.mrc

what I see isn't a valid MARC record. And it's not just the last 5 bytes that are the problem.

Usually -- test this out yourself so you'll know you can believe me -- if you open a real MARC record in, say, WordPad, you'll see a long string of numbers. These actually have names: first comes the Leader, then the numbers that make up the Directory, which builds the 'table' of the MARC record. After that come numbers and letters (words), at the end of which are one or more boxes.

I don't really know what ASCII code this is; I'm sure you could figure it out. But having used and/or reviewed about a dozen ILSs, I can promise you that in a real MARC record, that's what's there. You don't have that.
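For anyone who wants to check a file without an editor, here is a minimal sketch in Python of the structural tests described above. It assumes the standard ISO 2709 / MARC 21 layout: a 24-byte Leader, 12-byte Directory entries, 0x1E closing the Directory and each field, and 0x1D closing the record (these control characters are most likely the "boxes"). The names build_record and inspect_marc are made up for illustration; they are not part of Koha or MARC::Record, and the sketch handles only a single record.

```python
FIELD_TERMINATOR = b"\x1e"    # ends the Directory and every field
RECORD_TERMINATOR = b"\x1d"   # ends the whole record


def build_record(fields):
    """Build a tiny ISO 2709 record from (tag, bytes) pairs, for testing only."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset,
        # with both numbers counted in bytes, not characters.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # where the field data begins
    total = base + len(body) + 1            # full record length in bytes
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


def inspect_marc(raw):
    """Return a list of structural problems found in one binary MARC record."""
    try:
        declared_length = int(raw[0:5])     # Leader positions 00-04
        base_address = int(raw[12:17])      # Leader positions 12-16
    except ValueError:
        return ["Leader does not contain numeric length/base address"]

    problems = []
    if declared_length != len(raw):
        problems.append(f"Leader says {declared_length} bytes, file has {len(raw)}")
    if raw[-1:] != RECORD_TERMINATOR:
        problems.append("record does not end with 0x1D")
    if raw[base_address - 1:base_address] != FIELD_TERMINATOR:
        problems.append("Directory does not end with 0x1E")

    directory = raw[24:base_address - 1]
    if len(directory) % 12:
        problems.append("Directory length is not a multiple of 12")
    for i in range(0, len(directory) - len(directory) % 12, 12):
        entry = directory[i:i + 12]
        try:
            length, start = int(entry[3:7]), int(entry[7:12])
        except ValueError:
            problems.append(f"Directory entry {entry!r} is not numeric")
            continue
        field = raw[base_address + start:base_address + start + length]
        if field[-1:] != FIELD_TERMINATOR:
            problems.append(f"field {entry[0:3].decode('ascii', 'replace')} "
                            "does not end with 0x1E")
    return problems
```

An empty list from inspect_marc means the skeleton is intact; it says nothing about whether the cataloguing inside the fields is sensible. Because every length is counted in bytes, a writer that counts characters instead would come up short on any field containing non-ASCII text, which may or may not be what is happening to your data.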

I think what may be happening is that you tried to insert HTML code into the MARC format; it may not like that.

I am going to try to decompile and recompile what you have with MARCBreaker, MARCEditor and MARCMaker to see what happens.

Anon.

Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB, Canada

> 
> From: Scott Scriven <koha-main at toykeeper.net>
> Date: 2004/08/04 Wed PM 08:41:18 CDT
> To: koha at lists.katipo.co.nz
> Subject: [Koha] losing data during import
> 
> Hello.
> 
> I'm having some difficulty keeping data intact when I import with
> the bulkmarcimport.pl script.  Specifically, it seems that fields
> are getting the last 5 bytes chopped off.  It seems to be related
> to character encodings, but I'm not really sure what to do about
> it.  Converting from utf-8 to iso8859-1 seems to change the
> results, but not correct the problem.  Manually replacing all
> non-ascii characters with safer equivalents seems to cure the
> problem, but is not feasible for the amount of data I have.
> 
> I have a data sample which exhibits this problem; it is a
> collection of 15 Douglas Adams books:
> 
>   http://toykeeper.net/tmp/koha/dna.mrc
> 
> It was generated from:
> 
>   http://toykeeper.net/tmp/koha/dna.marcxml
>   http://toykeeper.net/tmp/koha/dna.mods
> 
> My conversion process goes from custom data to MODS, then MODS to
> MARC (xml) using the LoC stylesheets for doing so.  It then
> converts to binary MARC using perl's MARC::Record and
> MARC::File::XML.  Somewhere in the bulkmarcimport.pl script, data
> is getting lost.  It's either MARC::Record failing to read its
> own files, or in Koha's code somewhere, but I don't know where.
> 
> Any hints?  I'm hoping I can simply sidestep the conversion
> to/from binary marc, to avoid the problem; I'll let people know
> if this is effective.
> 
> 
> -- Scott
> _______________________________________________
> Koha mailing list
> Koha at lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha
> 



