[Koha] losing data during import

Baljkas Family baljkas at mts.net
Thu Aug 5 16:38:06 NZST 2004


Wednesday, August 4, 2004   23:28 CDT

Hi again, Scott,

I took a look at the records in detail. Sorry, they weren't missing the 2 blocks at the end of each as they seemed to be in Word Pad.

MARCBreaker can't break it down properly, though, so despite first appearances, they still aren't valid MARC. Something is screwing up the Directory.

I think you must be right in that the non-ASCII characters definitely need to be replaced. Is there any other way you can replace the non-ASCII characters first?

If you could send a sample of the same (or other) records in their original format off listserv, I can see if another method might work.

Cheers,
Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB, Canada

P.S. You really shouldn't use the $g in 100 in the way that you did. That's not what it was intended for.

> From: Scott Scriven <koha-main at toykeeper.net>
> Date: 2004/08/04 Wed PM 08:41:18 CDT
> To: koha at lists.katipo.co.nz
> Subject: [Koha] losing data during import
> 
> Hello.
> 
> I'm having some difficulty keeping data intact when I import with
> the bulkmarcimport.pl script.  Specifically, it seems that fields
> are getting the last 5 bytes chopped off.  It seems to be related
> to character encodings, but I'm not really sure what to do about
> it.  Converting from utf-8 to iso8859-1 seems to change the
> results, but not correct the problem.  Manually replacing all
> non-ascii characters with safer equivalents seems to cure the
> problem, but is not feasible for the amount of data I have.
> 
> I have a data sample which exhibits this problem; it is a
> collection of 15 Douglas Adams books:
> 
>   http://toykeeper.net/tmp/koha/dna.mrc
> 
> It was generated from:
> 
>   http://toykeeper.net/tmp/koha/dna.marcxml
>   http://toykeeper.net/tmp/koha/dna.mods
> 
> My conversion process goes from custom data to MODS, then MODS to
> MARC (xml) using the LoC stylesheets for doing so.  It then
> converts to binary MARC using perl's MARC::Record and
> MARC::File::XML.  Somewhere in the bulkmarcimport.pl script, data
> is getting lost.  It's either MARC::Record failing to read its
> own files, or in Koha's code somewhere, but I don't know where.
> 
> Any hints?  I'm hoping I can simply sidestep the conversion
> to/from binary marc, to avoid the problem; I'll let people know
> if this is effective.
> 
> 
> -- Scott
> _______________________________________________
> Koha mailing list
> Koha at lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha




More information about the Koha mailing list