[Koha] losing data during import

Thu Aug 5 13:41:18 NZST 2004

Hello.

I'm having some difficulty keeping data intact when I import with
the bulkmarcimport.pl script.  Specifically, fields are getting
their last 5 bytes chopped off.  It appears to be related to
character encodings, but I'm not sure what to do about it.
Converting from UTF-8 to ISO-8859-1 changes the results, but
does not correct the problem.  Manually replacing all non-ASCII
characters with safer equivalents cures the problem, but is not
feasible for the amount of data I have.
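
One way this kind of truncation can arise (an assumption, not
something confirmed for bulkmarcimport.pl or MARC::Record) is a field
length that is counted in characters but applied to UTF-8 bytes.  The
sketch below is in Python rather than Perl and uses a made-up string
with five two-byte characters; it only illustrates the arithmetic:

```python
# Hypothetical field content with five accented characters, written
# with escapes so each one is a single code point (2 bytes in UTF-8).
text = "Caf\u00e9 d\u00e9j\u00e0 vu \u00fcber na\u00efve"

encoded = text.encode("utf-8")

# Simulated bug: the length is measured in characters, then used to
# cut the byte string.
char_length = len(text)
stored = encoded[:char_length]

# One byte is lost for every two-byte character in the field.
lost = len(encoded) - len(stored)
print(f"{len(text)} characters, {len(encoded)} bytes, {lost} bytes lost")
print(stored.decode("utf-8", errors="replace"))
```

If the number of bytes lost from a field matches the number of
non-ASCII characters in it, a character/byte mix-up of this sort is
worth suspecting.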

I have a data sample which exhibits this problem; it is a
collection of 15 Douglas Adams books:

  http://toykeeper.net/tmp/koha/dna.mrc

It was generated from:

  http://toykeeper.net/tmp/koha/dna.marcxml
  http://toykeeper.net/tmp/koha/dna.mods

My conversion process goes from custom data to MODS, then MODS to
MARC (XML) using the LoC stylesheets for doing so.  It then
converts to binary MARC using Perl's MARC::Record and
MARC::File::XML.  Somewhere in the bulkmarcimport.pl script, data
is getting lost.  Either MARC::Record is failing to read its own
files, or the problem is somewhere in Koha's code; I don't know which.
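
To narrow down which side is at fault, the binary file can be checked
independently of both MARC::Record and Koha.  Binary MARC (ISO 2709)
has a 24-byte leader, then a directory of 12-byte entries (3-digit
tag, 4-digit field length in bytes, 5-digit start offset), and every
field ends with the byte 0x1E.  The following is a minimal sketch of
such a check; the function names are hypothetical, and build_record
exists only to produce test input, including a deliberately wrong
record whose lengths are counted in characters:

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def build_record(fields, count_characters=False):
    """Build a minimal ISO 2709 record from (tag, text) pairs.

    With count_characters=True, directory lengths are computed in
    characters instead of bytes (the hypothetical bug being tested).
    """
    directory = b""
    body = b""
    for tag, text in fields:
        data = text.encode("utf-8") + FIELD_TERMINATOR
        length = len(text) + 1 if count_characters else len(data)
        directory += f"{tag}{length:04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # leader + directory + 0x1E
    total = base + len(body) + 1            # + record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_TERMINATOR + body + RECORD_TERMINATOR


def check_record(raw):
    """Return a list of inconsistencies found in one binary record."""
    problems = []
    declared = int(raw[0:5].decode("ascii"))    # record length
    base = int(raw[12:17].decode("ascii"))      # base address of data
    if declared != len(raw):
        problems.append(f"leader says {declared} bytes, found {len(raw)}")
    directory = raw[24:base - 1]
    if len(directory) % 12 != 0:
        problems.append("directory is not a multiple of 12 bytes")
        return problems
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        end = base + start + length
        # The last byte of every field must be the field terminator.
        if raw[end - 1:end] != FIELD_TERMINATOR:
            problems.append(f"field {tag}: no terminator at byte {end - 1}")
    return problems


fields = [("245", "Caf\u00e9 d\u00e9j\u00e0 vu"), ("100", "Adams, Douglas")]
good = build_record(fields)
bad = build_record(fields, count_characters=True)
print(check_record(good))
print(check_record(bad))
```

If a check like this passes on every record in the .mrc file, the
file itself is consistent and the loss is more likely on the reading
or import side; if it fails, the problem was already present when the
file was written.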

Any hints?  I'm hoping I can simply sidestep the conversion
to/from binary MARC to avoid the problem; I'll let people know
whether this is effective.

-- Scott