[Koha] losing data during import
koha-main at toykeeper.net
Thu Aug 5 13:41:18 NZST 2004
I'm having trouble keeping data intact when I import with the
bulkmarcimport.pl script. Specifically, fields seem to be losing
their last 5 bytes. It appears to be related to character
encodings, but I'm not sure what to do about it. Converting from
UTF-8 to ISO-8859-1 changes the results but does not fix the
problem. Manually replacing all non-ASCII characters with safer
equivalents seems to cure it, but that is not feasible for the
amount of data I have.
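One mechanism that would produce this kind of symptom (an assumption, not a confirmed diagnosis of bulkmarcimport.pl or MARC::Record) is a field length counted in characters while the data is stored as multi-byte UTF-8. The Python sketch below only simulates that mismatch; the function names and the sample string are hypothetical and are not taken from the data set mentioned in this post.

```python
# Hypothetical illustration of a character-count vs byte-count mismatch.
# This is NOT Koha or MARC::Record code; it only models the suspected effect.

def read_with_char_length(text: str, encoding: str = "utf-8") -> str:
    """Encode text, then read back only len(text) bytes, as a reader would
    if the writer had recorded the length in characters instead of bytes."""
    encoded = text.encode(encoding)
    declared_length = len(text)  # characters, not bytes: the suspected bug
    return encoded[:declared_length].decode(encoding, errors="replace")


def bytes_lost(text: str, encoding: str = "utf-8") -> int:
    """Number of trailing bytes that the character-based length misses."""
    return len(text.encode(encoding)) - len(text)


# Made-up sample with five non-ASCII characters, each 2 bytes in UTF-8.
sample = "Señor Ünïcödé title"

print(bytes_lost(sample))                    # 5
print(repr(read_with_char_length(sample)))   # 'Señor Ünïcödé '
print(bytes_lost("plain ascii title"))       # 0
```

In this toy model the number of bytes lost equals the number of extra bytes contributed by multi-byte characters, and pure-ASCII text loses nothing, which would be consistent with the ASCII replacement workaround. It does not explain everything reported here, so treat it as one hypothesis to test.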
I have a data sample which exhibits this problem; it is a
collection of 15 Douglas Adams books:
It was generated from:
My conversion process goes from custom data to MODS, then from
MODS to MARC (XML) using the LoC stylesheets. It then converts to
binary MARC using Perl's MARC::Record and MARC::File::XML.
Somewhere in the bulkmarcimport.pl script, data is getting lost.
Either MARC::Record is failing to read its own files, or the
problem is somewhere in Koha's code, but I don't know which.
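One way to narrow down which stage loses the data would be to inspect the binary MARC file directly, before Koha touches it. The sketch below is a rough Python stand-in, not part of the Perl toolchain used here. It assumes the standard ISO 2709 layout (24-byte leader with the base address of data at positions 12-16, 12-byte directory entries of tag, length, and start, and 0x1E field terminators) and compares each declared field length with the length actually found in the data. `build_record` and `check_directory` are hypothetical helpers, and the field contents are simplified (no indicators or subfields).

```python
# Hypothetical diagnostic sketch; assumes the ISO 2709 record layout.
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator


def build_record(fields, measure):
    """Build a minimal record from (tag, text) pairs. measure(text) gives
    the length the writer declares for the field, excluding the terminator."""
    directory, data, offset = b"", b"", 0
    for tag, text in fields:
        declared = measure(text) + 1  # +1 for the field terminator
        directory += f"{tag}{declared:04d}{offset:05d}".encode("ascii")
        data += text.encode("utf-8") + FIELD_END
        offset += declared
    base = 24 + len(directory) + 1            # leader + directory + terminator
    total = base + len(data) + 1              # + record terminator
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FIELD_END + data + RECORD_END


def check_directory(record: bytes):
    """Return (tag, declared_length, actual_length) for each mismatch."""
    base = int(record[12:17])
    directory = record[24:base - 1]
    data = record[base:]
    problems = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        declared = int(entry[3:7])
        start = int(entry[7:12])
        end = data.find(FIELD_END, start)
        actual = end - start + 1 if end != -1 else -1
        if declared != actual:
            problems.append((tag, declared, actual))
    return problems


# Made-up fields; the second contains five 2-byte UTF-8 characters.
fields = [("001", "12345"), ("245", "Señor Ünïcödé title")]

good = build_record(fields, lambda s: len(s.encode("utf-8")))  # byte lengths
bad = build_record(fields, len)                                # char lengths

print(check_directory(good))  # []
print(check_directory(bad))   # [('245', 20, 25)]
```

If a check along these lines reported mismatches in the file written by the conversion step, the loss would be happening before the import; if the file were clean, the import side would be the more likely suspect.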
Any hints? I'm hoping I can simply sidestep the conversion
to/from binary MARC to avoid the problem; I'll let people know
whether this is effective.