New subject: losing data during import

5 Aug 2004

      Wednesday, August 4, 2004   23:28 CDT

Hi again, Scott,

I took a look at the records in detail. Sorry, they weren't missing the 2 blocks at the end of each, as they seemed to be in WordPad.

MARCBreaker can't break it down properly, though, so despite first appearances, they still aren't valid MARC. Something is screwing up the Directory.
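To illustrate what a broken Directory looks like, here is a minimal Python sketch. It is not taken from Koha, MARC::Record, or MARCBreaker, and the function names `build_record` and `bad_directory_entries` are made up for this example. It assumes one possible cause that has not been confirmed for these records: field lengths being counted in characters rather than in encoded bytes. In ISO 2709 binary MARC, each 12-byte Directory entry gives a 3-byte tag, a 4-digit field length and a 5-digit starting offset, both measured in bytes, so a multi-byte UTF-8 character makes a character count come up short.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator


def build_record(fields, count_bytes=True):
    """Build a minimal ISO 2709 record from (tag, text) pairs.

    count_bytes=False mimics the suspected (unconfirmed) bug: lengths
    measured in characters instead of encoded bytes.
    """
    directory = b""
    body = b""
    offset = 0
    for tag, text in fields:
        data = text.encode("utf-8") + FT
        length = len(data) if count_bytes else len(text) + 1
        directory += f"{tag}{length:04d}{offset:05d}".encode("ascii")
        body += data
        offset += length
    directory += FT
    base = 24 + len(directory)          # base address of data
    total = base + len(body) + 1        # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + body + RT


def bad_directory_entries(record):
    """Return the tags whose Directory entry does not end on a field terminator."""
    base = int(record[12:17])
    directory = record[24:base - 1]     # drop the Directory's own terminator
    bad = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        end = base + start + length
        if record[end - 1:end] != FT:
            bad.append(tag)
    return bad


# Hypothetical sample fields; the 245 contains two non-ASCII characters.
sample_fields = [
    ("100", "1 \x1faAdams, Douglas"),
    ("245", "10\x1faGuía del autoestopista galáctico"),
    ("260", "  \x1faLondon"),
]

good = build_record(sample_fields)
broken = build_record(sample_fields, count_bytes=False)
print(bad_directory_entries(good))    # []
print(bad_directory_entries(broken))  # ['245', '260']
```

Note that in the broken record the error carries forward: the 260 field is pure ASCII, but its starting offset is already wrong because of the field before it.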

I think you are right that the non-ASCII characters need to be replaced. Is there any other way you can replace them first?
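As one possible way to do the replacement in bulk, here is a short Python sketch using only the standard library. The function name `to_ascii` is made up for this example, and the approach is lossy: it strips accents by Unicode decomposition, and any character with no ASCII decomposition (for example "ß" or "ø") is simply dropped, so it may not be acceptable for every collection.

```python
import unicodedata


def to_ascii(text):
    """Replace accented characters with their unaccented ASCII base letters.

    Characters that have no ASCII decomposition are dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Café Müller"))  # Cafe Muller
```

This would need to be applied to the source data before the conversion to binary MARC. Changing bytes inside an existing binary MARC file without recalculating the Directory would leave the lengths and offsets wrong again.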

If you could send me a sample of the same (or other) records in their original format off-list, I can see whether another method might work.

Cheers,
Steven F. Baljkas
library tech at large
Koha neophyte
Winnipeg, MB, Canada

P.S. You really shouldn't use the $g in 100 in the way that you did. That's not what it was intended for.
...
From: Scott Scriven <koha-main@toykeeper.net>
Date: 2004/08/04 Wed PM 08:41:18 CDT
To: koha@lists.katipo.co.nz
Subject: [Koha] losing data during import
Hello.
I'm having some difficulty keeping data intact when I import with
the bulkmarcimport.pl script.  Specifically, it seems that fields
are getting the last 5 bytes chopped off.  It seems to be related
to character encodings, but I'm not really sure what to do about
it.  Converting from UTF-8 to ISO-8859-1 seems to change the
results, but not correct the problem.  Manually replacing all
non-ASCII characters with safer equivalents seems to cure the
problem, but is not feasible for the amount of data I have.
I have a data sample which exhibits this problem; it is a
collection of 15 Douglas Adams books:
http://toykeeper.net/tmp/koha/dna.mrc
It was generated from:
http://toykeeper.net/tmp/koha/dna.marcxml
http://toykeeper.net/tmp/koha/dna.mods
My conversion process goes from custom data to MODS, then MODS to
MARC (xml) using the LoC stylesheets for doing so.  It then
converts to binary MARC using Perl's MARC::Record and
MARC::File::XML.  Somewhere in the bulkmarcimport.pl script, data
is getting lost.  Either MARC::Record is failing to read its own
files, or the loss is in Koha's code somewhere; I don't know which.
Any hints?  I'm hoping I can simply sidestep the conversion
to/from binary MARC, to avoid the problem; I'll let people know
if this is effective.
-- Scott
_______________________________________________
Koha mailing list
Koha@lists.katipo.co.nz
http://lists.katipo.co.nz/mailman/listinfo/koha
