[Koha] Character sets and moving from 2.2.9 to 3.2.x

Magnus Enger magnus at enger.priv.no
Fri Dec 31 07:36:09 NZDT 2010


Dear all,

I'm at my wit's end here...

I'm trying to move some records from a 2.2.9 install to a 3.2.x
install. Yep, just the records, so I exported them from 2.2.9 and have
them in a file. I'm not trying to convert/upgrade the whole
database/installation.

Now, I think the main problem is that a number of the records have
characters in them that "look strange", like this:
aus der wirtschaftlichen Abhñgigkeit von Militär und Rüstung
How it got to be like that I don't know...

Now, when I run bulkmarcimport.pl in verbose mode, I get lots of this:

.....................Bad MARC record 94: utf8 "\xE4" does not map to
Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 95.
 skipped
Bad MARC record 95: utf8 "\xFC" does not map to Unicode at
/usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 96.
 skipped
.Bad MARC record 97: utf8 "\xFC" does not map to Unicode at
/usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 98.
 skipped
.Bad MARC record 99: utf8 "\xFC" does not map to Unicode at
/usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 100.
 skipped
.Bad MARC record 101: utf8 "\xE4" does not map to Unicode at
/usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 102.
 skipped
.........Bad MARC record 111: utf8 "\xE9" does not map to Unicode at
/usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 112.
 skipped

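I tried running the file I exported from 2.2.9 through iconv to
convert it to UTF-8, but of course that changes the byte length of any
field containing non-ASCII characters, while the lengths recorded in
each record's leader and directory stay the same, resulting in
"clipped" fields.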
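A simplified illustration of the clipping, assuming a single field whose length is read from a directory entry computed before conversion (a real MARC record also has a leader and several directory entries, all omitted here). In Latin-1 each of ä and ü is one byte; in UTF-8 each is two, so the converted field no longer fits the recorded length.

```python
FIELD_TERMINATOR = b"\x1e"  # ISO 2709 / MARC field terminator

text = "Militär und Rüstung"  # example value from the post
field_latin1 = text.encode("latin-1") + FIELD_TERMINATOR

# The directory records the field's length in bytes *before* conversion.
recorded_length = len(field_latin1)

# A whole-file iconv rewrites the field bytes but not the directory.
field_utf8 = text.encode("utf-8") + FIELD_TERMINATOR

# Reading the converted field with the old length cuts it short.
clipped = field_utf8[:recorded_length]

print(recorded_length, len(field_utf8))
print(clipped.decode("utf-8"))
```

Here the recorded length is 20 bytes but the converted field is 22, so the reader gets "Militär und Rüstun" and loses both the final "g" and the field terminator.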

I also tried writing a script to parse the records, walk through every
field and subfield, convert each subfield to UTF-8 and reassemble the
records, but that only seems to produce errors like the ones above,
e.g.:
utf8 "\xC3" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174.
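One common pitfall in a per-subfield conversion (not confirmed as the cause here) is converting values that are already UTF-8 a second time: 0xC3 is the first byte of the UTF-8 encoding of ä, ü, é and similar letters, so a mix of encodings easily produces stray 0xC3 bytes. The sketch below decodes each value as UTF-8 when it is valid and only otherwise falls back to Latin-1. It is a heuristic under that assumption, written in Python for illustration only; in Koha the equivalent logic would sit in a Perl script using MARC::Record, which is not shown.

```python
def to_unicode(raw: bytes) -> str:
    """Decode one subfield value.

    Assumption: any value that is not valid UTF-8 is ISO-8859-1 (Latin-1).
    Values that already are valid UTF-8 are kept as they are, so they are
    not converted a second time.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def naive_convert(raw: bytes) -> bytes:
    # Treats every value as Latin-1, which double-encodes values that
    # were already UTF-8.
    return raw.decode("latin-1").encode("utf-8")


# Hypothetical subfield values: one Latin-1, one already UTF-8.
subfields = [b"Abh\xe4ngigkeit", "Rüstung".encode("utf-8")]

fixed = [to_unicode(v).encode("utf-8") for v in subfields]
for value in fixed:
    # Lengths for the rebuilt directory must come from these re-encoded bytes.
    print(len(value), value.decode("utf-8"))

# Double encoding: UTF-8 "ä" converted again comes out as "Ã¤".
print(naive_convert("ä".encode("utf-8")).decode("utf-8"))
```

The UTF-8-first check is only a heuristic: a Latin-1 value can occasionally happen to be valid UTF-8, so converted records are still worth spot-checking.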

If anyone has any tips on what to do in a situation like this I would
be forever grateful!

Best regards,
Magnus Enger
libriotech.no

