Charactersets and moving from 2.2.9 to 3.2.x
Dear all,

I'm at my wit's end here...

I'm trying to move some records from a 2.2.9 install to a 3.2.x install. Yep, just the records, so I exported them from 2.2.9 and have them in a file - I'm not trying to convert/upgrade the whole database/installation.

Now, I think the main problem is that a number of the records have characters in them that "look strange", like this:

aus der wirtschaftlichen Abhñgigkeit von Militär und Rüstung

How it got to be like that I don't know...

Now when I try to run bulkmarcimport.pl in verbose mode I get lots of this:

.....................Bad MARC record 94: utf8 "\xE4" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 95. skipped
Bad MARC record 95: utf8 "\xFC" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 96. skipped
.Bad MARC record 97: utf8 "\xFC" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 98. skipped
.Bad MARC record 99: utf8 "\xFC" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 100. skipped
.Bad MARC record 101: utf8 "\xE4" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 102. skipped
.........Bad MARC record 111: utf8 "\xE9" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 112. skipped

I tried running the file I exported from 2.2.9 through iconv to convert it to UTF-8, but of course that changes the length of some fields, resulting in "clipped" fields.

I tried creating a script to parse the records and walk through every field and subfield, convert the subfields to UTF-8 and re-assemble the records, but this seems to only result in errors like the ones above, e.g.:

utf8 "\xC3" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174.

If anyone has any tips on what to do in a situation like this I would be forever grateful!

Best regards,
Magnus Enger
libriotech.no
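On the iconv problem described above: an ISO 2709 (MARC) record stores byte lengths in its leader and in every directory entry, so re-encoding the raw file changes the data without updating those numbers, which would explain the "clipped" fields. Below is a minimal sketch of the alternative, assuming the export is MARC21 in ISO 2709 format and that the source encoding is Latin-1 (both are guesses, not confirmed in this thread). The function name reencode_record is hypothetical; a MARC library would normally do this bookkeeping for you.

```python
FT = b"\x1e"  # ISO 2709 field terminator
RT = b"\x1d"  # ISO 2709 record terminator


def reencode_record(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode one ISO 2709 record to UTF-8 and rebuild its lengths.

    Assumption: source_encoding is a guess about the exported data.
    """
    leader = raw[:24]
    base = int(leader[12:17])        # base address of data
    directory = raw[24:base - 1]     # 12-byte entries, minus the terminator

    # Pull each field out using the OLD directory, then re-encode its data.
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3]
        length = int(entry[3:7])     # includes the field terminator
        start = int(entry[7:12])     # relative to the base address
        data = raw[base + start:base + start + length - 1]
        fields.append((tag, data.decode(source_encoding).encode("utf-8")))

    # Rebuild the directory so lengths and offsets match the new byte counts.
    new_directory = b""
    body = b""
    for tag, data in fields:
        field = data + FT
        new_directory += tag + b"%04d%05d" % (len(field), len(body))
        body += field

    new_base = 24 + len(new_directory) + 1
    total = new_base + len(body) + 1
    # Leader position 9 set to "a" marks the record as Unicode.
    # Assumption: MARC21 convention; UNIMARC records flag this elsewhere.
    new_leader = (b"%05d" % total + leader[5:9] + b"a" + leader[10:12]
                  + b"%05d" % new_base + leader[17:24])
    return new_leader + new_directory + FT + body + RT
```

The point of the sketch is only that the lengths have to be recomputed after the text is converted; it does no validation and would need error handling for malformed records.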
Magnus,

You probably tried this already, but what about using MarcEdit to convert the records from MARC-8 to UTF-8? \xFC sounds like a MARC-8 accent if I've ever seen one...

Regards,
Jared

On Thu, Dec 30, 2010 at 1:36 PM, Magnus Enger <magnus@enger.priv.no> wrote:
> [...]
> _______________________________________________
> Koha mailing list http://koha-community.org
> Koha@lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha
--
Jared Camins-Esakov
Freelance bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/
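Before picking a converter, it may help to check which bytes actually break UTF-8 decoding and what they would be in a candidate source encoding. The sketch below is not from the thread; the function name find_non_utf8_bytes is hypothetical and it uses only the Python standard library. One observation, offered as a hint rather than a diagnosis: in ISO-8859-1 (Latin-1) the bytes 0xE4, 0xFC and 0xE9 from the error log are ä, ü and é, which would fit the German sample text.

```python
def find_non_utf8_bytes(data: bytes) -> dict:
    """Return {byte value: count} for bytes that break UTF-8 decoding."""
    counts = {}
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start is relative to the slice that was being decoded
            bad = data[pos + err.start]
            counts[bad] = counts.get(bad, 0) + 1
            pos += err.start + 1
    return counts


# Example: text stored as Latin-1 is reported byte by byte.
sample = "Militär und Rüstung".encode("latin-1")
for value, count in sorted(find_non_utf8_bytes(sample).items()):
    as_latin1 = bytes([value]).decode("latin-1")
    print(f"0x{value:02X} x{count} (Latin-1: {as_latin1})")
```

Reading the export file in binary mode and passing its contents to the function gives a quick inventory of suspect bytes. It re-slices the data after every bad byte, so it is meant for spot checks rather than very large files.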