[Koha] Charactersets and moving from 2.2.9 to 3.2.x

Fri Dec 31 07:43:17 NZDT 2010

Magnus,

You probably tried this already, but what about using MarcEdit to convert
the records from MARC-8 to UTF-8? \xFC sounds like a MARC-8 accent if I've
ever seen one...
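
Before converting anything, it may help to confirm what the offending bytes really are. As far as I can tell, the "does not map to Unicode" message just means the byte is not valid UTF-8 on its own. Below is a minimal Python sketch of that check. The helper name `classify` and the sample bytes are made up for illustration, and treating the data as Latin-1 is only an assumption; it could just as well be MARC-8.

```python
def classify(raw: bytes) -> str:
    """Report whether a byte string is valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not-utf-8"


# Sample bytes are an assumption: "Rüstung" with the u-umlaut stored as the
# single byte 0xFC, as the error messages in the quoted log suggest.
sample = b"R\xfcstung"

print(classify(sample))                     # is it valid UTF-8?
print(classify("Rüstung".encode("utf-8")))  # properly encoded for comparison
print(sample.decode("latin-1"))             # what it looks like read as Latin-1
```

If the Latin-1 reading gives sensible text, the records are probably ISO-8859-1 rather than MARC-8, and the conversion tool needs to be told so.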

Regards,
Jared

On Thu, Dec 30, 2010 at 1:36 PM, Magnus Enger <magnus at enger.priv.no> wrote:

> Dear all,
>
> I'm at my wit's end here...
>
> I'm trying to move some records from a 2.2.9 install to a 3.2.x
> install. Yep, just the records, so I exported them from 2.2.9 and have
> them in a file - I'm not trying to convert/upgrade the whole
> database/installation.
>
> Now, I think the main problem is that a number of the records have
> characters in them that "look strange", like this:
> aus der wirtschaftlichen AbhÃ±gigkeit von Militär und Rüstung
> How it got to be like that I don't know...
>
> Now when I try to run bulkmarcimport.pl in verbose mode I get lots of
> this:
>
> .....................Bad MARC record 94: utf8 "\xE4" does not map to
> Unicode at /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 95.
>  skipped
> Bad MARC record 95: utf8 "\xFC" does not map to Unicode at
> /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 96.
>  skipped
> .Bad MARC record 97: utf8 "\xFC" does not map to Unicode at
> /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 98.
>  skipped
> .Bad MARC record 99: utf8 "\xFC" does not map to Unicode at
> /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 100.
>  skipped
> .Bad MARC record 101: utf8 "\xE4" does not map to Unicode at
> /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 102.
>  skipped
> .........Bad MARC record 111: utf8 "\xE9" does not map to Unicode at
> /usr/lib/perl/5.10/Encode.pm line 174, <GEN11> line 112.
>  skipped
>
> I tried running the file I exported from 2.2.9 through iconv to
> convert it to UTF-8, but of course that changes the length of some
> fields, resulting in "clipped" fields.
>
> I tried creating a script to parse the records and walk through every
> field and subfield, convert the subfields to UTF-8 and re-assemble the
> records, but this seems to only result in errors like the ones above,
> e.g.:
> utf8 "\xC3" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line
> 174.
>
> If anyone has any tips on what to do in a situation like this I would
> be forever grateful!
>
> Best regards,
> Magnus Enger
> libriotech.no

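On the iconv attempt quoted above: the clipping is what I would expect, because a MARC (ISO 2709) record stores each field's length and offset in bytes in its directory, and iconv rewrites the data without updating those numbers. Here is a rough Python sketch of the effect. The field content, the `245` tag, and the helper `directory_entry` are invented for illustration, and it assumes the source bytes are Latin-1.

```python
FIELD_TERMINATOR = b"\x1e"


def directory_entry(tag: str, data: bytes, start: int) -> str:
    """Build a 12-character directory entry: tag, byte length, start offset."""
    return f"{tag}{len(data):04d}{start:05d}"


# A made-up field: two blank indicators, subfield $a, then the text.
text = "  \x1faMilitär und Rüstung"
latin1_field = text.encode("latin-1") + FIELD_TERMINATOR
utf8_field = latin1_field.decode("latin-1").encode("utf-8")

old_entry = directory_entry("245", latin1_field, 0)
new_entry = directory_entry("245", utf8_field, 0)
print(old_entry, len(latin1_field))
print(new_entry, len(utf8_field))

# A reader that trusts the stale directory length cuts the converted field short.
stale_length = int(old_entry[3:7])
clipped = utf8_field[:stale_length]
print(clipped.decode("utf-8", errors="replace"))
```

Each accented character grows from one byte to two, so the field outgrows the length recorded for it. Any conversion therefore has to rebuild the leader and directory, which is why a MARC-aware tool is a safer bet than a plain byte-level converter.
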
-- 
Jared Camins-Esakov
Freelance bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins at cpbibliography.com
(web) http://www.cpbibliography.com/