[Koha] Koha 2.2.9, Unicode (UTF-8), Latin-1 (ISO-8859-1) and migration to Koha 3

Ricardo Dias Marques lists at ricmarques.net
Fri Apr 25 07:24:37 NZST 2008


Hi Galen,

On Tue, Apr 22, 2008, Galen Charlton <galen.charlton at liblime.com> wrote:

>  Doing a Latin-1 to UTF-8 conversion on the mysqldump directly will
>  likely make any MARC records that are touched unparseable.  I suggest
>  as part of your process that you export the MARC bib and authority
>  records separately, fix them using MARC::Record and the techniques
>  you've already identified, then import them back into your 2.2.9 test
>  database.  Then you can fix a mysqldump of the non-MARC tables.

First of all, thank you very much for that important tip!

Could you please point me to a web page with a Perl code sample that
does what you described using the MARC::Record module, meaning Perl
code that:

1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC
here) for several records

2 - For each record, checks whether it is already in UTF-8:
   a) If it is already in UTF-8, then skip it
   b) If it is NOT in UTF-8 (for example, because it is encoded in
ISO-8859-1 / Latin-1), then convert it to UTF-8

3 - Writes a .mrc file with the pure UTF-8 output.
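
To make the three steps above concrete, here is a minimal sketch of the
logic. It is written in Python with only the standard library, so it is
NOT the MARC::Record approach itself, and every function name in it
(to_utf8, convert_record, convert_file) is hypothetical. It assumes the
.mrc file holds ISO 2709 records with the usual 12-byte directory
entries (3-digit tag, 4-digit length, 5-digit start), and that any field
that is not valid UTF-8 is Latin-1. The check is done per field, and the
directory and leader lengths are recomputed, because converted fields
get longer.

```python
FT = b"\x1e"  # ISO 2709 field terminator
RT = b"\x1d"  # ISO 2709 record terminator


def to_utf8(data: bytes) -> bytes:
    """Return data unchanged if it is valid UTF-8, else treat it as Latin-1.

    Assumption: anything that fails UTF-8 decoding is ISO-8859-1.
    Pure ASCII passes the UTF-8 check and is left as-is.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode("latin-1").encode("utf-8")


def convert_record(raw: bytes) -> bytes:
    """Convert one ISO 2709 record, rebuilding directory and leader."""
    leader = raw[:24]
    base = int(leader[12:17])       # base address of the data area
    directory = raw[24:base - 1]    # directory without its terminator

    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3]
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Field data without its trailing field terminator.
        data = raw[base + start:base + start + length - 1]
        fields.append((tag, to_utf8(data) + FT))

    # Rebuild the directory with the new lengths and offsets.
    # Note: a field longer than 9999 bytes would not fit the 4-digit
    # length; this sketch does not handle that case.
    new_dir = b""
    offset = 0
    for tag, data in fields:
        new_dir += tag + b"%04d%05d" % (len(data), offset)
        offset += len(data)
    new_dir += FT

    body = b"".join(data for _, data in fields) + RT
    new_base = 24 + len(new_dir)
    total = new_base + len(body)
    new_leader = (b"%05d" % total) + leader[5:12] + (b"%05d" % new_base) + leader[17:]
    return new_leader + new_dir + body


def convert_file(src: str, dst: str) -> None:
    """Read every record from src and write the UTF-8 version to dst."""
    with open(src, "rb") as f:
        blob = f.read()
    with open(dst, "wb") as out:
        for chunk in blob.split(RT):
            if chunk.strip():       # ignore trailing whitespace or newlines
                out.write(convert_record(chunk + RT))
```

A call such as convert_file("biblios-latin1.mrc", "biblios-utf8.mrc")
(file names made up for the example) would cover steps 1 to 3. Two
caveats: a Latin-1 field whose bytes happen to form valid UTF-8 would be
left untouched, and the sketch does not update any character set that
the record declares about itself.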


I have already read the documentation for the MARC::Record module, located at:
http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

... but I must admit that I am still a bit confused.  :-/


Thanks again!

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net
