Re: [Koha] Koha 2.2.9, Unicode (UTF-8), Latin-1 (ISO-8859-1) and migration to Koha 3

24 Apr 2008

Hi Galen,

On Tue, Apr 22, 2008, Galen Charlton <galen.charlton@liblime.com> wrote:
> ...
> Doing a Latin-1 to UTF-8 conversion on the mysqldump directly will
> likely make any MARC records that are touched unparseable.  I suggest
> as part of your process that you export the MARC bib and authority
> records separately, fix them using MARC::Record and the techniques
> you've already identified, then import them back into your 2.2.9 test
> database.  Then you can fix a mysqldump of the non-MARC tables.
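The reason a blind conversion breaks MARC records lies in the ISO 2709 structure behind .mrc data. The record directory stores each field's length (and offset) in bytes. An accented Latin-1 character is one byte but becomes two in UTF-8, so every length recorded before the conversion goes stale. The small illustration below assumes the standard 12-byte directory entry (3-byte tag, 4-digit length, 5-digit offset). The tag 200 and the title are made-up example data.

```python
FT = b"\x1e"  # ISO 2709 field terminator

title = "Canções"
# Field data as stored in a Latin-1 record: indicators, subfield $a, text, terminator
latin1_field = b"1 \x1fa" + title.encode("latin-1") + FT

# Directory entry written while the record was Latin-1:
# tag (3 bytes) + field length (4 digits) + start offset (5 digits)
entry = b"200" + b"%04d%05d" % (len(latin1_field), 0)

# What a text-level Latin-1 -> UTF-8 conversion of the dump does to the field bytes
utf8_field = latin1_field.decode("latin-1").encode("utf-8")

declared = int(entry[3:7])
print(declared, len(utf8_field))  # → 12 14 (the directory now lies about the length)
```

The leader's total record length has the same problem. Every later field's offset also shifts, which is why a MARC parser rejects or misreads such records.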
First of all, thank you very much for that important tip!

Could you please point me to a web page with sample Perl code, using
the MARC::Record module, that does what you described? That is, Perl
code that:

1 - Opens a .mrc file containing several MARC bibliographic records
(we use UNIMARC here)

2 - For each record, checks whether it is already in UTF-8:
   a) If it is already in UTF-8, skips it
   b) If it is NOT in UTF-8 (i.e., it is in the ISO-8859-1 / Latin-1
encoding), converts it to UTF-8

3 - Writes the output, now entirely in UTF-8, to a new .mrc file.
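The three steps above can be sketched as follows. This is Python, not Perl, and it does not use MARC::Record. It parses the ISO 2709 structure by hand with only the standard library, so read it as an illustration of the technique rather than as MARC::Record's API.

The sketch makes several assumptions:
- Records end with the 0x1D record terminator.
- Directory entries use the standard 4-digit length / 5-digit offset layout (leader positions 20-21 = "45").
- "Already UTF-8" means the record's bytes decode as strict UTF-8. This is a heuristic: pure-ASCII records pass trivially, and the UNIMARC 100$a character-set positions are neither read nor updated.

All function and file names are hypothetical.

```python
import sys

FT = b"\x1e"  # ISO 2709 field terminator
RT = b"\x1d"  # ISO 2709 record terminator


def split_records(data: bytes) -> list:
    """Step 1: split a raw .mrc file on the record terminator (stray CR/LF dropped)."""
    chunks = (c.strip(b"\r\n") for c in data.split(RT))
    return [c + RT for c in chunks if c]


def parse(record: bytes):
    """Return (leader, [(tag, field_bytes), ...]) using the record's own directory."""
    leader = record[:24]
    base = int(leader[12:17])          # base address of data (leader/12-16)
    directory = record[24:base - 1]    # directory ends with FT just before base
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        fields.append((tag, record[base + start:base + start + length]))
    return leader, fields


def build(leader: bytes, fields) -> bytes:
    """Re-serialise fields, recomputing directory lengths/offsets and the leader."""
    directory, data_area = b"", b""
    for tag, data in fields:
        if len(data) > 9999:
            raise ValueError(f"field {tag!r} too long for a 4-digit length")
        directory += tag + b"%04d%05d" % (len(data), len(data_area))
        data_area += data
    directory += FT
    base = 24 + len(directory)
    total = base + len(data_area) + 1  # +1 for the record terminator
    if total > 99999:
        raise ValueError("record longer than 99999 bytes")
    leader = b"%05d" % total + leader[5:12] + b"%05d" % base + leader[17:24]
    return leader + directory + data_area + RT


def convert_record(record: bytes):
    """Step 2: return (record_bytes, converted?); unchanged if already valid UTF-8."""
    try:
        record.decode("utf-8")
        return record, False           # 2a) already UTF-8 (or pure ASCII): skip
    except UnicodeDecodeError:
        pass
    # 2b) assume Latin-1. Indicators, subfield delimiters (0x1F) and terminators
    # are ASCII, so transcoding whole fields leaves the structure bytes intact.
    leader, fields = parse(record)
    fields = [(tag, data.decode("latin-1").encode("utf-8")) for tag, data in fields]
    return build(leader, fields), True


def convert_file(src: str, dst: str) -> int:
    """Step 3: write every record, converted or not, to a new .mrc file."""
    with open(src, "rb") as f:
        records = split_records(f.read())
    out, converted = [], 0
    for rec in records:
        new, changed = convert_record(rec)
        out.append(new)
        converted += changed
    with open(dst, "wb") as f:
        f.write(b"".join(out))
    return converted


if __name__ == "__main__" and len(sys.argv) == 3:
    n = convert_file(sys.argv[1], sys.argv[2])
    print(f"{n} record(s) converted from Latin-1 to UTF-8")
```

Run it as, for example, `python latin1_to_utf8.py biblio-latin1.mrc biblio-utf8.mrc` (hypothetical file names). The essential step is rebuilding the directory and leader after conversion. In MARC::Record this would presumably happen when the record is re-serialised, rather than by editing the raw bytes.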

I have already read the documentation for the MARC::Record module, located at:
http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

... but I must admit that I am still a bit confused.  :-/

Thanks again!

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net