Hi Galen,

On Tue, Apr 22, 2008, Galen Charlton <galen.charlton@liblime.com> wrote:
> Doing a Latin-1 to UTF-8 conversion on the mysqldump directly will
> likely make any MARC records that are touched unparseable. I suggest
> as part of your process that you export the MARC bib and authority
> records separately, fix them using MARC::Record and the techniques
> you've already identified, then import them back into your 2.2.9
> test database. Then you can fix a mysqldump of the non-MARC tables.
First of all, thank you very much for that important tip!

Could you please point me to any web page that has some Perl code sample that does what you described, using the MARC::Record module, meaning Perl code that:

1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC here) for several records

2 - For each record, sees if it's already in UTF-8:
    a) If it is already in UTF-8, then skip it
    b) If it is NOT in UTF-8 (namely because it is in the ISO-8859-1 / Latin-1 encoding / charset), then convert it to UTF-8

3 - Writes a .mrc file with the pure UTF-8 output.

I have already read the documentation for the MARC::Record module, located at:

http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

... but I must admit that I am still a bit confused. :-/

Thanks again!

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net
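
The check in step 2 can be sketched on raw byte strings with Perl's core Encode module alone. This is only a minimal sketch under stated assumptions: `to_utf8_bytes` is a hypothetical helper name, it assumes anything that is not valid UTF-8 is ISO-8859-1, and it does not touch MARC::Record at all. Per the advice quoted above, it would have to be applied to individual field and subfield values taken from a parsed record, not to the .mrc file as a whole.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of MARC::Record): takes a raw byte string,
# e.g. one subfield value, and returns it as UTF-8 bytes.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # Try a strict UTF-8 decode on a copy. With a CHECK argument, decode()
    # may modify its input, so the original string is left untouched.
    my $copy    = $bytes;
    my $is_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    # 2a) Already valid UTF-8 (pure ASCII also ends up here): leave as-is.
    return $bytes if $is_utf8;

    # 2b) Otherwise ASSUME ISO-8859-1 and re-encode as UTF-8.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Usage: "café" in Latin-1 is converted, "café" in UTF-8 is left alone.
my $from_latin1 = to_utf8_bytes("caf\xE9");
my $from_utf8   = to_utf8_bytes("caf\xC3\xA9");
print "same bytes\n" if $from_latin1 eq $from_utf8;
```

The validity test is a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 sequences would be skipped rather than converted, so spot-checking the output is still advisable.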