Hi,

On Thu, Apr 24, 2008 at 2:24 PM, Ricardo Dias Marques <lists@ricmarques.net> wrote:
> Could you please point me to any web page that has some Perl code sample that does what you described, using the MARC::Record module, meaning Perl code that:
>
> 1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC here) for several records
>
> 2 - For each record, sees if it's already in UTF-8:
>     a) If it is already in UTF-8, then skip it
>     b) If it is NOT in UTF-8 (namely because it is in the ISO-8859-1 / Latin-1 encoding / charset), then convert it to UTF-8
>
> 3 - Writes a .mrc file with the pure UTF-8 output.

Very briefly, Koha 3's C4::Charset module's MarcToUTF8Record routine should give you some ideas. You can use that as the core of a routine to convert a file that contains a mix of Latin-1 records and UTF-8 records to UTF-8.

However, it will not correctly handle a single MARC record that contains *both* Latin-1 and UTF-8. It could be modified to test each field and subfield individually to see whether it contains UTF-8 or Latin-1.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
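
As a rough illustration of the per-field test mentioned above, here is a minimal sketch using only Perl's core Encode module. It is not the code from C4::Charset, and to_utf8_bytes is a hypothetical helper name. It assumes that each value is either well-formed UTF-8 or Latin-1, and it works on the raw bytes of a single field or subfield value.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Koha or MARC::Record): takes the raw
# bytes of one field or subfield value and returns UTF-8 encoded bytes.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # Strict decode: dies on any byte sequence that is not well-formed UTF-8.
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    my $is_utf8 = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };

    # Already UTF-8 (plain ASCII also lands here): return it unchanged.
    return $bytes if $is_utf8;

    # Assumption: anything that is not valid UTF-8 is Latin-1.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}
```

With MARC::Record, a helper like this would presumably be applied to each subfield value (and to control field data) before the record is written out again. One caveat: a Latin-1 value whose bytes happen to form a valid UTF-8 sequence would be left unchanged, so the test is a heuristic and not a guarantee.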