[Koha] Koha 2.2.9, Unicode (UTF-8), Latin-1 (ISO-8859-1) and migration to Koha 3

Galen Charlton galen.charlton at liblime.com
Fri Apr 25 07:45:58 NZST 2008


Hi,

On Thu, Apr 24, 2008 at 2:24 PM, Ricardo Dias Marques
<lists at ricmarques.net> wrote:
>  Could you please point me to any web page that has some Perl code
>  sample that does what you described, using the MARC::Record module,
>  meaning Perl code that:
>
>  1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC
>  here) for several records
>
>  2 - For each record, sees if it's already in UTF-8:
>    a) If it is already in UTF-8, then skip it
>    b) If it is NOT in UTF-8 (namely because it is in the ISO-8859-1 /
>  Latin-1 encoding / charset), then convert it to UTF-8
>
>  3 - Writes a .mrc file with the pure UTF-8 output.

Very briefly, the MarcToUTF8Record routine in Koha 3's C4::Charset
module should give you some ideas.  You can use it as the core of a
routine that converts a file containing a mix of Latin-1 and UTF-8
records to UTF-8.  However, it will not correctly handle a single MARC
record that contains *both* Latin-1 and UTF-8 data; to cover that case,
it could be modified to test each field and subfield separately for
UTF-8 or Latin-1.
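A minimal sketch of the per-subfield approach, written in Python for brevity. It is not Koha's C4::Charset code (which is Perl), and the names to_utf8, convert_record and convert_file are hypothetical. It assumes that every subfield is either UTF-8 or Latin-1 (no MARC-8 or ISO 5426), that records are well-formed ISO 2709 with the usual 4-digit field lengths and 5-digit start positions, and that records are not separated by newlines. Because field lengths change when Latin-1 bytes become two-byte UTF-8 sequences, the leader and directory are rebuilt. The test is a heuristic: Latin-1 text whose bytes happen to form valid UTF-8 (such as "Ã©") is left as is and will read back as "é". The sketch also does not update any field that declares the character set, such as the character-set positions of UNIMARC field 100 $a.

```python
# Sketch only. Assumes every subfield is UTF-8 or Latin-1 (no MARC-8 / ISO 5426)
# and the standard ISO 2709 directory layout. All names here are hypothetical.

RT = b"\x1d"  # record terminator
FT = b"\x1e"  # field terminator
SF = b"\x1f"  # subfield delimiter
LEN_WIDTH, START_WIDTH = 4, 5  # assumed directory layout ("45" in leader/20-21)


def to_utf8(chunk: bytes) -> bytes:
    """Return chunk unchanged if it is valid UTF-8; otherwise assume Latin-1."""
    try:
        chunk.decode("utf-8")
        return chunk
    except UnicodeDecodeError:
        return chunk.decode("latin-1").encode("utf-8")


def _digits(n: int, width: int) -> bytes:
    """Zero-pad n to a fixed-width ASCII number, as ISO 2709 requires."""
    text = str(n).zfill(width)
    if len(text) != width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return text.encode("ascii")


def convert_record(raw: bytes) -> bytes:
    """Re-encode one ISO 2709 record and rebuild its leader and directory."""
    if raw.endswith(RT):
        raw = raw[:-1]
    leader = raw[:24]
    base = int(leader[12:17])  # base address of data
    entry_size = 3 + LEN_WIDTH + START_WIDTH
    directory = raw[24:base - 1]  # omit the directory's trailing FT

    fields = []
    for i in range(0, len(directory), entry_size):
        entry = directory[i:i + entry_size]
        length = int(entry[3:3 + LEN_WIDTH])
        start = int(entry[3 + LEN_WIDTH:])
        body = raw[base + start:base + start + length]
        if body.endswith(FT):
            body = body[:-1]
        # SF is plain ASCII, so each subfield can be checked on its own;
        # control fields contain no SF and are treated as a single chunk.
        body = SF.join(to_utf8(part) for part in body.split(SF))
        fields.append((entry[:3], body + FT))

    new_dir, new_data = b"", b""
    for tag, data in fields:
        new_dir += tag + _digits(len(data), LEN_WIDTH) + _digits(len(new_data), START_WIDTH)
        new_data += data
    new_dir += FT
    new_base = 24 + len(new_dir)
    total = new_base + len(new_data) + len(RT)
    new_leader = _digits(total, 5) + leader[5:12] + _digits(new_base, 5) + leader[17:24]
    return new_leader + new_dir + new_data + RT


def convert_file(src: str, dst: str) -> None:
    """Read src, convert every record, and write the UTF-8 result to dst."""
    with open(src, "rb") as f:
        records = f.read().split(RT)
    with open(dst, "wb") as out:
        for raw in records:
            if raw.strip():  # skip the empty piece after the last terminator
                out.write(convert_record(raw))
```

Usage would be along the lines of `convert_file("in.mrc", "out.mrc")`. Records that are already entirely UTF-8 come out byte-for-byte identical, so running the conversion twice is harmless.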

Regards,

Galen
-- 
Galen Charlton
Koha Application Developer
LibLime
galen.charlton at liblime.com
p: 1-888-564-2457 x709

