Koha 2.2.9, Unicode (UTF-8), Latin-1 (ISO-8859-1) and migration to Koha 3
Hi list,

I have a Koha 2.2.9 system running on a machine with SLES 9 (SUSE Linux Enterprise Server) with SP3 (Service Pack 3), running Apache 2.2.0 and MySQL 4.0.18. I'm now considering migrating it to another machine that will run Koha 3 Beta 2, on a server with SLES 10 with SP1 (Service Pack 1), running Apache 2.2.4 and MySQL 5.0.26.

I have done a mysqldump of the Koha database on the Koha 2.2.9 system. Unfortunately, I found out that the dump has mixed character encodings: some characters are in Unicode ("UTF-8") and others are in Latin-1 ("ISO 8859-1" and/or "ISO-8859-15"). I am Portuguese (living in Portugal), so the "problematic" characters are the Portuguese accented characters ("ã" - a tilde; "ç" - c cedilla; "é" - e acute; and other accented characters). This leads to my first question:

1 - Should a Koha 2.2.9 system preferably be set up for Unicode ("UTF-8") or Latin-1 ("ISO-8859-1" / "ISO 8859-15")?

By reading the following page:

Encoding and Character Sets in Koha
http://wiki.koha.org/doku.php?id=encodingscratchpad

... and namely the first version of that page - http://wiki.koha.org/doku.php?id=encodingscratchpad&rev=1152103445 - it seems that for versions of Koha >= 2.2.6, I should set up the "locale", Apache and MySQL for Unicode ("UTF-8"). Is this correct?

My next question is this one:

2 - What is the "best" way to convert this "mixed" mysqldump (UTF-8 / ISO-8859-1) file to a "pure" UTF-8 one (or to a "pure" Latin-1 one)?

I have already found these pages, but would appreciate feedback from fellow Koha users who have already had this problem:

How to sanitize a string with mixed encodings - UTF-8 and Latin1
http://www.fischerlaender.net/php/sanitize-utf8-latin1

Encoding issues MySql Latin / UTF-8
http://www.vlugge.eu/blog/algemeen/encoding-issues-mysql-latin-utf-8/

Mixed ISO-8859/UTF-8 conversion
http://www.perlmonks.org/?node_id=642617

And now, the "main question":

3 - Is the following sequence the best migration strategy?

3.1 - "Transform" the mysqldump to a "pure" UTF-8 file
3.2 - Install Koha 2.2.9 on the "second" machine (running SLES 10)
3.3 - Import the mysqldump on the "second" machine
3.4 - Install Koha 3 Beta 2 on the "second" machine
3.5 - Follow the steps described at:

Upgrading from Koha 2.2 to Koha 3.0
http://wiki.koha.org/doku.php?id=22_to_30

... or is there an easier / better way to do this?

Thanks for taking the time to read this! ANY help / information / feedback would be much appreciated! :)

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net
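Before choosing either conversion target, it helps to measure how mixed the dump actually is. A minimal sketch of that survey, in Python rather than the Perl the thread later settles on, and with the file path as a placeholder:

```python
# Sketch: classify each line of a mysqldump as ASCII, UTF-8, or
# Latin-1-only, to gauge how mixed the file really is.
# The path "koha.sql" is a placeholder for the actual dump file.

def classify(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'latin-1' for a chunk of bytes."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this
        # branch is the fallback for anything that is not valid UTF-8.
        return "latin-1"

def survey(path: str) -> dict:
    """Count how many lines of the dump fall into each category."""
    counts = {"ascii": 0, "utf-8": 0, "latin-1": 0}
    with open(path, "rb") as fh:
        for line in fh:
            counts[classify(line)] += 1
    return counts
```

A dump with non-zero counts in both the "utf-8" and "latin-1" buckets is exactly the mixed case described above.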
Hi, On Tue, Apr 22, 2008 at 6:32 AM, Ricardo Dias Marques <lists@ricmarques.net> wrote:
3 - Is the best migration strategy, the following sequence:
3.1. - "Transform" the mysqldump to a "pure" UTF-8 file
Doing a Latin-1 to UTF-8 conversion on the mysqldump directly will likely make any MARC records that are touched unparseable. I suggest, as part of your process, that you export the MARC bib and authority records separately, fix them using MARC::Record and the techniques you've already identified, then import them back into your 2.2.9 test database. Then you can fix a mysqldump of the non-MARC tables.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
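The record-by-record fix Galen outlines can be sketched roughly as follows. This is an illustrative Python sketch of the encoding test only; the real conversion should go through Perl's MARC::Record, precisely because re-encoding changes byte lengths, which invalidates the ISO 2709 leader and directory offsets that MARC::Record recomputes for you:

```python
# Illustrative per-record encoding fix (Python here for brevity; the
# actual job belongs to Perl's MARC::Record, which also rebuilds the
# ISO 2709 leader and directory offsets that re-encoding invalidates).

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record marker

def to_utf8(raw: bytes) -> bytes:
    """Leave bytes that already decode as UTF-8 untouched; otherwise
    reinterpret them as Latin-1 and re-encode as UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode("latin-1").encode("utf-8")

def fix_mrc(in_path: str, out_path: str) -> None:
    """Split a .mrc file on the record terminator and fix each record.
    Note: the rewritten records still need their leader/directory
    lengths recomputed by a proper MARC library before reimport."""
    with open(in_path, "rb") as fh:
        records = fh.read().split(RECORD_TERMINATOR)
    with open(out_path, "wb") as fh:
        for rec in records:
            if rec:
                fh.write(to_utf8(rec) + RECORD_TERMINATOR)
```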
Hi Galen, On Tue, Apr 22, 2008, Galen Charlton <galen.charlton@liblime.com> wrote:
Doing a Latin-1 to UTF-8 conversion on the mysqldump directly will likely make any MARC records that are touched unparseable. I suggest as part of your process that you export the MARC bib and authority records separately, fix them using MARC::Record and the techniques you've already identified, then import them back into your 2.2.9 test database. Then you can fix a mysqldump of the non-MARC tables.
First of all, thank you very much for that important tip!

Could you please point me to any web page that has a Perl code sample that does what you described, using the MARC::Record module - meaning Perl code that:

1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC here) for several records

2 - For each record, sees if it's already in UTF-8:
a) If it is already in UTF-8, then skip it
b) If it is NOT in UTF-8 (namely because it is in the ISO-8859-1 / Latin-1 encoding / charset), then convert it to UTF-8

3 - Writes a .mrc file with the pure UTF-8 output.

I have already read the documentation for the MARC::Record module, located at:

http://search.cpan.org/~mikery/MARC-Record-2.0.0/lib/MARC/Record.pm

... but I must admit that I am still a bit confused. :-/

Thanks again!

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net
Hi, On Thu, Apr 24, 2008 at 2:24 PM, Ricardo Dias Marques <lists@ricmarques.net> wrote:
Could you please point me to any web page that has some Perl code sample that does what you described, using the MARC::Record module, meaning Perl code that:
1 - Opens a .mrc file that has MARC bibliographic info (we use UNIMARC here) for several records
2 - For each record, sees if it's already in UTF-8: a) If it is already in UTF-8, then skip it b) If it is NOT in UTF-8 (namely because it is in the ISO-8859-1 / Latin-1 encoding / charset), then convert it to UTF-8
3 - Writes a .mrc file with the pure UTF-8 output.
Very briefly: Koha 3's C4::Charset module's MarcToUTF8Record routine should give you some ideas. You can use that as the core of a routine to convert a file that contains mixed Latin-1 and UTF-8 records to UTF-8. However, it will not correctly handle a MARC record that has *both* Latin-1 and UTF-8, but could be modified to test each field and subfield to see if it contains UTF-8 or Latin-1.

Regards,

Galen
--
Galen Charlton
Koha Application Developer
LibLime
galen.charlton@liblime.com
p: 1-888-564-2457 x709
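The field-level test Galen suggests, for records that mix Latin-1 and UTF-8 *within* a single record, can be sketched like this. The flat (tag, value) pair representation is a simplification of what MARC::Record actually exposes, and the test is a heuristic: a pair of consecutive Latin-1 accented letters (e.g. "Ã©") is also valid UTF-8, so rare fields can be misclassified and may need manual review:

```python
# Field-level variant of the encoding fix: each field or subfield is
# tested and converted independently, so a record that mixes Latin-1
# and UTF-8 fields comes out uniformly UTF-8. The (tag, value) pairs
# are a simplified stand-in for MARC::Record's field objects.

def fix_field(value: bytes) -> bytes:
    """Convert one field/subfield value to UTF-8, leaving values that
    already decode as UTF-8 untouched. Heuristic: some short Latin-1
    sequences are coincidentally valid UTF-8 and will be left alone."""
    try:
        value.decode("utf-8")
        return value
    except UnicodeDecodeError:
        return value.decode("latin-1").encode("utf-8")

def fix_record_fields(fields):
    """Apply the per-field test to every (tag, value) pair of a
    record, returning a record whose fields are all UTF-8."""
    return [(tag, fix_field(value)) for tag, value in fields]
```

Running this over a record whose 200 field is already UTF-8 but whose 210 field is Latin-1 converts only the latter, which is exactly the case the whole-record approach gets wrong.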
Hi Galen, On Thu, Apr 24, 2008 at 8:45 PM, Galen Charlton <galen.charlton@liblime.com> wrote:
Very briefly, Koha 3's C4::Charset module's MarcToUTF8Record routine should give you some ideas. You can use that as the core of a routine to convert a file that contains mixed Latin-1 and UTF-8 records to UTF-8. However, it will not correctly handle a MARC record that has *both* Latin-1 and UTF-8, but could be modified to test each field and subfield to see if it contains UTF-8 or Latin-1.
Thanks Galen!

I have read the code of the MarcToUTF8Record routine, like you suggested, and it does seem to be a very good starting point.

If anyone else is curious about the MarcToUTF8Record routine, you may read its (current) source code in the Charset.pm file of Koha 3 (Beta 2), which is also available - in its "current" form - on the "git" web site - http://git.koha.org/ - specifically at:

http://git.koha.org/cgi-bin/gitweb.cgi?p=Koha;a=blob;f=C4/Charset.pm

I'll try to start experimenting with this when I get back to work on Monday (Friday, 25th of April is a National Holiday in Portugal).

Best wishes,
Ricardo Dias Marques
lists AT ricmarques DOT net