[Koha] Another UTF-8 encoding question

Chadwick, John, DCA john.chadwick at state.nm.us
Thu Jan 15 11:35:26 NZDT 2009


Play time is over and I am now building a couple of servers using Ubuntu 8.04 and the latest git version of Koha. So far the installations have been clean and easy. I have just one more UTF-8 encoding question.

 

Being in the southwestern U.S., we are trying to deal with some titles that use Spanish characters alongside our primarily English ones. In looking over the Koha Wiki entry on encoding and character sets, http://wiki.koha.org/doku.php?id=encodingscratchpad, I have a question about the section on combining characters and collations (quoted below). A search term may not contain the special character, but we still need to return records that do. For those who have had to deal with this, which collation would you recommend, utf8_unicode_ci or utf8_general_ci?

 

The word Univerzalitás is a Unicode combining form. When you copy/paste it into a text editor or use a keyboard to type it, it is most likely going to be the non-combining form: Univerzalitás. (In the combining form, the hex for the accented a is: Hex 61, Hex 0301; for the non-combining form it’s: Hex 00e1.)

Non-combining form: http://www.fileformat.info/info/unicode/char/00e1/index.htm 

Combining form: http://www.fileformat.info/info/unicode/char/61/index.htm http://www.fileformat.info/info/unicode/char/0301/index.htm 

Univerzalitás Univerzalitás 
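
To make the difference between the two forms concrete, here is a minimal Python sketch. It is only an illustration of the Unicode side of the problem, not of how MySQL collations compare strings; the variable names and the code_points helper are made up for this example.

```python
import unicodedata

# Non-combining (precomposed) form: the accented a is one code point, U+00E1.
precomposed = "Univerzalit\u00e1s"
# Combining form: plain "a" (U+0061) followed by COMBINING ACUTE ACCENT (U+0301).
combining = "Univerzalita\u0301s"


def code_points(text):
    """Return the code points of text as uppercase hex strings (hypothetical helper)."""
    return [format(ord(ch), "04X") for ch in text]


# The two strings render the same but differ code point by code point.
print(precomposed == combining)          # False
print(len(precomposed), len(combining))  # 13 14
print(code_points(precomposed)[-2:])     # ['00E1', '0073']
print(code_points(combining)[-3:])       # ['0061', '0301', '0073']

# NFC normalization composes a + U+0301 into U+00E1, so the forms compare equal.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
```

Normalizing data to one form before it is stored is one way to sidestep the mismatch, though whether that fits a given Koha import workflow is an assumption to check.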

It seems that the utf8_general_ci collation doesn’t support equality for those two forms. However, utf8_unicode_ci seems to work. If you have combining characters in your data, you may want to go with statements like: 

ALTER TABLE marc_word MODIFY word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci; 

and be sure to add init-connect = 'SET collation_connection = utf8_unicode_ci' to your my.cnf.

 

Thanks,

 

John

 

+----------------------------------------------------------------------------+

John Chadwick, Ed.D. Information Technology Manager

New Mexico State Library

1209 Camino Carlos Rey

Santa Fe, NM 87507

Phone: 505-476-9740  Cell: 505-629-8116 Fax: 505-476-9761

john.chadwick at state.nm.us

http://www.nmstatelibrary.org

 





