[Koha] Another UTF-8 encoding question
Chadwick, John, DCA
john.chadwick at state.nm.us
Thu Jan 15 11:35:26 NZDT 2009
Play time is over, and I am now building a couple of servers using Ubuntu 8.04 and the latest git version of Koha. So far the installations have been clean and easy. I have just one more UTF-8 encoding question.
Being in the southwestern U.S., we are trying to deal with some titles that contain Spanish characters alongside our primarily English records. In looking over the Koha wiki entry for encoding and character sets (http://wiki.koha.org/doku.php?id=encodingscratchpad), I have a question about the section on combining characters and collations. A search entry may not necessarily contain the special character, but we need it to return records that do. For those who have had to deal with this, which collation would you recommend: utf8_unicode_ci or utf8_general_ci?
The word Univerzalitás is in Unicode combining form. When you copy/paste it into a text editor or type it on a keyboard, you will most likely get the non-combining (precomposed) form: Univerzalitás. In the combining form, the accented a is two code points: Hex 61 (a) followed by Hex 0301 (combining acute accent). In the non-combining form, it is the single code point Hex 00E1.
Non-combining form: http://www.fileformat.info/info/unicode/char/00e1/index.htm
Combining form: http://www.fileformat.info/info/unicode/char/61/index.htm http://www.fileformat.info/info/unicode/char/0301/index.htm
Univerzalitás Univerzalitás
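A minimal Python sketch (Python is chosen here only for illustration) makes the difference between the two forms concrete. The strings are built from escape sequences so the two spellings cannot get mixed up by copy/paste. The helper name non_ascii is hypothetical. The sketch prints each form's length, its non-ASCII code points, and its UTF-8 bytes, then shows that Unicode normalization (NFC/NFD) maps one form onto the other.

```python
import unicodedata

# The two spellings of "Univerzalitás", written with escapes so they stay distinct.
combining = "Univerzalita\u0301s"   # "a" (U+0061) followed by COMBINING ACUTE ACCENT (U+0301)
precomposed = "Univerzalit\u00e1s"  # single code point LATIN SMALL LETTER A WITH ACUTE (U+00E1)


def non_ascii(s):
    """Hypothetical helper: list each non-ASCII code point in s as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in s if ord(ch) > 0x7F]


# Length in code points, the accented code point(s), and the UTF-8 bytes of each form.
for label, word in (("combining", combining), ("precomposed", precomposed)):
    print(label, len(word), non_ascii(word), word.encode("utf-8"))

# A plain string comparison sees two different code point sequences ...
print(combining == precomposed)                                # False
# ... but Unicode normalization maps one form onto the other.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True
```

In UTF-8 the combining form stores the accent as the bytes CC 81 after a plain "a", while the precomposed form stores the whole letter as C3 A1. This is why a comparison that does not account for combining sequences can treat the two spellings as different words.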
It seems that the utf8_general_ci collation does not treat those two forms as equal, whereas utf8_unicode_ci does. If you have combining characters in your data, you may want to use statements like:
ALTER TABLE marc_word MODIFY word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
and be sure to add init-connect='SET collation_connection = utf8_unicode_ci' to the [mysqld] section of your my.cnf.
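For comparison, the kind of matching asked about above (an unaccented search entry finding an accented title) can also be approximated outside the database. The Python sketch below is a swapped-in, application-side technique, not how MySQL implements utf8_general_ci or utf8_unicode_ci. The names fold and title_matches are hypothetical. The approach (NFD decomposition, dropping combining marks, then case folding) is only one of several possible ways to build an accent- and case-insensitive search key.

```python
import unicodedata


def fold(text):
    """Hypothetical search key: accent- and case-insensitive (illustration only)."""
    # NFD splits precomposed letters (e.g. U+00E1) into base letter + combining mark,
    # so text entered in either form ends up in the same decomposed shape.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category "Mn"), e.g. U+0301.
    base_only = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is Python's aggressive lowercasing for caseless matching.
    return base_only.casefold()


def title_matches(query, title):
    """Hypothetical helper: True if the folded query occurs in the folded title."""
    return fold(query) in fold(title)


# Both spellings of the word fold to the same key.
print(fold("Univerzalita\u0301s") == fold("Univerzalit\u00e1s"))  # True
# An unaccented search entry still finds the accented title.
print(title_matches("historia de mexico", "Historia de M\u00e9xico"))  # True
```

Note that this folding also turns ñ into n (fold("A\u00f1o") gives "ano"), so a search for "ano" would match "año". Whether that is acceptable for Spanish titles is a policy decision that the code does not settle.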
Thanks,
John
+----------------------------------------------------------------------------+
John Chadwick, Ed.D. Information Technology Manager
New Mexico State Library
1209 Camino Carlos Rey
Santa Fe, NM 87507
Phone: 505-476-9740 Cell: 505-629-8116 Fax: 505-476-9761
john.chadwick at state.nm.us
http://www.nmstatelibrary.org