[Koha] Problem: non-latin characters (Arabic, Chinese, etc)

15 Mar 2012

Hello again everybody,
after some struggle I finally succeeded in installing Koha, configuring it 
and working with English records using MARC21,
all under both Ubuntu 10.10 and 11.04.
Now my problem is that my records will be 99% in Arabic.
Thanks to the help of many of you I found this valuable past message 
that outlines the steps for installing Chinese charsets. But this did 
not solve my problem 'cause:
...
For this:
* install ICU (unicode library from IBM)
* activate it by editing the default.idx file:
comment out this line:
#charmap word-phrase-utf.chr
add this line:
icuchain icu.xml
Your icu.xml should look like this:
<icu_chain locale="fr-FR">
<transliterate rule="\'>\ "/>
<transliterate rule="[:Number:] { '-'>  '' "/>
<transform rule="[:Control:] Any-Remove"/>
<tokenize rule="l"/>
<transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
<transform rule="NFD"/>
<transform rule="[:Nonspacing Mark:] Remove"/>
<transform rule="NFC"/>
<display/>
<casemap rule="l"/>
</icu_chain>
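To get a feel for what the NFD / Nonspacing Mark / NFC steps in the chain above would do to Arabic text, here is a rough Python sketch using only the standard unicodedata module. It is not how Zebra or ICU implement the chain, just an approximation of those three transform rules plus the lowercase casemap; the function name fold_marks is made up for illustration.

```python
import unicodedata


def fold_marks(text: str) -> str:
    """Approximate the NFD -> remove nonspacing marks -> NFC -> lowercase steps.

    Hypothetical helper for illustration only; it does not call ICU or Zebra.
    """
    # Decompose so that combining marks become separate code points.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every code point in general category Mn (Nonspacing Mark).
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left and lowercase it.
    return unicodedata.normalize("NFC", stripped).lower()


if __name__ == "__main__":
    # A fully vocalized Arabic word (letters plus damma, fatha, shadda).
    vocalized = "\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f"
    print(fold_marks(vocalized))
    # A French word with accents, as in the fr-FR chain.
    print(fold_marks("\u00c9t\u00e9"))
```

Note that under this approximation alef with madda or hamza (U+0622, U+0623) also folds to a plain alef, because NFD splits them into alef plus a nonspacing mark. Whether that is wanted for Arabic searching is a separate decision.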
1) Install ICU. Which Debian package provides ICU? Is it one of these?

i  libicu42    4.2.1-3ubuntu0.10.10.1    International Components for Unicode
ii  yaz-icu     4.2.25-1indexdata         ICU utility for the Z39.50 toolkit

or am I missing something, maybe?

2) default.idx: is it the one under /etc/koha/zebradb/etc?

3) I'm unsure about both the meaning and the location of the icu.xml file. 
Should it be placed under /etc/koha/zebradb/etc as well?
What should it contain in the case of Arabic? I suppose that the transform 
rules will not be the same as for French... does anybody have a template 
for Arabic?

4) Once all of this is fixed, how can I make sure that everything 
works fine?

Best!
Stefano

Stefano Barale