[Koha] Problem: non-latin characters (Arabic, Chinese, etc)

Fri Mar 16 06:38:17 NZDT 2012

Hello again everybody,
after some struggle I finally succeeded in installing Koha, configuring 
it, and working with English records using MARC21, 
on both Ubuntu 10.10 and 11.04.
Now my problem is that my records will be 99% in Arabic.
Thanks to the help of many of you I found this valuable past message 
that outlines the steps for installing Chinese charsets. But this did 
not solve my problem, because:

> For this:
> * install ICU (unicode library from IBM)
> * activate it by editing default.idx file, and :
> comment this line:
> #charmap word-phrase-utf.chr
> add this line:
> icuchain icu.xml
>
> Your icu.xml should look like this:
> <icu_chain locale="fr-FR">
> <transliterate rule="\'>\ "/>
> <transliterate rule="[:Number:] { '-'>  '' "/>
> <transform rule="[:Control:] Any-Remove"/>
> <tokenize rule="l"/>
> <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
> <transform rule="NFD"/>
> <transform rule="[:Nonspacing Mark:] Remove"/>
> <transform rule="NFC"/>
> <display/>
> <casemap rule="l"/>
> </icu_chain>

1) Install ICU. Which Debian package provides ICU? Is it one of these?

i   libicu42   4.2.1-3ubuntu0.10.10.1   International Components for Unicode
ii  yaz-icu    4.2.25-1indexdata        ICU utility for the Z39.50 toolkit

or am I missing something, maybe?

2) default.idx: is it the one under /etc/koha/zebradb/etc?

3) I'm unsure about both the meaning and the location of the icu.xml 
file. Should it be placed under /etc/koha/zebradb/etc as well?
What should it contain in the case of Arabic? I suppose that the 
transform rules will not be the same as for French. Does anybody have a 
template for Arabic?
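
To check my own reading of the quoted chain, here is a minimal sketch in 
plain Python of what I think its last steps do (NFD, remove nonspacing 
marks, NFC, lowercase). It uses only the standard unicodedata module, 
not ICU or Zebra, and the function name fold_for_index is my own 
invention, so it only approximates what the real chain would do:

```python
import unicodedata


def fold_for_index(text):
    """Approximate the NFD / remove-marks / NFC / lowercase steps of the quoted chain."""
    # <transform rule="NFD"/>: split letters from their combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # <transform rule="[:Nonspacing Mark:] Remove"/>: drop general category Mn
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # <transform rule="NFC"/>: recompose whatever is left
    recomposed = unicodedata.normalize("NFC", stripped)
    # <casemap rule="l"/>: lowercase (no effect on Arabic, which has no case)
    return recomposed.lower()


if __name__ == "__main__":
    # French: accent is stripped and the word is lowercased
    print(fold_for_index("\u00c9cole"))
    # Arabic "kitab" written with kasra (U+0650) and fatha (U+064E)
    print(fold_for_index("\u0643\u0650\u062a\u064e\u0627\u0628"))
```

If this reading is right, the same rules would strip the Arabic short 
vowel marks, so a vocalized and an unvocalized spelling of a word would 
be indexed identically. Whether that is what one wants for Arabic 
records is part of my question.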

4) Once all of the above is sorted out, how can I make sure that 
everything works fine?

Best!
Stefano