[Koha] non-latin script / unicode problem

Tue Jan 11 22:53:05 NZDT 2011

On 11/01/2011 10:42, Irakli Garibashvili wrote:
> Hi!
Hi Irakli,
welcome to the list.
Happy New Year.
If I remember correctly, we met in Yerevan some years ago.

> 
> We are using Koha 3.0 (Marc21, Nozebra) and it looks like fields for items 
> have some problems with Georgian script:
I thought you would be using UNIMARC rather than MARC21. My illusions
about the use of UNIMARC outside France have taken a hard knock :D

> 
> I am not sure what is the reason for this problem but
> 
> data entry (in Georgian script - in UTF-8) for the fields like CallNumb, 
> copynumber, (notes??),.. in some cases result in corrupted text.
It is hard to say precisely, but the problem could be that your biblio
records do not have a correct leader. Every leader should have "a" at
position 9, which declares the record as Unicode (UTF-8).

If it does not, the record may end up double-encoded by MARC::Record
when it is decoded.
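
As a rough illustration of the leader check (a minimal sketch in plain
Python, not Koha or MARC::Record code; the function names and the sample
leader are made up for the example), position 9 is counted from 0 in the
24-character MARC21 leader:

```python
def leader_declares_unicode(leader: str) -> bool:
    """Return True if MARC21 leader position 09 (0-based) is 'a'."""
    if len(leader) != 24:
        raise ValueError("a MARC21 leader is exactly 24 characters long")
    return leader[9] == "a"


def set_unicode_flag(leader: str) -> str:
    """Return a copy of the leader with position 09 set to 'a'."""
    if len(leader) != 24:
        raise ValueError("a MARC21 leader is exactly 24 characters long")
    return leader[:9] + "a" + leader[10:]


# Hypothetical sample leader, for illustration only.
good = "00000nam a2200000 a 4500"
# Same leader with a blank at position 09 (i.e. not flagged as Unicode).
bad = good[:9] + " " + good[10:]

print(leader_declares_unicode(good))
print(leader_declares_unicode(bad))
print(set_unicode_flag(bad) == good)
```

Checking a sample of your records this way would at least tell you
whether the leader is a plausible cause.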
I also suspect that encoding is handled better in 3.2 (with some data
normalization).
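
To show what double encoding looks like, here is a minimal sketch in
plain Python. It assumes the bytes are wrongly read as Latin-1 before
being encoded to UTF-8 a second time; the character set actually
involved in your installation may be different, and the function names
are only illustrative. Each Georgian letter takes 3 bytes in UTF-8, so
a correctly stored letter turns into 3 unrelated characters, which
matches the symptom of seeing each byte of UTF-8 separately.

```python
def double_encode(text: str) -> str:
    """Simulate the fault: UTF-8 bytes misread as Latin-1 characters."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the fault, assuming Latin-1 was the wrongly applied charset."""
    return garbled.encode("latin-1").decode("utf-8")


# Five Georgian letters written as escapes (U+10EC, U+10D8, U+10D2,
# U+10DC, U+10D8) so the example does not depend on the file encoding.
original = "\u10ec\u10d8\u10d2\u10dc\u10d8"

garbled = double_encode(original)
print(len(original), len(garbled))   # one letter becomes three characters
print(repair(garbled) == original)   # the damage is reversible in this case
```

If the corrupted values in your items table can be repaired with this
kind of round trip, that would support the double-encoding theory.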

My 2 cents.

> 
> I have found that some records in nozebra index  and XML also contain 
> similar corrupted texts.
> 
> I am not quite sure how to explain what exactly is corrupted in this text, 
> but it looks like I see each bit of UTF-8 separately, which must be an 
> indication, that character conversion goes wrong for these fields/tables.
> 
> I have checked MySQL structure - all appropriate fields have UTF8_general 
> collation... (so this must be correct)
> 
> Problem with perl codes? Where?

> 
> Could someone help me?
> 
> 
> Thanks in advance,
> Irakli
-- 
Henri-Damien LAURENT
BibLibre