Re: [Koha] non-latin script / unicode problem

11 Jan 2011

      On 11/01/2011 10:42, Irakli Garibashvili wrote:
...
Hi!
Hi Irakli,
welcome to the list.
Happy New Year.
If I remember correctly, we met in Yerevan some years ago.
...
We are using Koha 3.0 (Marc21, Nozebra) and it looks like fields for items 
have some problems with Georgian script:
I thought that you would be using UNIMARC rather than MARC21. My illusions
about the use of UNIMARC outside France have taken a hard knock :D
...
I am not sure what the reason for this problem is, but
data entry (in Georgian script - in UTF-8) for fields like CallNumb, 
copynumber, (notes??),.. in some cases results in corrupted text.
It is hard to know precisely, but the problem could be that your biblio
records do not have a correct leader. All leaders should have "a" in
position 9 (the character coding scheme, where "a" means UCS/Unicode).

Without it, the data ends up double-encoded when MARC::Record decodes it.
I also suspect that encoding is handled better in 3.2 (with
some data normalization).
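
To illustrate what I mean, here is a rough sketch in Python (not Koha
code, and the function names are made up for the example). It checks
leader position 9 and simulates double encoding by misreading UTF-8 bytes
as Latin-1. That last part is an assumption for illustration: the charset
MARC::Record actually falls back to is MARC-8, but the visible symptom,
one garbage character per UTF-8 byte, is the same kind of thing.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of Koha or MARC::Record.

def leader_declares_unicode(leader: str) -> bool:
    """Return True if MARC21 leader position 09 (0-based) is 'a' (UCS/Unicode)."""
    return len(leader) > 9 and leader[9] == "a"


def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as a single-byte charset.

    Latin-1 is used here as a stand-in for the wrong charset (assumption).
    """
    return text.encode("utf-8").decode("latin-1")


def repair_double_encoding(garbled: str) -> str:
    """Reverse the simulated damage by reinterpreting the bytes as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


good_leader = "00000nam a2200000 a 4500"  # position 9 is 'a'
bad_leader = "00000nam  2200000 a 4500"   # position 9 is blank

original = "ქართული"
garbled = double_encode(original)

print(leader_declares_unicode(good_leader))
print(leader_declares_unicode(bad_leader))
print(len(original), len(garbled))
print(repair_double_encoding(garbled) == original)
```

Each Georgian letter is three bytes in UTF-8, so the garbled string is
three times longer than the original. If your corrupted fields look like
that, a missing "a" in the leader is a likely suspect.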

My 2 cents.
...
I have found that some records in the nozebra index and XML also contain 
similar corrupted text.
I am not quite sure how to explain what exactly is corrupted in this text, 
but it looks like I see each byte of UTF-8 separately, which must be an 
indication that character conversion goes wrong for these fields/tables.
I have checked the MySQL structure - all appropriate fields have UTF8_general 
collation... (so this must be correct)
Is the problem in the Perl code? Where?

...
Could someone help me?
Thanks in advance,
Irakli
-- 
Henri-Damien LAURENT
BibLibre
