[Koha] Help - problem with Unicode entries located but a patch is needed

Thu Jan 27 02:45:55 NZDT 2011

Hi!

I have been testing this problem and finally got some results, which might 
be interesting/important to users from countries with non-Latin scripts:

We have Koha 3.0 with MARC21 and nozebra.
Entries made in item fields in Georgian, Russian or even Latin characters 
with diacritics (in UTF-8) were corrupted at some point :-( which led to 
corrupted nozebra indexes and, in the end, to problems with searching.
As a result many searches on Georgian terms were unsuccessful, which is 
obviously irritating: an OPAC which cannot search records!

I have tested this on the Koha demo sites... all the same!
We have no experience with Zebra or Koha 3.2 yet, but I am sure the same bug 
would corrupt records similarly.

So what we have found is:

when we create biblios with Unicode, everything goes fine,
when we add, modify, delete,... items with Unicode, again everything is 
fine,

but all fields in the items are corrupted when we make changes to the biblio!
I.e. correcting the spelling in the Title or Author field does not cause any 
problems in the biblio itself, but all item fields like "callnumber", Item 
Note (public or private), etc. which contain non-Latin characters are 
corrupted!

Then we examined the MySQL fields and found the following:

All entries in the biblio and items tables are fine. In the biblioitems table 
all fields except MARCXML are fine too: all corrupted characters are located 
in the MARCXML field, in the parts corresponding to the 952 MARC subfields!
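
For anyone who wants to repeat this check on their own data, here is a rough 
sketch in Python (not Koha code). It pulls the 952 subfields out of a MARCXML 
string and compares them with the values expected from the items table. The 
sample record, the "????" value standing in for a corrupted call number, and 
the function names are all made up for illustration. I am also assuming the 
usual mapping of 952$o to the call number and 952$p to the barcode, so please 
check that against your own framework.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Hypothetical record: the 245 title is intact, while the 952$o call number
# has been replaced by a placeholder standing in for corrupted text.
SAMPLE = (
    '<record xmlns="http://www.loc.gov/MARC21/slim">'
    '<datafield tag="245" ind1=" " ind2=" ">'
    '<subfield code="a">ქართული ენა</subfield></datafield>'
    '<datafield tag="952" ind1=" " ind2=" ">'
    '<subfield code="o">???? 123</subfield>'
    '<subfield code="p">0001</subfield></datafield>'
    '</record>'
)


def item_subfields(marcxml):
    """Return one {subfield code: value} dict per 952 datafield."""
    root = ET.fromstring(marcxml)
    items = []
    for field in root.iter(MARC_NS + "datafield"):
        if field.get("tag") == "952":
            items.append({sf.get("code"): sf.text or ""
                          for sf in field.iter(MARC_NS + "subfield")})
    return items


def mismatches(marcxml, expected_rows):
    """Compare each 952 field, in order, with the expected item values.

    expected_rows holds one dict per item, keyed by 952 subfield code
    (assumed mapping: "o" = call number, "p" = barcode).
    """
    found = []
    pairs = zip(item_subfields(marcxml), expected_rows)
    for index, (item, row) in enumerate(pairs):
        for code, expected in row.items():
            actual = item.get(code, "")
            if actual != expected:
                found.append((index, code, expected, actual))
    return found


if __name__ == "__main__":
    # Values as they would appear (uncorrupted) in the items table.
    rows = [{"o": "ქართ 123", "p": "0001"}]
    for index, code, expected, actual in mismatches(SAMPLE, rows):
        print(f"item {index}, 952${code}: expected {expected!r}, found {actual!r}")
```

In practice the MARCXML string and the expected values would come from the 
biblioitems and items tables for the same biblionumber. The sketch only 
reports where the two disagree; it does not try to repair anything.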