[Koha] wrong sorting order of results beginning with accented letters (č, š, ř, ž,...)

Tue Jan 21 03:36:06 NZDT 2014

Hi Tomas!

Excellent! Thank you very much! It really does work! See http://koha.doxos.eu:8080/
The collation order of all letters is determined by their order in the lists
# basic character set
lowercase {0-9}aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž
uppercase {0-9}AÁBCČDĎEÉĚFGHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ
stored in the active sort-string-utf.chr file. The accented letters must not be
included in any other map or equivalence statements. So simple, yet so
difficult to discover! I would like to thank everyone who helped me.
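The principle can be illustrated outside Zebra with a small Python sketch. It assumes a simplified model (not Zebra's actual sort code): each character's weight is its position in the lowercase list above, words are compared weight by weight, uppercase is folded with lower(), characters missing from the list sort last, and every character is treated independently (no multi-letter sequences). The sample words are chosen only for illustration.

```python
# Simplified model (an assumption, not Zebra internals) of how the
# "lowercase" line in sort-string-utf.chr defines collation order.
DIGITS = "0123456789"
CZECH_LOWER = DIGITS + "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"

# Weight of a character = its position in the list.
WEIGHTS = {ch: i for i, ch in enumerate(CZECH_LOWER)}
UNKNOWN = len(CZECH_LOWER)  # characters outside the list sort last


def sort_key(word: str) -> tuple[int, ...]:
    """Return the weights of a word's characters, after case folding."""
    return tuple(WEIGHTS.get(ch, UNKNOWN) for ch in word.lower())


words = ["žena", "zámek", "čaj", "cesta", "šaty", "sova", "řeka", "ruka"]

# Plain code-point order puts č, ř, š, ž after z (the original problem).
print(sorted(words))
# Position-based weights interleave them correctly: c < č, r < ř, s < š, z < ž.
print(sorted(words, key=sort_key))
```

In this model, moving a letter within the list is the only thing that changes its place in the ordering, which matches the observation that the order of the list alone decides collation.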

Yours sincerely
Bohdan Smilauer
Librarian of Economic Library
Letenska 15
Prague 1
Czechia
mail: b.smilauer at post.cz

phone +420736120563 

---------- Original message ----------
From: Tomas Cohen Arazi <tomascohen at gmail.com>
Date: 20. 1. 2014
Subject: Re: [Koha] wrong sorting order of results beginning with accented letters (č, š, ř, ž,...)

"

On Mon, Jan 20, 2014 at 7:24 AM, Bohdan Šmilauer <b.smilauer at post.cz> wrote:
"
It points to /etc/koha/zebradb/lang_defs/en, which contains the file
sort-string-utf.chr. I updated it as you wrote:

"map ěêèéëÊÈÉË e", etc. I have found that other syntaxes can also be used,
such as "map eěêèéëÊÈÉË" or "map ěêèéëÊÈÉË(e)". Which one is correct? Then I
ran koha-rebuild-zebra -a -v -f.

As a result, all the accented letters are treated as the same "e", and the
accent is ignored in collation."

That's correct. Mapping all the variants of e with diacritics to e means they
all "weigh" the same (as e) in an ordering.
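A minimal Python sketch of why a map flattens the order, under the same assumed model as before (position-based weights; not Zebra's real implementation). The E_MAP table is a hypothetical stand-in for a map line covering ě and é; once it is applied, the variants produce identical keys, so their relative order is no longer decided by the accent. Here Python's stable sort simply keeps the input order for ties; Zebra's tie-breaking may differ.

```python
# Assumed model: a "map" rewrites characters before weighting.
LOWER = "0123456789aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"
WEIGHTS = {ch: i for i, ch in enumerate(LOWER)}
UNKNOWN = len(LOWER)

# Hypothetical equivalent of mapping ě and é onto plain e.
E_MAP = str.maketrans({"ě": "e", "é": "e"})


def key_with_map(word):
    """Weights after the map: every e variant gets the weight of e."""
    return [WEIGHTS.get(ch, UNKNOWN) for ch in word.lower().translate(E_MAP)]


def key_without_map(word):
    """Weights straight from list positions, so e < é < ě."""
    return [WEIGHTS.get(ch, UNKNOWN) for ch in word.lower()]


words = ["věda", "veda", "véda"]
print(key_with_map("věda") == key_with_map("veda"))  # True: a tie
print(sorted(words, key=key_with_map))     # ties: input order is kept
print(sorted(words, key=key_without_map))  # ordered by list position
```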

"But the grammatically correct order is for "e" to precede "é", "ě", and so
on. How can I control this ordering?"

Take the 'es' example (zebradb/lang_defs/es/sort-string-utf.chr) and look 
for the lines:

lowercase {0-9}{a-y}zæøå

uppercase {0-9}{A-Y}ZÆØÅ

Those are the lines you need to adjust. To accomplish your goal, remove from
the mappings the letters with diacritics that you want to sort differently
(i.e. so that they no longer weigh the same as 'e'). Then put them in the
lowercase and uppercase lines in the desired (increasing) order.

For example:

lowercase {0-9}abcdeěêèéëfghijklmnopqrstuvwxyz
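To check what order a line like this produces, and whether a letter is still caught by a map, the following hypothetical Python helper can be used. It is only a sketch: it understands {x-y} ranges and literal characters, nothing else from the .chr syntax, and find_conflicts takes the source characters of a map as a plain string because the exact map syntax is not settled here.

```python
import re

# Hypothetical helper: handles only {x-y} ranges and literal characters.
RANGE = re.compile(r"\{(.)-(.)\}")


def expand_value_set(spec: str) -> list[str]:
    """Expand e.g. "{0-9}{a-y}zæøå" into its ordered list of characters."""
    chars = []
    pos = 0
    while pos < len(spec):
        m = RANGE.match(spec, pos)
        if m:
            start, end = ord(m.group(1)), ord(m.group(2))
            chars.extend(chr(c) for c in range(start, end + 1))
            pos = m.end()
        else:
            chars.append(spec[pos])
            pos += 1
    return chars


def find_conflicts(order: list[str], mapped_chars: str) -> list[str]:
    """Characters that have their own position but are also mapped away."""
    return sorted(set(order) & set(mapped_chars))


order = expand_value_set("{0-9}abcdeěêèéëfghijklmnopqrstuvwxyz")
print(order.index("e") < order.index("ě") < order.index("é"))  # True
print(len(expand_value_set("{0-9}{a-y}zæøå")))  # 39
print(find_conflicts(order, "ěêèéëÊÈÉË"))  # still mapped: remove these
```

A non-empty result from find_conflicts flags exactly the situation described earlier in the thread: a letter given its own place in the list while a map still folds it into e.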

Regards

To+ 

-- 

Tomás Cohen Arazi

Prosecretaría de Informática

Universidad Nacional de Córdoba

✆ +54 351 4333190 ext 13168

GPG: B76C 6E7C 2D80 551A C765  E225 0A27 2EA1 B2F3 C15F

"