wrong sorting order of results beginning with accented letters (č, š, ř, ž,...)

Bohdan Šmilauer

18 Jan 2014

11:15 p.m.

Hi to all!

I'm trying to get correct sorting of search results in Koha 3.12 installed on Ubuntu 12.04. I followed the instructions at http://wiki.koha-community.org/wiki/Encoding_and_Character_Sets_in_Koha

Despite this, authority and title results are still sorted incorrectly. You can see it at http://koha.doxos.eu:8080/

The "locale" variable is set to cs_CZ.UTF-8, and a simple sample sorting program written in Perl works correctly.

Does anyone know which programs in Koha control the collation order of results in the OPAC, and how to fix it? Thank you very much for the advice.

Bohdan Smilauer
Librarian of Economic Library
Letenska 15
Prague 1
Czechia
mail: b.smilauer@post.cz
phone +420736120563

Fabio Tiana

20 Jan 2014

8:58 a.m.

Hi there!

You can adjust the mapping for sorting purposes in the sort-string-utf.chr file. Here's a common line as an example:

    map êèéëÊÈÉË e

Make sure to use an encoding-savvy text editor (e.g. vi) and restart Zebra when you're done.

Hope this helps,
Fabio

_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha

Bohdan Šmilauer

11:24 a.m.

Hi Fabio,

many thanks for your advice. It helped me.

I investigated the /etc/koha/zebradb/zebra-authorities.cfg file, specifically this entry:

    # Where are the config files located?
    profilePath:/etc/koha/zebradb/authorities/etc:/etc/koha/zebradb/etc:/etc/koha/zebradb/marc_defs/marc21/authorities:/etc/koha/zebradb/lang_defs/en

It points to /etc/koha/zebradb/lang_defs/en, which contains the file sort-string-utf.chr. I updated it as you wrote: "map ěêèéëÊÈÉË e", etc. I have found that other syntax can also be used, "map eěêèéëÊÈÉË" or "map ěêèéëÊÈÉË(e)". Which is correct?

Then I ran koha-rebuild-zebra -a -v -f. As a result, all the accented letters are treated as the same "e" and the accent is ignored in collation. But the correct order is that "e" precedes "é", "ě", ... How can I control this sequence?

You can see the result at http://koha.doxos.eu:8080 under Authority search, "Submit" (leave the other fields empty).

I want to use the Authority search as a replacement for the "Browse authors" feature that is missing in Koha. The number of authors is typically more than one thousand. I noticed that authors beyond number 1000 are not sorted, even though I added "sortmax 11000" at the end of the /etc/koha/zebradb/zebra-authorities.cfg file. Selecting authors starting with, e.g., the letter "H" is not a simple task: you have to skip through many screens, and there is no direct jump to the letter "H" or to a page such as 345. Do you have any experience with this problem?

Many thanks
Bohdan Smilauer

Zeno Tajoli

12:19 p.m.

Hi to all,

On 20/01/2014 11:24, Bohdan Šmilauer wrote:

> "map ěêèéëÊÈÉË e", etc. I have found that other syntax can be used, "map eěêèéëÊÈÉË" or "map ěêèéëÊÈÉË(e)". Which is correct? Then I ran koha-rebuild-zebra -a -v -f.

As far as I know, the correct form is "map ěêèéëÊÈÉË e".

> It caused the accented letters to all be treated as the same "e", and the accent is ignored in collation.

And in fact this is correct: with "map ěêèéëÊÈÉË e" you obtain exactly this result. So your request is more complex. See the other directories parallel to /etc/koha/zebradb/lang_defs/en, for example:

    /etc/koha/zebradb/lang_defs/uk [Ukrainian]
    /etc/koha/zebradb/lang_defs/nb [Norwegian]

and read http://www.indexdata.com/zebra/doc/character-map-files.html, the official help on setting up sort-string-utf.chr [not easy]. I suggest also reading http://www.indexdata.com/zebra/doc/fields-and-charsets.html and the linked pages.

Cheers
Zeno Tajoli

--
Dr. Zeno Tajoli
Dipartimento Gestione delle Informazioni e della Conoscenza
z.tajoli@cineca.it
fax +39 02 2135520
CINECA - Sede operativa di Segrate

Tomas Cohen Arazi

2:13 p.m.

On Mon, Jan 20, 2014 at 7:24 AM, Bohdan Šmilauer <b.smilauer@post.cz> wrote:

> It points to /etc/koha/zebradb/lang_defs/en, which contains the file sort-string-utf.chr. I updated it as you wrote: "map ěêèéëÊÈÉË e", etc. I have found that other syntax can be used, "map eěêèéëÊÈÉË" or "map ěêèéëÊÈÉË(e)". Which is correct? Then I ran koha-rebuild-zebra -a -v -f.
>
> It caused the accented letters to all be treated as the same "e", and the accent is ignored in collation.

That's correct. Mapping all variants of e+diacritics means they all "weigh" the same (as e) in an ordering.
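To make the effect of such a mapping concrete, here is a minimal Python sketch. It is only an illustration of the idea and does not reproduce Zebra's actual implementation; the names `FOLD_TO_E` and `folded_key` are made up for this example.

```python
# Illustration only: mimics the effect of "map ěêèéëÊÈÉË e", not Zebra's real code.
# FOLD_TO_E and folded_key are hypothetical names chosen for this sketch.
FOLD_TO_E = str.maketrans({ch: "e" for ch in "ěêèéëÊÈÉË"})


def folded_key(word):
    """Sort key in which every mapped character counts as a plain 'e'."""
    return word.translate(FOLD_TO_E)


words = ["éra", "era", "ěra"]
keys = [folded_key(w) for w in words]

# All three words share one key, so the sort cannot tell them apart.
# Python's sort is stable, so they simply stay in their input order.
result = sorted(words, key=folded_key)
```

Because the three keys are identical, no ordering among "e", "é" and "ě" can come out of the sort, which matches the behaviour reported above.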

> But the correct order is that "e" precedes "é", "ě", ... How can I control this sequence?

Take the 'es' example (zebradb/lang_defs/es/sort-string-utf.chr) and look for the lines:

    lowercase {0-9}{a-y}zæøå
    uppercase {0-9}{A-Y}ZÆØÅ

Those are the lines you need to adjust. To accomplish your goal, you should remove from the mappings those letters with diacritics that you want to give a different sorting order (i.e. make them not weigh the same as 'e'). The next step is putting them in the lowercase and uppercase lines, in (increasing) order. For example:

    lowercase {0-9}abcdeěêèéëfghijklmnopqrstuvwxyz

Regards
To+

--
Tomás Cohen Arazi
Prosecretaría de Informática
Universidad Nacional de Córdoba
✆ +54 351 4333190 ext 13168
GPG: B76C 6E7C 2D80 551A C765 E225 0A27 2EA1 B2F3 C15F
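The idea of letting the position in the "lowercase" line decide the sort weight can be sketched in Python as follows. This is a rough model under stated assumptions, not how Zebra works internally: `ALPHABET`, `RANK` and `alphabet_key` are hypothetical names, the `{0-9}` range is written out by hand, and characters outside the list are simply pushed to the end.

```python
import unicodedata

# Order copied from the example "lowercase" line, with {0-9} expanded.
# A character's position in this string is its sort weight.
ALPHABET = "0123456789abcdeěêèéëfghijklmnopqrstuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def alphabet_key(word):
    """Turn a word into a list of weights taken from ALPHABET."""
    text = unicodedata.normalize("NFC", word).lower()
    # Assumption for this sketch: unknown characters sort after the alphabet.
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in text]


words = ["fara", "éra", "era", "ěra", "dub"]
ordered = sorted(words, key=alphabet_key)
print(ordered)  # ['dub', 'era', 'ěra', 'éra', 'fara']
```

Here "e" sorts before "ě", which sorts before "é", exactly because that is their order in the list.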

Bohdan Šmilauer

3:36 p.m.

Hi Tomas!

Wonderful! Many thanks! It really does work! See http://koha.doxos.eu:8080/

The sequence (collation) of all letters is determined by their order in the list

    # basic character set
    lowercase {0-9}aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž
    uppercase {0-9}AÁBCČDĎEÉĚFGHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ

stored in the active sort-string-utf.chr file. The accented letters must not be included in other map or equivalence statements.

How simple, but how difficult to discover! I would like to express my thanks to all who helped me.

Yours sincerely
Bohdan Smilauer
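The rule that an accented letter should not appear both in the ordered list and in a map or equivalence statement can be checked with a small script. The following Python sketch is a hypothetical helper, not part of Koha or Zebra: it only looks at literal non-ASCII characters on lines starting with `lowercase`, `uppercase`, `map` or `equivalent`, skips lines starting with `#`, and does not expand ranges such as `{a-y}` or handle escape sequences.

```python
import unicodedata


def conflicting_letters(chr_text):
    """Return non-ASCII letters found both in lowercase/uppercase lines
    and in map/equivalent lines of a character-map file's text."""
    ordered, mapped = set(), set()
    for raw in unicodedata.normalize("NFC", chr_text).splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        directive, _, rest = line.partition(" ")
        letters = {ch for ch in rest if ord(ch) > 127}
        if directive in ("lowercase", "uppercase"):
            ordered |= letters
        elif directive in ("map", "equivalent"):
            mapped |= letters
    return ordered & mapped


# Shortened sample text, made up for this sketch.
sample = """
# basic character set
lowercase {0-9}aábcčdďeéěf
uppercase {0-9}AÁBCČDĎEÉĚF
map ěêèéëÊÈÉË e
"""
conflicts = conflicting_letters(sample)
```

For this sample the function reports é, ě and É, the letters that are listed in the ordered lines but are still folded to "e" by the leftover map line.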
