[Koha] How to make the Koha/Zebra search ignore hyphens?
dcook at prosentient.com.au
Thu Sep 19 13:29:45 NZST 2019
That's really interesting. I assume that you're using ICU indexing?
You could update "phrases-icu.xml" and "words-icu.xml" to strip out hyphens. You would need to re-index all your records afterwards though.
I haven't actually tested that particular change, but from a quick look at both ICU and CHR, it looks like hyphens are used to tokenize. Currently, when you search "Tee-Ei", you're actually searching for "Tee" and "Ei".
If you're using ICU, you could add a transform rule before the tokenize rule to remove the hyphen. This would prevent the term from being split at the hyphen, so "Tee-Ei" and "Teeei" should retrieve the same records.
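To illustrate why the order matters, here is a small Python toy. It is not Zebra's ICU chain and does not use ICU rule syntax; the function names are made up for this sketch, and the tokenizing is deliberately simplified to whitespace and hyphens.

```python
import re


def tokenize_on_hyphen(text):
    # Simplified stand-in for the current behaviour:
    # whitespace and hyphens both act as token breaks.
    return [t.lower() for t in re.split(r"[\s\-]+", text) if t]


def tokenize_hyphen_removed(text):
    # Simplified stand-in for "transform before tokenize":
    # drop hyphens first, then break on whitespace only.
    return [t.lower() for t in text.replace("-", "").split()]


for term in ("Tee-Ei", "Teeei"):
    print(term, tokenize_on_hyphen(term), tokenize_hyphen_removed(term))
```

With the hyphen removed before tokenizing, both spellings end up as the same single token, which is the effect the transform rule is meant to have on the index and on the query.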
Beware also that this is a universal change. You might want to check to see if there are hyphens that shouldn't be removed. If so, you may need to make a more complex rule to try to just capture the desired cases.
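As one possible shape for a narrower rule (purely an assumption about what the "desired cases" might be), you could remove a hyphen only when it sits directly between two letters, so that things like year ranges keep their hyphen. Again this is a Python illustration with a hypothetical function name, not ICU rule syntax, and I haven't tried the equivalent in Zebra.

```python
import re

# A hyphen with a letter immediately before and after it.
# [^\W\d_] matches any Unicode letter (word character that is
# neither a digit nor an underscore).
LETTER_HYPHEN_LETTER = re.compile(r"(?<=[^\W\d_])-(?=[^\W\d_])")


def remove_hyphens_between_letters(text):
    # Leaves hyphens next to digits or whitespace untouched.
    return LETTER_HYPHEN_LETTER.sub("", text)


print(remove_hyphens_between_letters("Ultraschall-Messgerät 1990-1995"))
```

Whether that distinction is the right one depends on what is actually in your records, so it is worth checking a sample of hyphenated values before settling on a rule.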
If you're using CHR, you can take a look at word-phrase-utf.chr and remove - from the "Breaking characters" section. You may or may not also need to map it. I'm less familiar with CHR indexing.
Anyway, I hope that helps.
72/330 Wattle St
Ultimo, NSW 2007
Office: 02 9212 0899
Direct: 02 8005 0595
Date: Wed, 18 Sep 2019 22:46:15 +0200
To: "Koha : access" <koha at lists.katipo.co.nz>
Subject: [Koha] How to make the Koha/Zebra search ignore hyphens?
Message-ID: <5b63f3b4-76c1-c1f8-f35a-6a33e3b0afa5 at adminkuhn.ch>
We have found that, at least in German, there are words or combinations
of words that can be written in different ways, where both are correct and
mean the same thing, e.g.
* Ultraschallmessgerät = Ultraschall-Messgerät
* Sintiswing = Sinti-Swing
* Teeei = Tee-Ei
* Haftpflichtversicherungsgesellschaft =
This is a general concept in German, so it makes no sense to add a "used
for/see from:" in the authority data. Anyway, such words can exist
everywhere in the bibliographic record, not only in fields linked to
Now the question: is there a way to teach Koha (or Zebra) to look
for the second term also when the first term is searched, and vice
versa? Or shorter: just to ignore the hyphens? Using the standard
configuration, Koha will not find the second term if the first one is
searched, and vice versa.
We would appreciate any hint or tip!
Best wishes: Michael
Geschäftsführer · Diplombibliothekar BBS, Informatiker eidg. Fachausweis
Admin Kuhn GmbH · Pappelstrasse 20 · 4123 Allschwil · Schweiz
T 0041 (0)61 261 55 61 · E mik at adminkuhn.ch · W www.adminkuhn.ch