[Koha] Elastic search for Arabic

Mohamad F Barham mbarham at birzeit.edu
Thu Aug 29 18:27:57 NZST 2024


Dear all,


An update on Elasticsearch for Arabic:

SOLVED

The solution turned out to be simple: use the filters from Elasticsearch's built-in Arabic analyzer (ref: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/analysis-lang-analyzer.html#arabic-analyzer )

Using Kibana, I opened the biblio index settings and added:

# this protects the listed words from the stemmer (they are left unstemmed)

  "index.analysis.filter.arabic_keywords.keywords": [
    "الله"
  ],
  "index.analysis.filter.arabic_keywords.type": "keyword_marker",

# this defines the Arabic stemmer filter
  "index.analysis.filter.arabic_stemmer.type": "stemmer",
  "index.analysis.filter.arabic_stemmer.language": "arabic",
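
For anyone who prefers to prepare the same settings outside Kibana, here is a minimal Python sketch that only assembles the flat settings above into a JSON body and prints it. The names `arabic_filter_settings` and `settings_body` are illustrative, not part of Koha or Elasticsearch, and nothing is sent to a cluster. Note that Elasticsearch normally accepts changes to analysis settings on an existing index only while the index is closed.

```python
import json

# Flat index settings copied from the message above. This script only
# builds and prints the JSON body; it does not contact Elasticsearch.
arabic_filter_settings = {
    # keyword_marker: words listed here are flagged so the stemmer skips them
    "index.analysis.filter.arabic_keywords.type": "keyword_marker",
    "index.analysis.filter.arabic_keywords.keywords": ["الله"],
    # stemmer filter configured for Arabic
    "index.analysis.filter.arabic_stemmer.type": "stemmer",
    "index.analysis.filter.arabic_stemmer.language": "arabic",
}


def settings_body(settings: dict) -> str:
    """Serialize the settings as JSON, keeping Arabic text readable."""
    return json.dumps(settings, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(settings_body(arabic_filter_settings))
```

The printed JSON can be pasted into the index settings editor or used as the body of a settings update request.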


-------------

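Then add the filters to the existing analyzer. Order matters: arabic_keywords must come before arabic_stemmer, so that protected words are marked before the stemmer runs.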


"index.analysis.analyzer.analyzer_standard.filter": [

    "icu_folding",
    "arabic_keywords",
    "arabic_stemmer"
  ],
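
To illustrate why the order matters, here is a toy Python model of the filter chain. It is a sketch, not the real analyzer: `toy_stemmer` is a hypothetical stand-in that only strips a leading "ال", whereas the real Arabic stemmer does more. The point is only that a keyword marker has to run before the stemmer to have any effect.

```python
from typing import Callable, List, Tuple

# A token is (text, is_keyword). is_keyword mimics the flag that a
# keyword_marker filter sets and that a stemmer respects.
Token = Tuple[str, bool]
Filter = Callable[[List[Token]], List[Token]]

PROTECTED = {"الله"}  # same word as in the arabic_keywords filter


def keyword_marker(tokens: List[Token]) -> List[Token]:
    """Flag protected words so later stemming leaves them alone."""
    return [(text, flag or text in PROTECTED) for text, flag in tokens]


def toy_stemmer(tokens: List[Token]) -> List[Token]:
    """Hypothetical stand-in for the stemmer: strip a leading 'ال'."""
    out = []
    for text, flag in tokens:
        if not flag and text.startswith("ال") and len(text) > 3:
            text = text[2:]
        out.append((text, flag))
    return out


def run(filters: List[Filter], words: List[str]) -> List[str]:
    """Apply the filters in the given order and return the final texts."""
    tokens: List[Token] = [(w, False) for w in words]
    for f in filters:
        tokens = f(tokens)
    return [text for text, _ in tokens]


words = ["القدس", "الله"]
right_order = run([keyword_marker, toy_stemmer], words)
wrong_order = run([toy_stemmer, keyword_marker], words)
```

With the marker first, "الله" survives unchanged while "القدس" is reduced to "قدس". With the stemmer first, "الله" has already been cut down before the marker sees it.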
----------------
Then reindex from the terminal:

koha-elasticsearch --rebuild -b -c 2000 -p 8 koha
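
After the rebuild, one way to check the result is to ask Elasticsearch how the analyzer tokenizes a word, using the `_analyze` API. The sketch below only builds the request and parses a response. The base URL and the index name `koha_koha_biblios` are assumptions and should be replaced with the values of your installation; `analyzer_standard` is the analyzer name used above.

```python
import json
import urllib.request


def analyze_request(base_url: str, index: str, analyzer: str, text: str):
    """Build a POST request for the index's _analyze endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    body = json.dumps(
        {"analyzer": analyzer, "text": text}, ensure_ascii=False
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(payload: dict) -> list:
    """Extract the token strings from an _analyze response body."""
    return [t["token"] for t in payload.get("tokens", [])]


def analyze(base_url: str, index: str, analyzer: str, text: str) -> list:
    """Send the request and return the tokens (needs a reachable cluster)."""
    req = analyze_request(base_url, index, analyzer, text)
    with urllib.request.urlopen(req) as resp:
        return tokens_from_response(json.load(resp))
```

For example, calling `analyze("http://localhost:9200", "koha_koha_biblios", "analyzer_standard", "القدس")` and the same call with "قدس" should return the same tokens if the stemmer is active.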




Mohamad Barham

System Engineer | Information Technology Department

Birzeit University

P.O.Box. 14, Birzeit, Palestine

Tel: + 970 22982012 | Mob: +970 597 861929 | Ext: 5616

mbarham at birzeit.edu | www.birzeit.edu




________________________________
From: Koha <koha-bounces at lists.katipo.co.nz> on behalf of Fridolin SOMERS <fridolin.somers at biblibre.com>
Sent: Thursday, September 21, 2023 11:14 AM
To: koha at lists.katipo.co.nz <koha at lists.katipo.co.nz>
Subject: Re: [Koha] Elastic search for Arabic

Hi,

I think you can add a new 'char_filter' of type 'pattern_replace', like
'punctuation':
https://git.koha-community.org/Koha-community/Koha/src/branch/master/admin/searchengine/elasticsearch/index_config.yaml#L35
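
As a rough idea of what such a char filter could look like, here is a Python sketch. The filter name `arabic_article` and the pattern are assumptions made for illustration and are not taken from Koha's configuration. The regular expression is tried out with Python's `re` module; Elasticsearch uses Java regular expressions, so the pattern's behaviour there should be verified separately.

```python
import re

# Hypothetical pattern: a word-initial "ال" followed by at least two
# more word characters.
ARTICLE_PATTERN = r"\bال(?=\w{2,})"

# Hypothetical char_filter definition in the shape Elasticsearch expects
# for a pattern_replace char filter.
arabic_article_char_filter = {
    "arabic_article": {
        "type": "pattern_replace",
        "pattern": ARTICLE_PATTERN,
        "replacement": "",
    }
}


def strip_article(text: str) -> str:
    """Emulate the char filter in Python: drop word-initial 'ال'."""
    return re.sub(ARTICLE_PATTERN, "", text)
```

One trade-off of this approach: char filters run on the raw text before tokenization, so they cannot take a list of protected words into account the way a keyword_marker token filter can.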

Maybe also look at 'arabic_normalization':
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-normalization-tokenfilter.html

If you manage to make it work, you could explain it on this wiki page
(currently written for Zebra):
https://wiki.koha-community.org/wiki/Correcting_Search_of_Arabic_records

Best regards,

On 20/09/2023 at 02:14, Mohamad F Barham wrote:
> Hi,
>
> It's about search.
>
> Elasticsearch is working well for Arabic with analysis-icu.
>
> For example, it correctly treats "ه" as "ة".
>
> What I need is for a search for "القدس" to return the same results as "قدس"; the
> "ال" in Arabic is like "the" in English.
>
> How can I change the behaviour of Elasticsearch with analysis-icu to meet
> this need?
>
> Regards
>
>
>
> ------------------------------------------------------------------------
> *From:* Koha <koha-bounces at lists.katipo.co.nz> on behalf of Fridolin
> SOMERS <fridolin.somers at biblibre.com>
> *Sent:* Wednesday, September 20, 2023 11:21 AM
> *To:* koha at lists.katipo.co.nz <koha at lists.katipo.co.nz>
> *Subject:* Re: [Koha] Elastic search for Arabic
> Hi,
>
> Could you tell us more about your needs?
> Is it for search or sorting?
>
> Best regards,
>
> On 17/09/2023 at 22:25, Mohamad F Barham wrote:
>> Dear all,
>>
>>
>> Can anyone help improve Elasticsearch for Arabic, especially removing the definite article "ال التعريف" at the beginning of words?
>>
>>
>>
>> Regards
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>> The information contained in this communication is intended solely for the use of the individual or entity to whom it is addressed and others authorized to receive it. It may contain confidential or legally privileged information. If you are not the intended recipient you are hereby notified that any disclosure, copying, distribution or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful. If you have received this communication in error, please notify us immediately by responding to this email and then delete it from your system. The University is neither liable for the proper and complete transmission of the information contained in this communication nor for any delay in its receipt.
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>> _______________________________________________
>>
>> Koha mailing list  http://koha-community.org
>> Koha at lists.katipo.co.nz
>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>
> --
> Fridolin SOMERS <fridolin.somers at biblibre.com>
> Software and system maintainer 🦄
> BibLibre, France

--
Fridolin SOMERS <fridolin.somers at biblibre.com>
Software and system maintainer 🦄
BibLibre, France