Re: [Koha] Searching in Arabic
Hi Jesse:
Can you elaborate on what you mean when you say that you're not getting any Arabic results for *title*, *author* or *series*? I look after a library with English, French, and Arabic records, and I'm seeing search results when I do title searches in Arabic.
Are the records catalogued in Arabic or is the only Arabic text in the 880?
I'm quite experienced with Zebra and somewhat experienced with Arabic in Koha, so hopefully I'll be able to give you a hand.
Do you have a public URL that I could look at? I might be able to do some troubleshooting just from the OPAC alone.
The cataloguing framework doesn't really have much to do with the indexing... although the structure of the record itself would. That is, if the only Arabic is in the 880 field, you would likely need a more complex indexing configuration to achieve your goal.
What version of Koha are you using? Your link to the manual is 3.2... are you on 3.2 or a newer version? Additionally, how did you install Koha? Was it via the Debian packages, the tarball, or Git?
Cheers,
David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007
-----Original Message-----
Date: Tue, 10 Nov 2015 10:41:48 -0500
From: Jesse Lambertson <jlambertson@sqcc.org>
To: "koha@lists.katipo.co.nz" <koha@lists.katipo.co.nz>, Koha Devel <koha-devel@lists.koha-community.org>
Subject: [Koha] Searching in Arabic
Message-ID: <CAMV4Y7NNqd=S1Mq7EbYFhQFira=EuchODmNYLY- ec3EkdWgLnA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Good morning everyone (if it is morning where you are...)
A while back, we figured out how to search and retrieve results in Arabic (much of our collection is in this language).
I thank everyone who helped us get this done.
But we have been searching "keyword" in Arabic for so long that we only just realized we are NOT getting any results for *title*, *author* or *series* searches in Arabic. ICU is turned on, of course.
The default MARC21 fields in Zebra's indexing list are found here: http://manual.koha-community.org/3.2/en/kohasearchindexes.html
Are there any other steps we need to follow besides those listed here http://wiki.koha-community.org/wiki/How_to_add_new_zebra_index to add 880 to the indexing list?
Or maybe there is a step listed somewhere in the wiki that we missed? Does that mean modifying the framework or is the cataloguing framework itself unaffected during this index-list addition?
Thank you in advance for your assistance.
Regards,
Jesse
-- Jesse A Lambertson Librarian Sultan Qaboos Cultural Center <http://www.sqcc.org/>
عالم الانجازات ينحاز دوماً مع المتفائلين
David,
Thank you for getting back to me. We are running 3.20 on Debian "Wheezy" (package install) on our own server.
I can't give you a public link yet, but regarding the lack of results in Arabic...
We are ONLY getting search results in Arabic while in the default search bar (Library Catalog), which is really just code for "keyword search."
But since ALL our Arabic is in the linked fields (880), if we search for the same titles in Arabic via title, author or series, we get ZERO results.
I think this is because 880 is not in the default indexing list. It seems to me that the linked 880 fields for title (245), author (100 and 700) and series (490 and 830) should return results when searched in Arabic, as long as we add 880 to the list.
Am I incorrect in that assumption?
Thank you for your assistance,
Jesse
On Tue, Nov 10, 2015 at 5:38 PM, David Cook <dcook@prosentient.com.au> wrote:
-- Jesse A Lambertson Librarian Sultan Qaboos Cultural Center <http://www.sqcc.org/> Ph: (202)-677-3967 Ext. 104 jlambertson@sqcc.org عالم الانجازات ينحاز دوماً مع المتفائلين
Hi Jesse,
On 12/11/2015 15:16, Jesse Lambertson wrote:
I think this is because 880 is not in the default indexing list. It seems to me that associated linked 880 fields for title (245), author( 100 and 700) as well as series (490 and 830) should return results searched in Arabic as long as we add 880 to the list.
Am I incorrect in that assumption?
No, your assumption is correct. You need to change the default indexing setup to index your 880 fields, which probably look like this: 880 10$6245-01/(3/r$a[Arabic chars]
I suggest you open a bug here: http://bugs.koha-community.org/bugzilla3/ with a description of the problem and 10-20 records from your catalogue to use for testing.
Do you want to try to fix it yourself? Then you need to understand how indexing works with Zebra.
Start with chapter 13 of the manual: http://translate.koha-community.org/manual/3.20/en/searching.html
Read the basics in the Zebra documentation: http://www.indexdata.com/zebra/doc/record-model-domxml.html (Tip: the configuration language of the files is XSLT, so you need to learn the basics of it).
The 3 basic files are:
etc/zebradb/marc_defs/marc21/biblios/biblio-koha-indexdefs.xml
etc/zebradb/xsl/koha-indexdefs-to-zebra.xsl
etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl
The real working file is biblio-zebra-indexdefs.xsl. At its head you can read:
<!-- This file has been automatically generated from a Koha index definition file with the stylesheet koha-indexdefs-to-zebra.xsl. Do not manually edit this file,as it may be overwritten. To regenerate, edit the appropriate Koha index definition file (probably something like {biblio,authority}-koha-indexdefs.xml) and run: `xsltproc koha-indexdefs-to-zebra.xsl {biblio,authority}-koha-indexdefs.xml > {biblio,authority}-zebra-indexdefs.xsl` (substituting the appropriate file names). -->
You probably need to backport this fix from master to 3.20: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14217
and insert many conditions in biblio-koha-indexdefs.xml for 880 with specific values in $6.
Not an easy job.
Bye
Zeno Tajoli
-- Zeno Tajoli /Dipartimento Sviluppi Innovativi/ - Automazione Biblioteche Email: z.tajoli@cineca.it Fax: 051/6132198 *CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)
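For readers following the thread, the "specific values in $6" can be illustrated outside Koha. The sketch below is in Python and uses only the standard library. It splits a $6 value such as the `245-01/(3/r` from the example 880 into its parts, following the MARC 21 layout of linking tag, occurrence number, script code and orientation code (here `(3` is Arabic and `r` is right-to-left). The function names and the title/author/series labels are invented for illustration. They are not Koha or Zebra identifiers, and the code does not touch any indexing configuration.

```python
import re
from typing import NamedTuple, Optional

# Linked tags named in this thread, grouped by the search they should feed.
# The group labels are illustrative only, not Koha or Zebra index names.
SEARCH_GROUP = {
    "245": "title",
    "100": "author",
    "700": "author",
    "490": "series",
    "830": "series",
}


class Linkage(NamedTuple):
    tag: str                    # tag of the field this 880 is linked to
    occurrence: str             # two-digit occurrence number
    script: Optional[str]       # script code, e.g. "(3" for Arabic
    orientation: Optional[str]  # "r" for right-to-left, if present


# <linking tag>-<occurrence>[/<script code>[/<orientation code>]]
_LINK_RE = re.compile(r"^(\d{3})-(\d{2})(?:/([^/]+))?(?:/([^/]+))?$")


def parse_linkage(subfield_6: str) -> Linkage:
    """Split the content of an 880 $6 subfield into its components."""
    match = _LINK_RE.match(subfield_6.strip())
    if match is None:
        raise ValueError(f"not a valid $6 linkage: {subfield_6!r}")
    return Linkage(*match.groups())


def search_group(subfield_6: str) -> Optional[str]:
    """Return 'title', 'author' or 'series' for an 880, or None otherwise."""
    return SEARCH_GROUP.get(parse_linkage(subfield_6).tag)


if __name__ == "__main__":
    print(parse_linkage("245-01/(3/r"))
    print(search_group("245-01/(3/r"))
```

Each condition added to biblio-koha-indexdefs.xml would have to make the same kind of decision: read the linked tag at the start of $6 and send the 880 text to the matching index.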
Hi,
This is a good step-by-step reference on adding new MARC fields to the Zebra index in Koha, by Ashraf Brzy. I hope it will help.
https://docs.google.com/document/d/1rHuoX_AeLkWK_0FEkWhCvpoTCWO7AcNG5355DIZ_...
Handling old books is a similar case of using other fields in indexing; you can check this wiki page: http://wiki.koha-community.org/wiki/Koha_and_old_books#Pros_and_cons
Regards ...
On Thu, Nov 12, 2015 at 11:05 PM, Tajoli Zeno <z.tajoli@cineca.it> wrote:
_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz https://lists.katipo.co.nz/mailman/listinfo/koha
-- *Karam Qubsi <https://www.linkedin.com/in/kqubsi>* *Shah Alam , Malaysia . *
Shukran Karam.
This looks quite promising also. I will look it over now and talk to one of my local IT people.
Regards,
Jesse
On Thu, Nov 12, 2015 at 10:20 AM, Karam Qubsi <karamqubsi@gmail.com> wrote:
-- Jesse A Lambertson Librarian Sultan Qaboos Cultural Center <http://www.sqcc.org/> Ph: (202)-677-3967 Ext. 104 jlambertson@sqcc.org عالم الانجازات ينحاز دوماً مع المتفائلين
Ahlan WaSahlan Jesse . Welcome ... On 12 Nov 2015 23:22, "Jesse Lambertson" <jlambertson@sqcc.org> wrote:
Zeno,
Grazie for the information you sent.
I do, however, have one more question regarding this: http://wiki.koha-community.org/wiki/How_to_add_new_zebra_index
It was posted by someone who also needed to add non-default index fields to the list.
Does anyone know anything about the steps suggested there, or do we really need to rework the whole thing because of the Arabic and the linkages to their respective fields?
Shukran, grazie, Danke and merci...
Jesse
On Thu, Nov 12, 2015 at 10:05 AM, Tajoli Zeno <z.tajoli@cineca.it> wrote:
-- Jesse A Lambertson Librarian Sultan Qaboos Cultural Center <http://www.sqcc.org/> Ph: (202)-677-3967 Ext. 104 jlambertson@sqcc.org عالم الانجازات ينحاز دوماً مع المتفائلين
Hi,
On 12/11/2015 16:20, Jesse Lambertson wrote:
I do, however, have one more question regarding this: http://wiki.koha-community.org/wiki/How_to_add_new_zebra_index
I made a correction to this wiki page. It is OK now.
It was posted by someone who was also in need of adding non-default index fields to the list.
The wiki page is correct, but your case is much more complex. The wiki page about old books is also OK, but you can use it only as a basic how-to for understanding how the indexing config works.
Does anyone know anything about the steps as suggested there or do we really need to rework the whole thing because of the Arabic and the linkages to their respective fields?
You don't need to delete any part of the default indexing config; you need to add 880 indexing. But it is difficult, because you need to backport bug 14217, http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=14217
If you migrate to 3.22 [it arrives at the end of November 2015] you will still need to change the default indexing to use 880 more deeply.
I suggest you open a bug on Bugzilla. Add 10-20 MARC records from your catalogue with 880 to use for tests.
Bye
Zeno Tajoli
-- Zeno Tajoli /Dipartimento Sviluppi Innovativi/ - Automazione Biblioteche Email: z.tajoli@cineca.it Fax: 051/6132198 *CINECA* Consorzio Interuniversitario - Sede operativa di Segrate (MI)
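One possible way to collect the 10-20 test records is to filter a MARCXML export for records that contain at least one 880 field. The Python sketch below uses only the standard library and assumes the export is a MARCXML document in the standard `http://www.loc.gov/MARC21/slim` namespace. The sample data, the function names and the limit of 20 are illustrative assumptions, not part of Koha.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Tiny made-up sample: record 1 has a linked 880, record 2 does not.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">1</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="6">880-01</subfield>
      <subfield code="a">Romanized title</subfield>
    </datafield>
    <datafield tag="880" ind1="1" ind2="0">
      <subfield code="6">245-01/(3/r</subfield>
      <subfield code="a">[Arabic chars]</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">2</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Title without a linked field</subfield>
    </datafield>
  </record>
</collection>"""


def records_with_880(marcxml: str, limit: int = 20) -> List[ET.Element]:
    """Return up to `limit` records that contain at least one 880 field."""
    root = ET.fromstring(marcxml)
    picked: List[ET.Element] = []
    for record in root.iter(MARC_NS + "record"):
        fields = record.iter(MARC_NS + "datafield")
        if any(field.get("tag") == "880" for field in fields):
            picked.append(record)
            if len(picked) >= limit:
                break
    return picked


def control_number(record: ET.Element) -> Optional[str]:
    """Return the content of the 001 control field, if the record has one."""
    return record.findtext(MARC_NS + "controlfield[@tag='001']")


if __name__ == "__main__":
    print([control_number(r) for r in records_with_880(SAMPLE)])
```

The selected records could then be attached to the Bugzilla report so that others can test indexing changes against real 880 data.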
participants (4)
- David Cook
- Jesse Lambertson
- Karam Qubsi
- Tajoli Zeno