Multilingual Searching and ICU Chains
Hello, all!

I am developing a catalog of multilingual, multiscript materials: Arabic, Bengali, Burmese, Chinese, English, French, German, Japanese, Korean, Mongolian, Russian, Vietnamese, to name but a few. Since both Koha and MarcEdit support UTF-8, I can enter and edit the cataloging records in their original scripts.

What I cannot do is search them. I have seen on the Koha Community wiki that ICU chains have to be configured: https://wiki.koha-community.org/wiki/ICU_Chains_Library. I also found an exchange about this: https://lists.katipo.co.nz/public/koha/2012-January/031714.html.

Is one to infer that an ICU chain has to be configured for each writing system? Are there other multilingual libraries that have implemented multilingual, multiscript searching in Koha, and how did they do so?

Many thanks for your help in this matter.

Charles

--
Charles Kelley, MLS
Hi,

I'd use Elasticsearch, at least with all the improvements in the upcoming Koha 19.11 release. It supports, for example, ICU folding out of the box, and the analysis chain is relatively easy to modify, too. See the (quite condensed) information at https://wiki.koha-community.org/wiki/Elasticsearch.

Best,
Ere

[In reply to Charles Kelley, 20.11.2019 at 21:39]
-- Ere Maijala Kansalliskirjasto / The National Library of Finland
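Ere mentions ICU folding, which lets accented, full-width, and upper/lower-case variants of a word match one another. As a rough illustration only, here is a Python sketch that approximates folding with the standard library's unicodedata module. It is not ICU's actual rule set and not the `icu_folding` filter from the Elasticsearch analysis-icu plugin; the function name and the choice to strip only the U+0300 to U+036F combining-diacritics block are assumptions made for this sketch.

```python
import unicodedata


def approximate_icu_fold(text: str) -> str:
    """Rough standard-library stand-in for ICU folding (NOT the real ICU rules)."""
    # 1. Compatibility decomposition: full-width letters become plain ones,
    #    and precomposed letters split into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # 2. Drop only the generic combining diacritics block (U+0300..U+036F).
    #    Script-specific marks such as the Bengali virama or the Japanese
    #    voiced-sound mark lie outside this block and are kept.
    stripped = "".join(ch for ch in decomposed if not "\u0300" <= ch <= "\u036f")
    # 3. Case-fold, then recompose anything left decomposed (e.g. Hangul, kana).
    return unicodedata.normalize("NFC", stripped.casefold())


if __name__ == "__main__":
    # Records and queries are folded the same way, so each pair should match.
    pairs = [("Tiếng Việt", "tieng viet"), ("Müller", "MULLER"), ("Ｔｏｋｙｏ", "tokyo")]
    for record, query in pairs:
        folded = approximate_icu_fold(record)
        print(record, "->", folded, "| matches query:", folded == approximate_icu_fold(query))
```

Stripping every combining mark instead of just that block would be simpler, but it would also delete marks that are essential in scripts such as Bengali or Burmese, which matters for a catalog like Charles's.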
Hi Charles,

If ICU with Zebra is set up correctly, you should be able to search for the records in the original script (as cataloged) without any further setup. If the original script is only in 880 fields, only the keyword search will include it by default. But this is not related to ICU: the problem is that indexing is not set up to include each 880 field in the index of the field it is linked to (for example, an 880 linked to 245 in the title index).

If you want to search transliterated forms or need other specific behavior, the chains might need adjusting.

Hope this helps,
Katrin

[In reply to Charles Kelley, 20.11.2019 at 20:39]
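Katrin notes that searching transliterated forms may need changes to the chains. The underlying idea can be sketched in Python: index every word both as cataloged and in a romanized form, so that a Latin-script query also finds a Cyrillic record. The tiny Cyrillic-to-Latin table below is hypothetical and deliberately incomplete; it is not ALA-LC romanization or ICU's transliteration rules, which in a real setup would be configured in the ICU chain rather than in code like this.

```python
# Hypothetical, deliberately tiny Cyrillic-to-Latin table for illustration.
# It is NOT ALA-LC romanization or ICU's Cyrillic-Latin transliterator.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "и": "i",
    "й": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p",
    "р": "r", "с": "s", "т": "t", "у": "u", "ы": "y", "я": "ia",
}


def romanize(word: str) -> str:
    """Map each Cyrillic letter via the table; leave other characters as-is."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in word.lower())


def index_terms(text: str) -> set:
    """Index each whitespace-separated token in original and romanized form."""
    terms = set()
    for token in text.lower().split():
        terms.add(token)
        terms.add(romanize(token))
    return terms


if __name__ == "__main__":
    terms = index_terms("Война и мир")
    # Both the Cyrillic and the romanized queries hit the same record.
    for query in ("война", "voina", "mir"):
        print(query, query in terms)
```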
Hi,

I might start to sound like a salesman, but the Elasticsearch indexing code in Koha handles alternate-script fields (880) automatically and indexes them in the correct search fields. :)

Regards,
Ere

[In reply to Katrin Fischer, 22.11.2019 at 11:44]
_______________________________________________ Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz https://lists.katipo.co.nz/mailman/listinfo/koha
-- Ere Maijala Kansalliskirjasto / The National Library of Finland
Hi Ere,

Love to hear this! :) Do we have documentation about this?

Katrin

[In reply to Ere Maijala, 22.11.2019 at 12:12]
Hi Katrin,

Well, sort of. The wiki (https://wiki.koha-community.org/wiki/Elasticsearch) has a whole sentence on this: "Note that all data field mappings will automatically handle alternate script fields (880) for MARC 21 records." Implementation details are in bug 20244.

--Ere

[In reply to Katrin Fischer, 22.11.2019 at 14:23]
-- Ere Maijala Kansalliskirjasto / The National Library of Finland
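The 880 behavior discussed in this thread rests on MARC 21 linkage: an 880 field holds the alternate-script form of another field, and its subfield $6 begins with that field's tag (for example 245-01/(N for a Cyrillic title). An indexer can therefore send each 880 to the same search field as its linked tag, which is what the wiki sentence describes for Elasticsearch; without that step the 880 text only reaches the keyword index, as Katrin describes for Zebra. The Python sketch below illustrates the routing. The tag-to-field mapping, function names, and sample record are hypothetical; this is not the code from bug 20244.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical mapping of MARC 21 tags to search fields (not Koha's config).
TAG_TO_SEARCH_FIELD = {"100": "author", "245": "title", "650": "subject"}


def effective_tag(tag: str, subfield_6: Optional[str]) -> str:
    """Treat an 880 as the field its $6 links to, e.g. '245-01/(N' -> '245'."""
    if tag == "880" and subfield_6:
        return subfield_6[:3]
    return tag


def build_index(fields):
    """fields: iterable of (tag, subfield_6 or None, text) tuples."""
    index = defaultdict(list)
    for tag, subfield_6, text in fields:
        index["keyword"].append(text)  # every field feeds keyword search
        search_field = TAG_TO_SEARCH_FIELD.get(effective_tag(tag, subfield_6))
        if search_field:
            index[search_field].append(text)
    return dict(index)


# Sample record: romanized fields linked to their Cyrillic 880 counterparts.
record = [
    ("245", "880-01", "Voina i mir"),
    ("880", "245-01/(N", "Война и мир"),
    ("100", "880-02", "Tolstoi, Lev"),
    ("880", "100-02/(N", "Толстой, Лев"),
]

if __name__ == "__main__":
    for field, values in build_index(record).items():
        print(field, values)
```

Using the raw tag instead of effective_tag would reproduce the Zebra situation Katrin describes: "880" is not in the mapping, so the Cyrillic text would appear only under keyword.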
Honorable Friends,

ABCD is powerful library automation software. For a presentation, search Google for "Rasheed ABCD", or see https://www.slideshare.net/Rasheed1976/presentation-by-rasheedahmedmarch2011

ABCD stands for "Automatización de Bibliotecas y Centros de Documentación" (Spanish), which means Library and Documentation Centers Automation. Its development was promoted and coordinated by BIREME, with the support of VLIR. ABCD is web-based integrated library management software covering the main basic library functions. This kind of library application has been a long-held aspiration of the ISIS community since the first MS-DOS version came out more than 20 years ago. Several library automation systems were developed during this period and are still in operation worldwide. BIREME's previous system, EMP, was limited to circulation services. The main characteristics of ABCD are its coverage of the main library functions, its web centrality, and its development and maintenance under the Free and Open Source Software methodology.

Main functions:
- Definition of any number of new databases (similar to WinISIS), including FDT, PFT, FST, and worksheets, either directly on the Web or by copying existing ones from the Web or from WinISIS on a local hard disk;
- Cataloguing of books and serials, independently of the format: MARC, LILACS, AGRIS, etc.;
- End-user searching (OPAC);
- Loans circulation;
- Acquisitions;
- Library services such as SDI, barcode printing, quality control, etc.;
- Compatibility with CDS/ISIS database technology for the bibliographic databases, i.e. reading ISIS databases and using the ISIS Formatting Language for producing output and indexing records;
- Runs on both Windows and Linux platforms;
- Use of MARC 21 cataloging formats and other current standards or protocols (Dublin Core, METS, Z39.50...);
- Published as Free and Open Source Software (FOSS) with the accompanying tools for the developer community;
- Multilingual.

Rasheed Ahmed

[In reply to Katrin Fischer, Fri, Nov 22, 2019 at 3:24 PM]
Participants (4):
- Charles Kelley
- Ere Maijala
- Katrin Fischer
- Rasheed A.