[Koha] Title search works, but Library catalog search fails in OPAC
Jason Boyer
JBoyer at equinoxOLI.org
Sat Jun 26 00:13:48 NZST 2021
Hi Tasha, in YAML, prefixing an entry with a - like that makes it an item in a list / array of values.
This is easier to see with a YAML -> JSON example:
analyzer_phrase:
  tokenizer: keyword
  filter:
    - icu_folding
  char_filter:
    - punctuation
Becomes something like this in JSON:
{ "analyzer_phrase": { "tokenizer": "keyword", "filter": ["icu_folding"], "char_filter": ["punctuation"] } }
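So adding more '-'-prefixed entries below filter or char_filter would add more entries to those arrays. A bare word punctuation below char_filter would not be a list entry: depending on its indentation, YAML would either read it as a plain string value instead of an array or reject it as a syntax error.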
This means that there should be a definition for the punctuation char_filter *somewhere*, but possibly not in that file.
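
The list-versus-scalar point above can be checked with a short script. This is only a sketch: parse_simple_yaml is a hypothetical toy parser written for this example, and it handles nothing beyond the three line shapes in the fragment ("key:", "key: value", "- item"). For a real config file you would use a proper YAML library such as PyYAML (yaml.safe_load). The extra "lowercase" entry is purely illustrative and is not taken from anyone's index_config.yaml.

```python
import json

YAML_TEXT = """\
analyzer_phrase:
  tokenizer: keyword
  filter:
    - icu_folding
  char_filter:
    - punctuation
"""


def parse_simple_yaml(text):
    """Toy parser for a tiny YAML subset: 'key:', 'key: value', '- item'."""
    root = {}
    stack = [(-1, root)]   # (indent of the key that opened it, mapping)
    last_open = None       # (mapping, key) of the most recent bare "key:"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        if line.startswith("- "):
            # A dash line is one element of the list owned by the last bare key.
            mapping, key = last_open
            if not isinstance(mapping[key], list):
                mapping[key] = []
            mapping[key].append(line[2:].strip())
            continue
        # Close any mappings that were opened at this indent or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        key, _, value = line.partition(":")
        value = value.strip()
        if value:
            parent[key] = value          # "key: value" is a plain scalar
        else:
            parent[key] = {}             # bare "key:" opens a nested block
            stack.append((indent, parent[key]))
            last_open = (parent, key)
    return root


parsed = parse_simple_yaml(YAML_TEXT)
print(json.dumps(parsed))

# Adding a second dash entry under filter extends that array.
extended = parse_simple_yaml(
    YAML_TEXT.replace("    - icu_folding\n", "    - icu_folding\n    - lowercase\n")
)
print(extended["analyzer_phrase"]["filter"])
```

The first print shows filter and char_filter as one-element JSON arrays nested inside analyzer_phrase; the second shows that each additional dash line becomes another element of the same array.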
Jason
--
Jason Boyer
Senior System Administrator
Equinox Open Library Initiative
JBoyer at equinoxOLI.org
+1 (877) Open-ILS (673-6457)
https://equinoxOLI.org/
> On Jun 24, 2021, at 5:55 PM, Bales (US), Tasha R <tasha.r.bales at boeing.com> wrote:
>
> Follow-up to my problem searching the OPAC for phrases containing punctuation (e.g., Electroactive polymer (EAP) actuators). This Bywater Solutions article <https://bywatersolutions.com/education/elastic-searching> suggests that the problem is a feature of Elasticsearch.
>
>
>
> FYI, we are using Elasticsearch 6.1.1 on its own dedicated server, and I don’t believe we’ve installed the ICU Analysis plug-in (looks like it’s required for Zebra, but I can’t tell if it’s required for Elasticsearch), which could be a factor. I couldn’t replicate all aspects of my experience in a sandbox, although searches for phrases containing punctuation still failed in an OPAC “Library Catalog” sandbox search. I concluded that I needed to review our configuration.
>
>
>
> I’ve been reviewing the Koha Wiki and elastic.co documentation, and comparing to our index_config.yaml.
>
> By chance does anyone know how to interpret the syntax below? The documentation describes the parameters below, but I don’t see any usage of “-“ before options. Does the option “- punctuation“ mean “yes, remove punctuation”, or “no, don’t remove punctuation”, or does the phrase refer to some additional configuration file, or perhaps it’s commented out?
>
>
>
> analyzer_phrase:
>   tokenizer: keyword
>   filter:
>     - icu_folding
>   char_filter:
>     - punctuation
>
>
>
> Thanks for your time and consideration,
>
>
>
>
>
> Tasha Bales
>
> Business Support Team | Information Services
>
> Enterprise Services | Enterprise Operations, Finance and Sustainability
>
>
>
>
>
> -----Original Message-----
>
> From: Bales (US), Tasha R
>
> Sent: Tuesday, June 22, 2021 7:54 AM
>
> To: 'Jonathan Druart' <jonathan.druart at bugs.koha-community.org>
>
> Cc: Discussion Group Koha <koha at lists.katipo.co.nz>
>
> Subject: RE: [EXTERNAL] Re: [Koha] Title search works, but Library catalog search fails in OPAC
>
>
>
> Jonathan, thank you!
>
>
>
> It does work without the parentheses.
>
>
>
> I would suspect an encoding problem, except that the problem only manifests in the OPAC, and not the intranet.
>
>
>
> I came across this issue while testing after migrating from MariaDB to Percona MySQL. Your reply prompted me to check the encoding of the new database, and it's unfortunately Latin-1. Since these are parentheses and not diacritics, I’m not sure what my expectations should be, but changing to UTF-8 is a place to start. Httpd.conf does have UTF-8 set as the default.
>
>
>
> FWIW, my source records were encoded in MARC-8. I used MarcEdit to convert them to UTF-8, and it appears that Koha automatically converts anyway on import. When I loaded these records into Koha, I used bulkmarcimport.pl on the command line.
>
>
>
> I'll ask that the default character set of the database be changed, and see if that helps. Thanks again. I'm embarrassed that I didn't think to omit the parentheses, or rather was belligerently insisting to myself that they should not have been a problem.
>
>
>
>
>
> Tasha Bales
>
> Business Support Team | Information Services Enterprise Services | Enterprise Operations, Finance and Sustainability
>
> (480) 509-5415
>
> https://is.web.boeing.com
>
>
>
>
>
> -----Original Message-----
>
> From: Jonathan Druart [mailto:jonathan.druart at bugs.koha-community.org]
>
> Sent: Tuesday, June 22, 2021 12:01 AM
>
> To: Bales (US), Tasha R <tasha.r.bales at boeing.com>
>
> Cc: Discussion Group Koha <koha at lists.katipo.co.nz>
>
> Subject: [EXTERNAL] Re: [Koha] Title search works, but Library catalog search fails in OPAC
>
> Importance: High
>
> Hello Tasha,
>
>
>
> I've created 2 records with
> 245$a Electroactive polymer (EAP) actuators as artificial muscles
> and the following query returns the 2 results:
> /opac-search.pl?idx=&q=Electroactive%20polymer%20%28EAP%29%20actuators&weight_search=1
>
>
>
> Tried on master and 20.11.06.
>
>
>
> Maybe a silly idea: does it work without the parentheses?
>
>
>
> Could you try and recreate it on a sandbox
>
> (https://wiki.koha-community.org/wiki/Sandboxes) and provide us a step by step plan to reproduce the problem?
>
>
>
> Regards,
>
> Jonathan
>
>
>
> On Tue, 22 Jun 2021 at 00:46, Bales (US), Tasha R <tasha.r.bales at boeing.com> wrote:
>
>>
>
>> Good afternoon,
>
>>
>
>> I’m having trouble with Title vs. Library catalog keyword searching with several example titles. Searching the same phrase with either method yields different results. This problem occurs only in the OPAC. I hope to confirm whether the behavior I’m seeing is intended (i.e., the problem is me) or not. Thanks in advance.
>
>>
>
>> For example, given the ebook title, Electroactive polymer (EAP)
>
>> actuators as artificial muscles, a Title keyword search in the OPAC is successful, but a plain, Library catalog (i.e., no index specified), keyword search fails.
>
>>
>
>> For reference, the title is recorded in the MARC record as:
>
>> 245 00 - TITLE STATEMENT
>
>> a Title Electroactive polymer (EAP) actuators as artificial muscles :
>
>>
>
>> Below I’ve copied in my search history as well as the tail of the search URL that shows the search parameters.
>
>>
>
>>
>
>> · Library catalog keyword search with 0 results
>
>>
>
>> o 2021-06-21 02:34 PM Electroactive polymer (EAP) actuators, suppress:false 0
>
>>
>
>> o …opac-search.pl?idx=&q=Electroactive%20polymer%20%28EAP%29%20actuators&weight_search=1
>
>>
>
>>
>
>> · Title keyword search with 2 results
>
>>
>
>> o 2021-06-21 02:34 PM Electroactive polymer (EAP) actuators, suppress:false 2
>
>>
>
>> o …opac-search.pl?idx=ti&q=Electroactive+polymer+%28EAP%29+actuators&weight_search=1
>
>>
>
>> As a test, I decided to enclose my Library catalog search terms in quotes, which yielded the desired results. However, I did not at all anticipate that quotes would be required to get hits:
>
>>
>
>>
>
>> · Library catalog quoted keyword search with 2 results
>
>>
>
>> o 2021-06-21 02:46 PM "Electroactive polymer (EAP) actuators", suppress:false 2
>
>>
>
>> o … opac-search.pl?idx=&q=%22Electroactive+polymer+%28EAP%29+actuators%22&weight_search=1
>
>>
>
>> On comparing the above URL query strings, it appears that the unquoted terms in the Library catalog keyword search aren’t “anded” together with a “+” the way other searches are, but I’m not sure what the implications are, if any. Also, the Koha manual indicates the following, which suggests to me that I ought to get hits on the unquoted string:
>
>>
>
>> When you have more than one word in the search box, Koha will still do a keyword search, but a bit differently. Each word will be searched on its own, then the Boolean connector ‘and’ will narrow your search to those items with all words contained in matching records.
>
>>
>
>> I understand and can predict pretty well the way our old ILS (Millennium, if context helps) will perform a keyword search, but I’m a little confused here. My expectation for this particular case is that all of the above methods would yield results. If there are any pointers to be had, I would be grateful if you could point me to them so that I may be better poised to help users.
>
>>
>
>> I’m using Elasticsearch with Koha 20.11.06. I reindexed both authorities and biblios today, but that didn’t impact my experience. The records are not newly added.
>
>>
>
>> Thanks!
>
>>
>
>>
>
>> Tasha Bales
>
>> Business Support Team | Information Services Enterprise Services |
>
>> Enterprise Operations, Finance and Sustainability
>
>>
>
>> _______________________________________________
>
>>
>
>> Koha mailing list http://koha-community.org Koha at lists.katipo.co.nz
>
>> Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>