[Koha] Solr and enhanced searching (WAS: Re: Koha Digest, Vol 80, Issue 40)

Tue Jun 26 19:12:42 NZST 2012

Jared,

Thank you for the explanation.

When I looked at the link you provided:
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233
I was surprised to see the heading:
"Bug 8233 - New search engine layer - introduce solr without breaking
anything else"

Isn't the "without breaking anything else" something that is assumed in the
software development process?

There have been times I've wanted to add "without breaking anything else"
or something similar to every enhancement and bug fix request going to our
vendor, but I hoped it was something implied in any enhancement request.
The enhancement that gave our version of Koha phrase searching does indeed
break another very important feature when it's activated. Perhaps this
level of explicitness is the model we need to follow.

This makes me think of an old comedy routine where a man asks a faith
healer to cure his twisted hand.  The preacher says, "Lord, make this one
hand like the other" and the guy gets two twisted hands. (http://tinyurl.com/79s7jea)

-- 
Stacy Pober
Information Alchemist
Riverdale, NY 10471
stacy.pober at manhattan.edu

On Mon, Jun 25, 2012 at 2:34 PM, <koha-request at lists.katipo.co.nz> wrote:

> From: Jared Camins-Esakov <jcamins at cpbibliography.com>
> To: Stacy Pober <stacy.pober at manhattan.edu>
> Cc: koha at lists.katipo.co.nz
> Subject: Re: [Koha] Koha Digest, Vol 80, Issue 38
> Message-ID:
>        <CALVDfQxf8yeR6q71tFpNRNsRYU_tajukqZ8jq3yXAN6RVKJhqA at mail.gmail.com>
>
> Stacy,
>
> > How is it that no one has added this kind of searching to the community
> > version of Koha?  It's a major omission.
> >
>
> I think it's a matter of funding and interest. A lot of developers[1] would
> love to see better search lexing in Koha[2], but the cost would be rather
> high, as it would require a more-or-less complete rewrite of the searching code
> in order to work, and my impression is that the searching works well enough
> for most people that it just isn't a priority for most organizations
> funding Koha development. That said, I am in complete agreement that not
> having a flexible search lexer in Koha is a serious omission[3].
>
> > People are so used to using quote marks and those operators in search
> > strings that they often try to use them in Koha even though they don't
> > work as expected. One of the features in an "easy to use" OPAC is that the
> > searching conforms to expected norms that the public knows from other
> > common search interfaces.
> >
>
> > I'm astonished that no one in the world has added properly working code
> > that would make the most widely understood phrase searching method
> > (quotation marks) available in community Koha.
> >
>
> Quotation marks for identifying longer strings in searches do work,
> actually, though not quite the way we might like. What we (as librarians)
> call a phrase search is (sorry for the tautology) a search for a phrase
> which comprises one or more words. What Zebra, the search engine Koha
> uses, considers a phrase search is a search for a phrase consisting of one
> or more words that comprise the entire subfield in which they appear. The
> fallback is something closer to what we might consider a phrase search, but
> the results can sometimes be slightly skewed (fortunately for us, my
> experience has been that for the most part, the skewing is slight enough
> that patrons are able to find what they want). This causes all sorts of
> mind-bending confusion, particularly in the area of subject tracings[4].
>
>
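The difference between the two notions of "phrase search" described above can be sketched in a few lines of Python. This is a simplified model, not Zebra's matching code: the normalization rule (lowercase, then split on anything other than ASCII letters and digits) and the function names are assumptions made for illustration.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Simplified normalization: lowercase, then split on anything other than ASCII letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(subfield: str, phrase: str) -> bool:
    """Librarian-style phrase search: the phrase's words appear next to each other anywhere."""
    s, p = words(subfield), words(phrase)
    return any(s[i:i + len(p)] == p for i in range(len(s) - len(p) + 1))


def matches_whole_subfield(subfield: str, phrase: str) -> bool:
    """Phrase search as described for Zebra: the phrase must make up the entire subfield."""
    return words(subfield) == words(phrase)
```

With the subject heading "World War, 1939-1945", contains_phrase(heading, "world war") is True but matches_whole_subfield(heading, "world war") is False; only the full "world war 1939 1945" satisfies the second test, which is the kind of mismatch that makes subject tracings confusing.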
> >  As for the + and - operators, if anyone tries adding that, they have to
> > code it as a two-character string (space+ or space-) to prevent confusion
> > when users are looking for hyphenated words.
> >
>
> Right now, depending on your settings, punctuation may either be ignored
> (in which case "+mice -computer" will return results about computer mice
> only) or considered to be letters (in which case the search will not return
> any results at all). This is an example of what I was talking about above:
> in order for + and - operators to really work, a lexer that can understand
> them would need to be available to Koha.
>
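As a rough illustration of the kind of lexing discussed above, including the suggestion that + and - count as operators only when they begin a space-delimited term, here is a minimal Python sketch. It assumes a query syntax of bare words, double-quoted phrases, and + / - prefixes; the Clause structure and the lex function are hypothetical and are not Koha, Zebra, or Solr code.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    text: str        # a single word, or the contents of a quoted phrase
    is_phrase: bool  # True if the user wrapped the text in quotation marks
    mode: str        # "required" (+), "excluded" (-), or "optional" (no prefix)


# Either an optionally prefixed "quoted phrase", or any run of non-space characters.
TOKEN = re.compile(r'([+-]?)"([^"]*)"|(\S+)')
MODES = {"+": "required", "-": "excluded", "": "optional"}


def lex(query: str) -> List[Clause]:
    """Turn a raw query string into a list of unambiguous clauses."""
    clauses = []
    for m in TOKEN.finditer(query):
        if m.group(2) is not None:
            # Quoted phrase such as +"computer mice"; empty quotes are dropped.
            phrase = m.group(2).strip()
            if phrase:
                clauses.append(Clause(phrase, True, MODES[m.group(1)]))
            continue
        chunk = m.group(3)
        # A + or - counts as an operator only at the start of a space-delimited
        # chunk and only when something follows it, so "e-mail" and "C++" stay words.
        if len(chunk) > 1 and chunk[0] in "+-":
            clauses.append(Clause(chunk[1:], False, MODES[chunk[0]]))
        else:
            clauses.append(Clause(chunk, False, "optional"))
    return clauses
```

For example, lex('+mice -computer') yields a required clause for "mice" and an excluded clause for "computer", which a search layer could then translate into whatever boolean syntax the underlying engine expects, instead of having the punctuation silently ignored or treated as letters.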
> > Is anyone in the Koha-verse working on adding a search mode that
> > uses quotation marks and + - operators?  I mean, aside from the buggy
> > version we have in Liblime's LAK.
> >
>
> There is work on incorporating Solr into Koha, which is very promising, and
> will fix some of these problems[5]. Solr-based searching should be
> available in 3.10, I think. A patch[6] is currently undergoing QA to
> integrate BibLibre's work on Solr into Koha, and I don't imagine it will
> take too long for it to be pushed. :)
>
> Hope that helps.
>
> Regards,
> Jared
>
> [1] Well, I would love to see better search lexing in Koha, and a few other
> people have agreed with me that it would be a great idea (usually while
> backing away slowly and hoping I don't explain how I think it should work).
> [2] A lexer is the tool that takes the query a user enters and turns it
> into a data structure that tells the search engine "this is an unambiguous
> representation of what the user asked for, to the extent that the user
> provided an unambiguous query." Right now Koha uses the query lexer
> provided by Zebra. OCLC's query lexer understands query strings like
> "pd:moz,wol,a" and turns them into a query that could be expressed in English
> as "Search the derived personal name index for records where the first
> component [i.e. last name] starts with 'moz', the second component [i.e.
> first name] starts with 'wol', and the third component [i.e. middle name]
> starts with 'a'."
> [3] I would love to correct the omission myself, but it's far too large a
> project to do for "fun" (also, the code in question is rather painful to
> look at, so it wouldn't even be that much fun!).
> [4] Galen Charlton and I have both been doing work relating to authorities
> and indexing recently that will help to resolve many of the issues with
> heading linking and searching. (see, for example,
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284 and
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7818 though there
> are a number of other related bugs as well; most of C & P's
> authority-related developments are grouped under
> http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8212)
> [5] As I understand it, though, we will be trading Zebra's lexer for Solr's
> slightly more capable lexer, which is still not ideal.
> [6] http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233
>
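The "pd:moz,wol,a" example in footnote [2] can be made concrete with a minimal Python sketch that parses such a string into a structured query and applies it to a name already split into components. Only the example string and its English gloss come from the text above; the colon/comma syntax rules, the case-folding, and the names ComponentQuery, parse_derived and matches are assumptions for illustration, not OCLC's actual implementation.

```python
from typing import List, NamedTuple


class ComponentQuery(NamedTuple):
    index: str           # index code, e.g. "pd" for the derived personal name index
    prefixes: List[str]  # one prefix per name component, in order


def parse_derived(query: str) -> ComponentQuery:
    """Split 'index:prefix1,prefix2,...' into an index code and component prefixes."""
    index, sep, value = query.partition(":")
    if not sep:
        raise ValueError("expected 'index:value'")
    return ComponentQuery(index.strip().lower(),
                          [p.strip().lower() for p in value.split(",")])


def matches(q: ComponentQuery, components: List[str]) -> bool:
    """True if each name component starts with the corresponding prefix."""
    if len(components) < len(q.prefixes):
        return False
    return all(c.lower().startswith(p) for c, p in zip(components, q.prefixes))
```

Here parse_derived("pd:moz,wol,a") produces an unambiguous structure (index "pd", prefixes "moz", "wol", "a") that matches ["Mozart", "Wolfgang", "Amadeus"] but not ["Mozart", "Leopold"].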
> --
> Jared Camins-Esakov
> Bibliographer, C & P Bibliography Services, LLC
> (phone) +1 (917) 727-3445
> (e-mail) jcamins at cpbibliography.com
> (web) http://www.cpbibliography.com/
>