New subject: Solr and enhanced searching (WAS: Re: Koha Digest, Vol 80, Issue 40)

26 Jun 2012

Jared,

Thank you for the explanation.

When I looked at the link you provided:
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233
I was surprised to see the heading:
"Bug 8233 - New search engine layer - introduce solr without breaking
anything else"

Isn't the "without breaking anything else" something that is assumed in the
software development process?

There have been times I've wanted to add "without breaking anything else"
or something similar to every enhancement and bug fix request going to our
vendor, but I hoped it was something implied in any enhancement request.
The enhancement that gave our version of Koha phrase searching does indeed
break another very important feature when it's activated. Perhaps this
level of explicitness is the model we need to follow.

This makes me think of an old comedy routine where a man asks a faith
healer to cure his twisted hand.  The preacher says, "Lord, make this one
hand like the other," and the guy gets two twisted hands.
(http://tinyurl.com/79s7jea)

-- 
Stacy Pober
Information Alchemist
Riverdale, NY 10471
stacy.pober@manhattan.edu

On Mon, Jun 25, 2012 at 2:34 PM, <koha-request@lists.katipo.co.nz> wrote:
...
From: Jared Camins-Esakov <jcamins@cpbibliography.com>
To: Stacy Pober <stacy.pober@manhattan.edu>
Cc: koha@lists.katipo.co.nz
Subject: Re: [Koha] Koha Digest, Vol 80, Issue 38
Stacy,
> How is it that no one has added this kind of searching to the community
> version of Koha?  It's a major omission.
I think it's a matter of funding and interest. A lot of developers[1] would
love to see better search lexing in Koha[2], but the cost would be rather
high, as it requires a more-or-less complete rewrite of the searching code
in order to work, and my impression is that the searching works well enough
for most people that it just isn't a priority for most organizations
funding Koha development. That said, I am in complete agreement that not
having a flexible search lexer in Koha is a serious omission[3].
> People are so used to using quote marks and those operators in search
> strings that they often try to use them in Koha even though they don't
> work as expected. One of the features in an "easy to use" OPAC is that
> the searching conforms to expected norms that the public knows from
> other common search interfaces.
>
> I'm astonished that no one in the world has added properly working code
> that would make the most widely understood phrase searching method
> (quotation marks) available in community Koha.
Quotation marks for identifying longer strings in searches do work,
actually, though not quite the way we might like. What we (as librarians)
call a phrase search is (sorry for the tautology) a search for a phrase
which comprises one or more words. What Zebra, the search engine that Koha
uses, considers a phrase search is a search for a phrase consisting of one
or more words that comprise the entire subfield in which they appear. The
fallback is something closer to what we might consider a phrase search, but
the results can sometimes be slightly skewed (fortunately for us, my
experience has been that for the most part, the skewing is slight enough
that patrons are able to find what they want). This causes all sorts of
mind-bending confusion, particularly in the area of subject tracings[4].
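
To make the difference between the two notions of "phrase" concrete, here
is a minimal Python sketch. It is not Zebra or Koha code: the function
names and the sample heading are invented for illustration, and real
indexing applies far more normalization than lowercasing and splitting on
whitespace.

```python
def normalize(text):
    """Lowercase and split on whitespace (a stand-in for real normalization)."""
    return text.lower().split()


def complete_subfield_match(query, subfield):
    """'Phrase' in the stricter sense: the query must be the entire subfield."""
    return normalize(query) == normalize(subfield)


def contained_phrase_match(query, subfield):
    """'Phrase' in the librarian's sense: the words appear adjacent and in order."""
    q, s = normalize(query), normalize(subfield)
    return any(s[i:i + len(q)] == q for i in range(len(s) - len(q) + 1))


# Hypothetical subfield content, used only for this example.
subfield = "United States History Civil War"

print(complete_subfield_match("civil war", subfield))
print(contained_phrase_match("civil war", subfield))
```

With this sample, the complete-subfield test rejects "civil war" while the
contained-phrase test accepts it, which is the kind of mismatch a patron
would run into.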
...
> As for the + and - operators, if anyone tries adding that, they have to
> code it as a two-character string (space+ or space-) to prevent confusion
> when users are looking for hyphenated words.
Right now, depending on your settings, punctuation may either be ignored
(in which case "+mice -computer" will return results about computer mice
only) or considered to be letters (in which case the search will not return
any results at all). This is an example of what I was talking about above:
in order for + and - operators to really work, a lexer that can understand
them would need to be available to Koha.
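
As a rough idea of what such a lexer would have to do, here is a small
Python sketch. It is purely hypothetical (nothing like it exists in Koha
or Zebra as far as this message is concerned), and the names TOKEN and
lex are made up. It follows the suggestion above that + and - only count
as operators at the start of a token, so hyphenated words survive intact,
and it also keeps quoted phrases together.

```python
import re

# A token is an optional leading + or -, followed by either a quoted phrase
# or a run of non-space characters. Because \S+ is greedy, a hyphen inside
# a word ("well-known") is never seen at the start of a token.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

BUCKET = {"+": "required", "-": "excluded", "": "optional"}


def lex(query):
    """Turn a query string into an unambiguous structure of term lists."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for op, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if term:  # skip empty quoted strings such as ""
            parsed[BUCKET[op]].append(term.lower())
    return parsed


print(lex('+mice -computer "field guide" well-known'))
# {'required': ['mice'], 'excluded': ['computer'],
#  'optional': ['field guide', 'well-known']}
```

The output is exactly the sort of data structure a search engine could act
on without guessing: "mice" must appear, "computer" must not, and the
hyphenated word is left alone.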
> Is anyone in the Koha-verse working on adding a search mode that
> uses quotation marks and + - operators?  I mean, aside from the buggy
> version we have in Liblime's LAK.
There is work on incorporating Solr into Koha, which is very promising, and
will fix some of these problems[5]. Solr-based searching should be
available in 3.10, I think. A patch[6] is currently undergoing QA to
integrate BibLibre's work on Solr into Koha, and I don't imagine it will
take too long for it to be pushed. :)
Hope that helps.
Regards,
Jared
[1] Well, I would love to see better search lexing in Koha, and a few other
people have agreed with me that it would be a great idea (usually while
backing away slowly and hoping I don't explain how I think it should work).
[2] A lexer is the tool that takes the query a user enters and turns it
into a data structure that tells the search engine "this is an unambiguous
representation of what the user asked for, to the extent that the user
provided an unambiguous query." Right now Koha uses the query lexer
provided by Zebra. OCLC's query lexer understands a query string like
"pd:moz,wol,a" and turns it into a query that could be expressed in English
as "Search the derived personal name index for records where the first
component [i.e. last name] starts with 'moz', the second component [i.e.
first name] starts with 'wol', and the third component [i.e. middle name]
starts with 'a.'"
[3] I would love to correct the omission myself, but it's far too large a
project to do for "fun" (also, the code in question is rather painful to
look at, so it wouldn't even be that much fun!).
[4] Galen Charlton and I have both been doing work relating to authorities
and indexing recently that will help to resolve many of the issues with
heading linking and searching. (see, for example,
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284 and
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7818 though there
are a number of other related bugs as well; most of C & P's
authority-related developments are grouped under
http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8212)
[5] As I understand it, though, we will be trading Zebra's lexer for Solr's
slightly more capable lexer, which is still not ideal.
[6] http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233
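
To illustrate the kind of data structure note [2] has in mind, here is a
small Python sketch of lexing a derived-search string such as
"pd:moz,wol,a". This is an assumption-laden toy, not OCLC's
implementation: the DerivedQuery class, the function names, and the
sample names are all invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DerivedQuery:
    index: str           # index label, e.g. "pd" from the example in note [2]
    prefixes: List[str]  # one prefix per name component, in order


def lex_derived(query):
    """Split 'index:prefix,prefix,...' into an explicit structure."""
    index, sep, rest = query.partition(":")
    if not sep or not rest:
        raise ValueError("expected something like 'pd:moz,wol,a'")
    return DerivedQuery(index.strip(), [p.strip().lower() for p in rest.split(",")])


def matches(parsed, name_parts):
    """True if each name component starts with the corresponding prefix."""
    if len(name_parts) < len(parsed.prefixes):
        return False
    return all(part.lower().startswith(prefix)
               for prefix, part in zip(parsed.prefixes, name_parts))


query = lex_derived("pd:moz,wol,a")
print(query)
print(matches(query, ["Mozart", "Wolfgang", "Amadeus"]))
```

The point is only that the string is converted once into a structure
(index plus ordered prefixes), and everything after that works from the
structure rather than from the raw text the user typed.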
--
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/
