Solr and enhanced searching (WAS: Re: Koha Digest, Vol 80, Issue 40)
Jared,

Thank you for the explanation.

When I looked at the link you provided: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233 I was surprised to see the heading: "Bug 8233 - New search engine layer - introduce solr without breaking anything else"

Isn't the "without breaking anything else" something that is assumed in the software development process?

There have been times I've wanted to add "without breaking anything else" or something similar to every enhancement and bug fix request going to our vendor, but I hoped it was something implied in any enhancement request. The enhancement that gave our version of Koha phrase searching does indeed break another very important feature when it's activated. Perhaps this level of explicitness is the model we need to follow.

This makes me think of an old comedy routine where a man asks a faith healer to cure his twisted hand. The preacher says, "Lord, make this one hand like the other" and the guy gets two twisted hands. (http://tinyurl.com/79s7jea)

--
Stacy Pober
Information Alchemist
Riverdale, NY 10471
stacy.pober@manhattan.edu

On Mon, Jun 25, 2012 at 2:34 PM, <koha-request@lists.katipo.co.nz> wrote:
From: Jared Camins-Esakov <jcamins@cpbibliography.com>
To: Stacy Pober <stacy.pober@manhattan.edu>
Cc: koha@lists.katipo.co.nz
Subject: Re: [Koha] Koha Digest, Vol 80, Issue 38
Message-ID: <CALVDfQxf8yeR6q71tFpNRNsRYU_tajukqZ8jq3yXAN6RVKJhqA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Stacy,
How is it that no one has added this kind of searching to the community
version of Koha? It's a major omission.
I think it's a matter of funding and interest. A lot of developers[1] would love to see better search lexing in Koha[2], but the cost would be rather high, as it requires a more-or-less complete rewrite of the searching code in order to work, and my impression is that the searching works well enough for most people that it just isn't a priority for most organizations funding Koha development. That said, I am in complete agreement that not having a flexible search lexer in Koha is a serious omission[3].
People are so used to using quote marks and those operators in search
strings that they often try to use them in Koha even though they don't work as expected. One of the features in an "easy to use" OPAC is that the searching conforms to expected norms that the public knows from other common search interfaces.
I'm astonished that no one in the world has added properly working code
that would make the most widely understood phrase searching method (quotation marks) available in community Koha.
Quotation marks for identifying longer strings in searches do work, actually, though not quite the way we might like. What we (as librarians) call a phrase search is (sorry for the tautology) a search for a phrase which comprises one or more words. What Zebra, the search engine that Koha uses, considers a phrase search is a search for a phrase consisting of one or more words that make up the entire subfield in which they appear. The fallback is something closer to what we might consider a phrase search, but the results can sometimes be slightly skewed (fortunately for us, my experience has been that for the most part, the skewing is slight enough that patrons are able to find what they want). This mismatch between the two definitions causes all sorts of mind-bending confusion, particularly in the area of subject tracings[4].
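To make the two definitions of "phrase search" concrete, here is a minimal sketch in Python. It is a toy illustration only, not Zebra's actual matching code: the function names are hypothetical, and real indexing involves normalization rules that are ignored here.

```python
def complete_subfield_match(phrase: str, subfield: str) -> bool:
    """True only when the phrase is the whole subfield (ignoring case and spacing)."""
    return phrase.lower().split() == subfield.lower().split()


def phrase_anywhere_match(phrase: str, subfield: str) -> bool:
    """True when the phrase's words appear consecutively anywhere in the subfield."""
    words = phrase.lower().split()
    field_words = subfield.lower().split()
    if not words:
        return False
    # Slide a window of len(words) across the subfield and compare word lists.
    return any(
        field_words[i:i + len(words)] == words
        for i in range(len(field_words) - len(words) + 1)
    )


title = "The history of the English language"
print(complete_subfield_match("English language", title))  # False
print(phrase_anywhere_match("English language", title))    # True
```

The first function is the strict reading (the phrase must be the entire subfield); the second is closer to what a librarian or patron usually means by a phrase search.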
As for the + and - operators, if anyone tries adding that, they have to code it as a two-character string (space+ or space-) to prevent confusion when users are looking for hyphenated words.
Right now, depending on your settings, punctuation may either be ignored (in which case "+mice -computer" will return results about computer mice only) or considered to be letters (in which case the search will not return any results at all). This is an example of what I was talking about above: in order for + and - operators to really work, a lexer that can understand them would need to be available to Koha.
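As a rough illustration of what such a lexer might do, here is a minimal Python sketch. It is not Koha, Zebra, or Solr code; the `Term` class and `lex` function are hypothetical names. It treats + and - as operators only at the start of the query or directly after whitespace (the "space+" / "space-" rule suggested above), so hyphenated words survive intact.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    text: str
    modifier: str = ""       # "+" = must appear, "-" = must not appear
    is_phrase: bool = False  # True when the user wrapped the term in quotes


# An operator is recognized only at the start of the query or after
# whitespace, so the hyphen inside "e-mail" stays part of the word.
TOKEN = re.compile(r'(?:^|(?<=\s))([+-]?)(?:"([^"]+)"|(\S+))')


def lex(query: str) -> list[Term]:
    """Turn a raw query string into an unambiguous list of terms."""
    terms = []
    for modifier, quoted, bare in TOKEN.findall(query):
        if quoted:
            terms.append(Term(quoted, modifier, True))
        else:
            terms.append(Term(bare, modifier, False))
    return terms


for term in lex('mice +"optical sensor" -computer e-mail'):
    print(term)
```

With a structure like this, the search layer would no longer have to guess whether punctuation is an operator or part of a word; the lexer has already decided.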
Is anyone in the Koha-verse working on adding a search mode that
uses quotation marks and + - operators? I mean, aside from the buggy version we have in Liblime's LAK.
There is work on incorporating Solr into Koha, which is very promising, and will fix some of these problems[5]. Solr-based searching should be available in 3.10, I think. A patch[6] is currently undergoing QA to integrate BibLibre's work on Solr into Koha, and I don't imagine it will take too long for it to be pushed. :)
Hope that helps.
Regards, Jared
[1] Well, I would love to see better search lexing in Koha, and a few other people have agreed with me that it would be a great idea (usually while backing away slowly and hoping I don't explain how I think it should work).

[2] A lexer is the tool that takes the query a user enters and turns it into a data structure that tells the search engine "this is an unambiguous representation of what the user asked for, to the extent that the user provided an unambiguous query." Right now Koha uses the query lexer provided by Zebra. OCLC's query lexer understands query strings like "pd:moz,wol,a" and turns them into a query that could be expressed in English as "Search the derived personal name index for records where the first component [i.e. last name] starts with 'moz', the second component [i.e. first name] starts with 'wol', and the third component [i.e. middle name] starts with 'a.'"

[3] I would love to correct the omission myself, but it's far too large a project to do for "fun" (also, the code in question is rather painful to look at, so it wouldn't even be that much fun!).

[4] Galen Charlton and I have both been doing work relating to authorities and indexing recently that will help to resolve many of the issues with heading linking and searching. (See, for example, http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7284 and http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=7818 though there are a number of other related bugs as well; most of C & P's authority-related developments are grouped under http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8212)

[5] As I understand it, though, we will be trading Zebra's lexer for Solr's slightly more capable lexer, which is still not the ideal.

[6] http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233
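For readers who want to see the idea in footnote [2] spelled out, here is a small Python sketch of turning a derived-search string such as "pd:moz,wol,a" into a data structure. It is only a guess at the general shape: the `DerivedQuery` class and both functions are hypothetical, and this is not how OCLC's lexer is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivedQuery:
    index: str                 # index label, e.g. "pd" in the footnote's example
    prefixes: tuple[str, ...]  # one "starts with" prefix per name component


def lex_derived(query: str) -> DerivedQuery:
    """Split 'index:prefix,prefix,...' into an unambiguous structure."""
    index, sep, rest = query.partition(":")
    if not sep or not index.strip() or not rest.strip():
        raise ValueError(f"not a derived search: {query!r}")
    prefixes = tuple(part.strip().lower() for part in rest.split(","))
    return DerivedQuery(index.strip().lower(), prefixes)


def matches(query: DerivedQuery, name_parts: tuple[str, ...]) -> bool:
    """Check each name component against the prefix in the same position."""
    if len(name_parts) < len(query.prefixes):
        return False
    return all(
        part.lower().startswith(prefix)
        for prefix, part in zip(query.prefixes, name_parts)
    )


q = lex_derived("pd:moz,wol,a")
print(q)
print(matches(q, ("Mozart", "Wolfgang", "Amadeus")))  # True
```

The point is the separation of concerns: the lexer produces the structure, and the search engine only ever sees that structure, never the raw string.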
--
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/
On 26 June 2012 19:12, Stacy Pober <stacy.pober@manhattan.edu> wrote:
Jared,
Thank you for the explanation.
When I looked at the link you provided: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233 I was surprised to see the heading: "*Bug 8233* <http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=8233>-New search engine layer - introduce solr without breaking anything else"
Isn't the "without breaking anything else" something that is assumed in the software development process?
Yes it is, that's what we have unit testing and regression testing for. You can actually follow the continuous integration testing here: http://jenkins.koha-community.org/ . Here is the master branch: http://jenkins.koha-community.org/job/Koha_master/ (which will become the 3.10.0 release). The other active branches are the 'oldstable' 3.6.x branch, of which 3.6.6 has just been released, and the 3.8.x branch, of which 3.8.2 has just been released.

This 'bot', Jenkins, builds and installs a Koha and runs over 10,000 tests every time anything changes, trying to minimise what is unintentionally broken by any new feature.

What this bug is actually referring to is the fact that Koha has been able to work with Solr for nearly 2 years now, but that implementation was a replacement of Zebra, i.e. removing Zebra and adding Solr. This new implementation (which works very well so far) instead allows the library to choose between Zebra and Solr, i.e. it is Solr added without removing Zebra support. What is even better is that it allows us to add any number of different indexing engines.

A lot of our bugs have tongue-in-cheek or ironic names. Some are just silly fun; this is one of my favourites: http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=5158. Sometimes some of us (well, mainly me) go a little crazy: http://blog.bigballofwax.co.nz/2010/12/14/what-happens-after-updating-the-st...

But like Tom Robbins said, "It is a grave and dangerous mistake to take oneself too seriously" :)

Chris
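The "choose between Zebra and Solr" design described above can be sketched as a small engine layer. The Python below is only an illustration of the pattern: Koha itself is written in Perl, and every class, function, and setting name here is hypothetical. The two backends are in-memory stand-ins, not real Zebra or Solr clients.

```python
from abc import ABC, abstractmethod


class SearchEngine(ABC):
    """Interface every backend implements; callers see only this."""

    @abstractmethod
    def search(self, query: str) -> list[str]:
        ...


class ZebraStandIn(SearchEngine):
    """Stand-in backend: matches the query as one contiguous string."""

    def __init__(self, titles):
        self.titles = list(titles)

    def search(self, query: str) -> list[str]:
        return [t for t in self.titles if query.lower() in t.lower()]


class SolrStandIn(SearchEngine):
    """Stand-in backend: matches when every query word appears, in any order."""

    def __init__(self, titles):
        self.titles = list(titles)

    def search(self, query: str) -> list[str]:
        wanted = set(query.lower().split())
        return [t for t in self.titles if wanted <= set(t.lower().split())]


# Adding another indexing engine means adding one entry here.
ENGINES = {"zebra": ZebraStandIn, "solr": SolrStandIn}


def make_engine(name: str, titles) -> SearchEngine:
    """Pick a backend by name, e.g. from a library's configuration setting."""
    try:
        return ENGINES[name.lower()](titles)
    except KeyError:
        raise ValueError(f"unknown search engine: {name!r}") from None


catalogue = ["Computer mice", "Mice and men"]
print(make_engine("zebra", catalogue).search("mice"))
print(make_engine("solr", catalogue).search("men mice"))
```

Because the rest of the code depends only on the `SearchEngine` interface, swapping or adding a backend does not require removing the existing one, which is the "without breaking anything else" part.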