Does anyone know if switching from Zebra to Solr will let Koha libraries use a stopwords list?

Also, will Solr have any effect on fuzzy searching defaults?

Lastly, is there anyone sponsoring or working on a "did you mean..." opac response to zero retrieval searches? At our library, we'd probably choose a "did you mean..." spelling suggestion choice over the automatic fuzzy spelling assumptions that are currently in the system. I realize this would probably be listed in bugzilla, but I'm not sure exactly how to search for this.

--
Stacy Pober
Information Alchemist
Manhattan College Library
Riverdale, NY 10471
stacy.pober@manhattan.edu
On 25 July 2012 08:15, Stacy Pober <stacy.pober@manhattan.edu> wrote:
> Does anyone know if switching from Zebra to Solr will let Koha libraries use a stopwords list?
> Also, will Solr have any effect on fuzzy searching defaults?
> Lastly, is there anyone sponsoring or working on a "did you mean..." opac response to zero retrieval searches? At our library, we'd probably choose a "did you mean..." spelling suggestion choice over the automatic fuzzy spelling assumptions that are currently in the system. I realize this would probably be listed in bugzilla, but I'm not sure exactly how to search for this.
Stacy,

Koha is not switching to Solr, but now (in master) you can choose between using Solr or Zebra (YMMV with other software originally based on Koha). This is already working in the master branch.

Also, in Koha itself using Zebra (I don't know what the forks are doing), you can now use DOM indexing, which is much more powerful than the old indexing methods.

But we want to do much more than that. If you look at the thread starting here:

http://lists.katipo.co.nz/pipermail/koha/2012-July/033634.html

you will see what the future plans are. (This does include "did you mean".)

Hope this helps,

Chris
Stacy,

I'll address the points that are not answered in the proposal that Brooke linked to.

> Does anyone know if switching from Zebra to Solr will let Koha libraries use a stopwords list?
Solr allows the use of stopwords (as would a decent query parser such as the one I propose writing). However, the Solr code in Koha right now does not make use of the stopwords feature. To my mind, that is a good thing. If we used stopwords, the poetry journal "The" would be unfindable (and yes, there is such a journal: I had a nightmare and a half trying to find the record when I had an issue to catalog at the NYPL; thankfully their catalog doesn't throw away stopwords anymore).

And things would be even worse when searching for French books. Consider the case of "À thé" and "Le thé". In the US we would probably search for "a the" and "le the." If someone can
> Also, will Solr have any effect on fuzzy searching defaults?

It will. Fuzzy searching has completely different semantics in Solr compared to Zebra. We briefly noted that fact in the proposal, along with a footnote identifying the algorithms that Solr uses for fuzzy searching (and, yes, I am aware that the second algorithm listed is generally used as an alternate name for the first... I have no explanation of why the Solr docs used the two names like they were different). Whether the "fuzzy" behavior is closer to what you want I could not say. My personal preference, like yours, is to not be fuzzy, and just suggest better searches.

> Lastly, is there anyone sponsoring or working on a "did you mean..." opac response to zero retrieval searches? At our library, we'd probably choose a "did you mean..." spelling suggestion choice over the automatic fuzzy spelling assumptions that are currently in the system. I realize this would probably be listed in bugzilla, but I'm not sure exactly how to search for this.

We have not yet added bugs for the various parts of the search rewrite.

Regards,
Jared

--
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/
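For readers who want a concrete picture of the "did you mean..." behavior discussed above, here is a minimal Python sketch. It is not Koha, Zebra, or Solr code. It assumes a hypothetical in-memory index that maps terms to record ids, and the names edit_distance, did_you_mean, and search are illustrative only. The query is run exactly as typed, and suggestions (ranked by Levenshtein edit distance) are offered only when there are zero hits, rather than silently broadening the search.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def did_you_mean(term, vocabulary, max_distance=2):
    """Return indexed terms within max_distance edits, closest first."""
    term = term.lower()
    scored = [(edit_distance(term, word.lower()), word) for word in vocabulary]
    return [word for dist, word in sorted(scored) if 0 < dist <= max_distance]


def search(term, index):
    """Exact lookup first; suggest alternatives only on zero retrieval."""
    hits = index.get(term.lower(), [])
    if hits:
        return {"hits": hits, "suggestions": []}
    return {"hits": [], "suggestions": did_you_mean(term, index.keys())}


# Hypothetical sample index: term -> list of record ids.
SAMPLE_INDEX = {"alchemy": [1, 7], "algebra": [2], "poetry": [3]}
```

With this sample index, search("poetry", SAMPLE_INDEX) returns its hits with no suggestions, while a misspelling such as search("alchemi", SAMPLE_INDEX) returns no hits and leaves it to the patron to accept or ignore the suggested term.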
Stacy,
> Solr allows the use of stopwords (as would a decent query parser such as the one I propose writing). However, the Solr code in Koha right now does not make use of the stopwords feature. To my mind, that is a good thing. If we used stopwords, the poetry journal "The" would be unfindable (and yes, there is such a journal: I had a nightmare and a half trying to find the record when I had an issue to catalog at the NYPL; thankfully their catalog doesn't throw away stopwords anymore). And things would be even worse when searching for French books. Consider the case of "À thé" and "Le thé". In the US we would probably search for "a the" and "le the." If someone can
Whoops, I accidentally deleted several sentences when I hit send. Picking up where I left off:

If someone can make a compelling case for stopwords, we could, of course, add their use to the proposal as an optional feature. That said, it is my opinion that any catalog that requires stopwords in order to offer relevant results is broken. Relevance ranking should take into account that a given word in a query is statistically overrepresented in the results, and therefore should be considered less relevant than other words in the query.

Regards,
Jared

--
Jared Camins-Esakov
Bibliographer, C & P Bibliography Services, LLC
(phone) +1 (917) 727-3445
(e-mail) jcamins@cpbibliography.com
(web) http://www.cpbibliography.com/
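To make the stopwords argument concrete, here is a small Python sketch under stated assumptions. The stopword list, the sample titles, and the function names are all made up for illustration, and the weighting is a generic smoothed inverse document frequency, not the scoring formula of Solr, Zebra, or the proposed query parser. It shows two things: accent-folded French titles can collapse into nothing once English stopwords are stripped, and a frequency-based weight keeps common words searchable while letting them count for less.

```python
import math
import unicodedata

# Hypothetical English stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of"}

# Sample titles chosen for illustration.
TITLES = [
    "The",
    "The history of tea",
    "Le thé",
    "À thé",
    "The art of the novel",
    "Tea and sympathy",
]


def fold(text):
    """Lowercase and strip diacritics, as a patron without accent keys might type."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def drop_stopwords(query):
    """Tokenize a folded query and discard anything on the stopword list."""
    return [token for token in fold(query).split() if token not in STOPWORDS]


def idf(term, documents):
    """Smoothed inverse document frequency: the more documents contain
    the term, the smaller (but still non-negative) its weight."""
    containing = sum(1 for doc in documents if term in fold(doc).split())
    return math.log((1 + len(documents)) / (1 + containing))
```

Here drop_stopwords("À thé") leaves an empty query and drop_stopwords("Le thé") leaves only "le", which is the failure described above. By contrast, idf("the", TITLES) is small but positive and idf("tea", TITLES) is larger, so a ranking built on such weights can discount "the" without ever throwing it away.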
Participants (3): Chris Cormack, Jared Camins-Esakov, Stacy Pober