The original mechanism for this, which Brendan is looking at, took phrases into consideration as well, not just single words. Would ispell be able to handle phrases?

Josh

On Wed, Sep 3, 2008 at 4:52 PM, Joe Atzberger <ohiocore@gmail.com> wrote:

Since 2.4, Koha added Lingua::Ispell as a dependency, so if we're really just looking to make misspellings into valid strings in language X, the easiest implementation might be to use Ispell.

I like the idea of building target strings out of the MARC data, but that might be better addressed by using a script to extend the Ispell dictionary.

--joe
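A minimal sketch of Joe's dictionary-extension idea in core Perl, assuming the MARC-derived strings (titles, authors, subjects) have already been fetched. The helper name words_for_dictionary and its tokenizing rules (lowercase, keep apostrophes and hyphens, drop tokens shorter than three characters or containing digits) are hypothetical choices, not part of make_spellcheck_suggest.pl. The sorted list it returns could then be added to an Ispell personal dictionary, for example through Lingua::Ispell's add_word() and save_dictionary(); that step is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn MARC-derived strings into a sorted list of
# distinct lowercase words that could be added to an Ispell dictionary.
sub words_for_dictionary {
    my @strings = @_;
    my %seen;
    for my $s (@strings) {
        # Split on anything other than word characters, apostrophes and hyphens.
        for my $w (split /[^\w'-]+/, lc $s) {
            $w =~ s/^['-]+|['-]+$//g;   # trim leading/trailing ' and -
            next if length($w) < 3;     # skip initials and very short tokens
            next if $w =~ /\d/;         # skip ISBNs, dates and call numbers
            $seen{$w} = 1;
        }
    }
    return sort keys %seen;
}

my @words = words_for_dictionary(
    "Harry Potter and the Philosopher's Stone",
    "Rowling, J. K.",
);
print "$_\n" for @words;
```

Because everything is split into single words, this does nothing for multi-word phrases, which is the gap Josh raises.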
On Wed, Sep 3, 2008 at 2:47 PM, Brendan Gallagher <gallabr@biblio.org> wrote:
Thanks for the lead, Jesse. I'll continue to mess around with these and see what I can develop.

-Brendan

On Sep 3, 2008, at 2:44 PM, Jesse Weaver wrote:
On Wed, Sep 3, 2008 at 12:25 PM, Brendan Gallagher <gallabr@biblio.org> wrote:
Hi All -
I've been snooping around in the make_spellcheck_suggest.pl Perl script and I've developed a few questions.

It looks like the script was originally written for Koha 2.4 CVS and hasn't been updated yet. I've been working my way through updating it (I'm on Perl 5.10 and Koha 3.1).

OK, here is my question (though if someone can recommend a better path for me to pursue or develop, I'd welcome that):

Could someone point me to the new MySQL tables that replace marc_word (or an equivalent)? Also, correct me if I am wrong, but I am equating "marc_subfield_table" with the newer MySQL table marc_subfield_structure (plus I need to change subfieldvalue to either tagfield or tagsubfield). Below is the part of the script that I am referring to:
marc_subfield_structure is present in both versions, and is different from _table. It holds framework information (description of MARC subfields, and what they mean, like the title or author), rather than the values of those fields for any given record.

    # marc_word and marc_subfield_table come from the Koha 2.x schema.
    # GROUP BY gives one row (and one count) per distinct value.
    my $query_words          = "SELECT DISTINCT word, COUNT(word) FROM marc_word GROUP BY word";
    my $query_marc_subfields = "SELECT DISTINCT subfieldvalue, COUNT(subfieldvalue) FROM marc_subfield_table GROUP BY subfieldvalue";
    my $query_titles         = "SELECT DISTINCT title, COUNT(title) FROM biblio GROUP BY title";
    my $query_authors        = "SELECT DISTINCT author, COUNT(author) FROM biblio GROUP BY author";
I do have fuzzy searching enabled for my OPAC, but I am looking for a bit more. Some of my searches using these "spellcheck" suggestions are working; as you can see, I have got it to populate some information in the MySQL database.

Here is a copy of the output I get when running this script. I want to get rid of the "execute failed" parts:
    Step 1 of 5: Checking to make sure suggest tables exist
    Use of uninitialized value $_ in pattern match (m//) at make_spellcheck_suggest.pl line 99.
    All tables present ... moving along
    Step 2 of 5: Deleting old data
    Step 3 of 5: Creating non-distinct table from various Koha tables
    Finished building marc_word list
    Adding marc_word entries with the following tagsubfields:020a, 100a, 110a, 130a, 240a, 245a, 245b, 245c, 245p, 246a, 246b, 440a, 440p, 505t, 511a, 534a, 600a, 610a, 611a, 630a, 650a, 651a, 700a, 710a, 730a, 740a, 800a, 830a,
    DBD::mysql::st execute failed: Unknown column 'tagsubfield' in 'where clause' at make_spellcheck_suggest.pl line 173.
    DBD::mysql::st fetchrow_array failed: fetch() without execute() at make_spellcheck_suggest.pl line 179.
    0 more records added...
    Finished building marc_subfield_table list
    Adding marc_subfield_table entries with the following tags and subfields:020, a, 100, a, 110, a, 130, a, 240, a, 245, a, 245, b, 245, c, 245, p, 246, a, 246, b, 440, a, 440, p, 505, t, 511, a, 534, a, 600, a, 610, a, 611, a, 630, a, 650, a, 651, a, 700, a, 710, a, 730, a, 740, a, 800, a, 830, a,
    DBD::mysql::st execute failed: Table 'koha.marc_subfield_table' doesn't exist at make_spellcheck_suggest.pl line 173.
    DBD::mysql::st fetchrow_array failed: fetch() without execute() at make_spellcheck_suggest.pl line 179.
    0 more records added...
    57708 more records added...
    83598 more records added...
    Step 4 of 5: Deleting old distinct entries
    Step 5 of 5: Creating distinct spellcheck table out of non-distinct table
    Finished: total distinct items added to spellcheck: 81520
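Steps 3 to 5 in this output build a non-distinct list of strings and then collapse it into a distinct table. That is what the script's DISTINCT/COUNT queries are for: one row per distinct value with its count, which in SQL needs a GROUP BY on the value column. A small Perl sketch of the same structure, a hash keyed by value, using made-up titles; count_distinct is a hypothetical name, and unlike SQL it also drops empty strings, since those are useless as spellcheck targets.

```perl
use strict;
use warnings;

# Hypothetical helper: in-memory analogue of
#   SELECT title, COUNT(title) FROM biblio GROUP BY title
# Returns a hash ref mapping each distinct value to its count.
sub count_distinct {
    my %count;
    # Skip undef (SQL NULL) and empty strings.
    $count{$_}++ for grep { defined($_) && length($_) } @_;
    return \%count;
}

my $freq = count_distinct('Emma', 'Persuasion', 'Emma', undef, '');
for my $title (sort keys %$freq) {
    printf "%-12s %d\n", $title, $freq->{$title};
}
```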
Koha 3.0's new database schema does make this a bit harder; the script would have to parse the record from the biblioitems.marcxml or biblioitems.marc column, then find the words within that to add to the relevant tables.
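Jesse's suggestion could be prototyped roughly as below: a hypothetical subfield_values() helper that takes a MARCXML string, such as the contents of biblioitems.marcxml, plus tag/subfield codes in the same form as the script's list (020a, 245a, and so on), and returns the matching subfield values for a later word-splitting step. It uses regular expressions only to stay dependency-free; it assumes unprefixed MARCXML element names and decodes only the five predefined XML entities, so a real implementation should parse records with MARC::Record and MARC::File::XML instead.

```perl
use strict;
use warnings;

# The five predefined XML entities; numeric entities are NOT handled here.
my %xml_entity = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");

# Hypothetical helper: return the values of the wanted tag+subfield pairs
# (e.g. '245a', '650a') from one MARCXML record, in document order.
sub subfield_values {
    my ($marcxml, @wanted) = @_;
    my %want = map { $_ => 1 } @wanted;
    my @values;
    # Assumes unprefixed <datafield>/<subfield> elements (no "marc:" prefix).
    while ($marcxml =~ m{<datafield\b([^>]*)>(.*?)</datafield>}gs) {
        my ($attrs, $body) = ($1, $2);
        my ($tag) = $attrs =~ /\btag="(\d{3})"/ or next;
        while ($body =~ m{<subfield\b[^>]*\bcode="(.)"[^>]*>(.*?)</subfield>}gs) {
            my ($code, $text) = ($1, $2);
            next unless $want{"$tag$code"};
            $text =~ s/&(amp|lt|gt|quot|apos);/$xml_entity{$1}/g;
            push @values, $text;
        }
    }
    return @values;
}

# Made-up sample record for illustration.
my $sample = <<'XML';
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Pride &amp; prejudice /</subfield>
    <subfield code="c">Jane Austen.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Courtship</subfield>
  </datafield>
</record>
XML

print "$_\n" for subfield_values($sample, qw(245a 650a));
```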
Thanks,
+++++++++++++++++++++++++++++++++++++++++++
Brendan A. Gallagher
Software Services Coordinator
Bibliomation, INC.
Middlebury, CT 06516
http://www.biblio.org
(203)577-4070 x119
+++++++++++++++++++++++++++++++++++++++++++
--
Jesse Weaver
Software Developer, LibLime
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
--
Joshua Ferraro               SUPPORT FOR OPEN-SOURCE SOFTWARE
CEO                          migration, training, maintenance, support
LibLime                      Featuring Koha Open-Source ILS
jmf@liblime.com |Full Demos at http://liblime.com/koha |1(888)KohaILS