[Koha] Hindi books
Paul Hoffman
paul at flo.org
Sat Jun 2 21:58:35 NZST 2018
On Fri, Jun 01, 2018 at 01:28:29PM +0530, takan bhatt wrote:
> We have huge collection of Hindi, Gujarati and Sanskrit Books. i want to
> catalog it in Roman Script only. But along with that we would like to
> provide them facilities to search in Hindi or Gujarati script so what are
> the solution in koha
I'm not an expert, but this seems difficult. I believe the only
feasible way to do this is to transliterate the search terms the user
provides from Nagari to Roman script before building the query that
Zebra performs. (I presume you're using Zebra rather than
Elasticsearch.)
I would put the custom code in C4::Search::buildQuery, which (in our
Koha instance) would be around line 1545 of the file /usr/share/lib/
1531      # Form-based queries are non-nested and fixed depth, so we can easily modify the incoming
1532      # query operands and indexes and add stemming, truncation, field weighting, etc.
1533      # Once we do so, we'll end up with a value in $query, just like if we had an
1534      # incoming $query from the user
1535      else {
1536          $query = ""
1537            ; # clear it out so we can populate properly with field-weighted, stemmed, etc. query
1538          my $previous_operand
1539            ;    # a flag used to keep track if there was a previous query
1540                 # if there was, we can apply the current operator
1541                 # for every operand
1542          for ( my $i = 0 ; $i <= @operands ; $i++ ) {
1543
1544              # COMBINE OPERANDS, INDEXES AND OPERATORS
1545              if ( $operands[$i] ) {
1546                  $operands[$i]=~s/^\s+//;
1547
1548                  # A flag to determine whether or not to add the index to the query
1549                  my $indexes_set;
I would start by adding a line like this between lines 1545 and 1546:
    $operands[$i] = _transliterate_nagari($operands[$i]);
Write the function _transliterate_nagari and put it somewhere convenient
-- perhaps at the end of /usr/share/koha/lib/C4/Search.pm:
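    sub _transliterate_nagari {
        # Assumes the operand is a decoded character string, not raw UTF-8 bytes.
        local $_ = shift;
        # The /e modifier makes Perl evaluate the replacement as code; without
        # it the function name would be inserted into the query as literal text.
        s/(\p{Script:Gujarati}+)/_transliterate_gujarati($1)/ge;
        s/(\p{Script:Devanagari}+)/_transliterate_devanagari($1)/ge;
        return $_;
    }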
(I don't think that's quite right -- I've never used \p{Script:foo} before.)
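For what it's worth, here is a small standalone sketch for checking the script-matching idea outside Koha. The helper names (_demo_gujarati, _demo_devanagari, _demo_dispatch) are made up for illustration and are not part of Koha; they only tag each run with its script and length so you can see which branch handled it. The /e modifier is what makes Perl call the function in the replacement instead of inserting its name as literal text.

```perl
use strict;
use warnings;
use utf8;    # this file may contain literal Devanagari/Gujarati text

# Hypothetical placeholders: replace a run with a tag showing which
# script matched and how many characters the run contained.
sub _demo_gujarati   { return '[gu:' . length( $_[0] ) . ']' }
sub _demo_devanagari { return '[hi:' . length( $_[0] ) . ']' }

# Find each run of Gujarati or Devanagari characters and hand it to the
# matching helper. Text in any other script is left untouched.
sub _demo_dispatch {
    my ($text) = @_;
    $text =~ s/(\p{Script=Gujarati}+)/_demo_gujarati($1)/ge;
    $text =~ s/(\p{Script=Devanagari}+)/_demo_devanagari($1)/ge;
    return $text;
}
```

Feeding it an operand that mixes an index prefix, a Devanagari word, and a Gujarati word should leave the Roman parts alone and tag only the two Indic runs. Note that a few characters used in Devanagari text, such as the danda, belong to the Common script in Unicode and will not be matched by \p{Script=Devanagari}.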
Then figure out how to do the actual transliteration and write these
functions:
    sub _transliterate_gujarati {
        ...
    }
    sub _transliterate_devanagari {
        ...
    }
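To give a rough idea of what the Devanagari function involves, here is a minimal hand-rolled sketch that maps a subset of Devanagari to IAST. Everything in it is an assumption for illustration: the name _sketch_devanagari_to_iast and the tables are mine, not Koha's, and it ignores nukta letters, candrabindu, digits, and other details that a real module would handle. The one subtlety it does show is the inherent vowel: a consonant gets an "a" unless it is followed by a vowel sign or a virama.

```perl
use strict;
use warnings;
use utf8;    # the tables below contain literal Devanagari and IAST letters

# Consonants (without their inherent vowel) and independent vowels.
my %CONSONANT = qw(
    क k  ख kh ग g  घ gh ङ ṅ
    च c  छ ch ज j  झ jh ञ ñ
    ट ṭ  ठ ṭh ड ḍ  ढ ḍh ण ṇ
    त t  थ th द d  ध dh न n
    प p  फ ph ब b  भ bh म m
    य y  र r  ल l  व v
    श ś  ष ṣ  स s  ह h
);
my %VOWEL = qw( अ a आ ā इ i ई ī उ u ऊ ū ऋ ṛ ए e ऐ ai ओ o औ au );

# Dependent vowel signs, keyed by code point because they are combining
# marks. The virama maps to '' because it suppresses the inherent vowel.
my %MATRA = (
    "\x{093E}" => 'ā',  "\x{093F}" => 'i',  "\x{0940}" => 'ī',
    "\x{0941}" => 'u',  "\x{0942}" => 'ū',  "\x{0943}" => 'ṛ',
    "\x{0947}" => 'e',  "\x{0948}" => 'ai', "\x{094B}" => 'o',
    "\x{094C}" => 'au', "\x{094D}" => '',
);
my %SIGN = ( "\x{0902}" => 'ṃ', "\x{0903}" => 'ḥ' );   # anusvara, visarga

sub _sketch_devanagari_to_iast {
    my ($text) = @_;
    my @chars = split //, $text;
    my $out   = '';
    for ( my $i = 0 ; $i < @chars ; $i++ ) {
        my $ch = $chars[$i];
        if ( exists $CONSONANT{$ch} ) {
            $out .= $CONSONANT{$ch};
            my $next = $i + 1 < @chars ? $chars[ $i + 1 ] : '';
            if ( exists $MATRA{$next} ) {
                $out .= $MATRA{$next};    # explicit vowel sign or virama
                $i++;
            }
            else {
                $out .= 'a';              # inherent vowel
            }
        }
        elsif ( exists $VOWEL{$ch} ) { $out .= $VOWEL{$ch} }
        elsif ( exists $SIGN{$ch} )  { $out .= $SIGN{$ch} }
        else                         { $out .= $ch }    # pass through unchanged
    }
    return $out;
}
```

With these tables, रामायण comes out as "rāmāyaṇa" and धर्म as "dharma". Keep in mind that strict IAST always writes the inherent "a", whereas Hindi titles are often romanized with the final "a" dropped ("Ram" rather than "Rāma"), so whichever convention the catalogers follow has to be reproduced here. A Gujarati version would need a parallel set of tables for the Gujarati block.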
Perl modules you might use for transliteration include Lingua::Translit
and Lingua::Deva:
https://metacpan.org/pod/Lingua::Translit
https://metacpan.org/pod/Lingua::Deva
You'll need to use the same transliteration scheme when cataloging, of
course, or the transliterated search terms won't match the terms in the
Zebra index. Are you planning to use IAST?
https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration
Good luck!!
Paul.
--
Paul Hoffman <paul at flo.org>
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)