[Koha] Hindi books

Paul Hoffman paul at flo.org
Sat Jun 2 21:58:35 NZST 2018


On Fri, Jun 01, 2018 at 01:28:29PM +0530, takan bhatt wrote:
> We have a huge collection of Hindi, Gujarati and Sanskrit books. I want to
> catalog it in Roman script only. But along with that, we would like to
> give users the ability to search in Hindi or Gujarati script. What
> solutions are there in Koha?

I'm not an expert, but this seems difficult.  I believe the only 
feasible way to do this is to transliterate the user's search terms 
from Devanagari or Gujarati to Roman script before building the query 
that Zebra runs.  (I presume you're using Zebra rather than 
Elasticsearch.)

I would put the custom code in C4::Search::buildQuery, which (in our 
Koha instance) would be around line 1545 of the file /usr/share/lib/

    1531  # Form-based queries are non-nested and fixed depth, so we can easily modify the incoming
    1532  # query operands and indexes and add stemming, truncation, field weighting, etc.
    1533  # Once we do so, we'll end up with a value in $query, just like if we had an
    1534  # incoming $query from the user
    1535      else {
    1536          $query = ""
    1537            ; # clear it out so we can populate properly with field-weighted, stemmed, etc. query
    1538          my $previous_operand
    1539            ;    # a flag used to keep track if there was a previous query
    1540                 # if there was, we can apply the current operator
    1541                 # for every operand
    1542          for ( my $i = 0 ; $i <= @operands ; $i++ ) {
    1543
    1544              # COMBINE OPERANDS, INDEXES AND OPERATORS
    1545              if ( $operands[$i] ) {
    1546                  $operands[$i]=~s/^\s+//;
    1547
    1548                # A flag to determine whether or not to add the index to the query
    1549                  my $indexes_set;


I would start by adding a line like this between lines 1545 and 1546:

    $operands[$i] = _transliterate_nagari($operands[$i]);

Write the function _transliterate_nagari and put it somewhere convenient 
-- perhaps at the end of /usr/share/koha/lib/C4/Search.pm:

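    sub _transliterate_nagari {
        local $_ = shift;
        # /e evaluates the replacement as Perl code, so the helper is really called;
        # without it the function name would be inserted as literal text.
        # \p{Script:...} only matches if $_ is a decoded character string.
        s/(\p{Script:Gujarati}+)/_transliterate_gujarati($1)/eg;
        s/(\p{Script:Devanagari}+)/_transliterate_devanagari($1)/eg;
        return $_;
    }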

(I don't think that's quite right -- I've never used \p{Script:foo} before.)
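To check the script-matching idea outside Koha, here is a small standalone sketch.  The two helper functions are hypothetical stand-ins that only tag each run with its length, and the sample string is made up.  It assumes the substitution needs the /e flag and a decoded character string, and it tries the same text as raw UTF-8 octets to show the difference.

```perl
use strict;
use warnings;
use utf8;                       # this file contains literal non-ASCII text
use Encode qw(encode decode);

# Hypothetical stand-ins: they only tag each run so we can see what matched.
sub _transliterate_gujarati   { return '[gu:'   . length($_[0]) . ']' }
sub _transliterate_devanagari { return '[deva:' . length($_[0]) . ']' }

sub _transliterate_nagari {
    my ($text) = @_;
    # /e runs the replacement as Perl code; /g repeats it for every run.
    $text =~ s/(\p{Script=Gujarati}+)/_transliterate_gujarati($1)/eg;
    $text =~ s/(\p{Script=Devanagari}+)/_transliterate_devanagari($1)/eg;
    return $text;
}

my $chars = "title राम ગીતા";           # decoded character string (made-up sample)
my $bytes = encode('UTF-8', $chars);    # the same text as raw UTF-8 octets

# Character string: both script runs are replaced by their tags.
my $tagged = _transliterate_nagari($chars);

# Octets: nothing matches, because the individual bytes are not
# Devanagari or Gujarati characters, so the string comes back unchanged.
my $untouched = _transliterate_nagari($bytes);

# Decoding first makes the runs match again.
my $decoded = _transliterate_nagari( decode('UTF-8', $bytes) );
```

If the operands reach buildQuery as undecoded octets in your instance (I have not checked), they would need decoding before the substitution can do anything.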

Then figure out how to do the actual transliteration and write these 
functions:

    sub _transliterate_gujarati {
        ...
    }
    sub _transliterate_devanagari {
        ...
    }
    
Perl modules you might use for transliteration include Lingua::Translit 
and Lingua::Deva:

https://metacpan.org/pod/Lingua::Translit
https://metacpan.org/pod/Lingua::Deva
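To give an idea of what the Devanagari function has to do, here is a rough hand-rolled sketch using only core Perl.  The tables are my own and cover only a handful of letters, and it ignores conjunct subtleties, anusvara, nukta and so on, so treat it as an illustration of the inherent-vowel logic and not as a replacement for one of the modules above.  It produces IAST-style output, which keeps the final inherent "a" that Hindi pronunciation often drops.

```perl
use strict;
use warnings;
use utf8;    # the tables below contain literal Devanagari and IAST letters

# Illustrative subset only; a real table would cover the whole script.
my %consonant = (
    'क' => 'k', 'ग' => 'g', 'त' => 't', 'द' => 'd', 'न' => 'n',
    'प' => 'p', 'ब' => 'b', 'म' => 'm', 'य' => 'y', 'र' => 'r',
    'ल' => 'l', 'व' => 'v', 'स' => 's', 'ह' => 'h',
);
my %vowel = (
    'अ' => 'a', 'आ' => 'ā', 'इ' => 'i', 'ई' => 'ī',
    'उ' => 'u', 'ऊ' => 'ū', 'ए' => 'e', 'ओ' => 'o',
);
# Dependent vowel signs, written by code point because they are combining marks.
my %matra = (
    "\x{093E}" => 'ā', "\x{093F}" => 'i', "\x{0940}" => 'ī',
    "\x{0941}" => 'u', "\x{0942}" => 'ū', "\x{0947}" => 'e',
    "\x{094B}" => 'o',
);
my $virama = "\x{094D}";    # suppresses the inherent vowel

sub _transliterate_devanagari {
    my ($text) = @_;
    my @chars = split //, $text;
    my $out   = '';
    for my $i ( 0 .. $#chars ) {
        my $c = $chars[$i];
        if ( exists $consonant{$c} ) {
            $out .= $consonant{$c};
            # A consonant carries an inherent "a" unless a vowel sign
            # or a virama follows it.
            my $next = $chars[ $i + 1 ] // '';
            $out .= 'a' unless exists $matra{$next} || $next eq $virama;
        }
        elsif ( exists $matra{$c} ) { $out .= $matra{$c} }
        elsif ( exists $vowel{$c} ) { $out .= $vowel{$c} }
        elsif ( $c eq $virama )     { }              # already handled above
        else                        { $out .= $c }   # unknown: pass through
    }
    return $out;
}

my $rama  = _transliterate_devanagari('राम');       # rāma
my $hindi = _transliterate_devanagari('हिन्दी');     # hindī
```

A Gujarati version would have the same shape with Gujarati code points in the tables.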

You'll need to use the same transliteration scheme when cataloging, of 
course, or the transliterated search terms won't match the terms in the 
Zebra index.  Are you planning to use IAST?

https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration

Good luck!!

Paul.

-- 
Paul Hoffman <paul at flo.org>
Systems Librarian
Fenway Libraries Online
c/o Wentworth Institute of Technology
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)

