[Koha] Indic language conjunct clusters printed incorrectly in spine label

Sat Aug 12 23:26:53 NZST 2017

[CROSS-POSTING]

Hi all,

This is w.r.t. https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=19084
(there is a screenshot on the bug showing the problem)

This problem seems to be present for most Indian languages whenever
their call numbers contain conjunct clusters (represented as grapheme
clusters in a Unicode string).

To put the problem simply - the characters are rendered in the wrong
order in the output. For example, the string "শেখর" is represented by
the following code points -
\x{09B6}\x{09C7}\x{0996}\x{09B0}.
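
To make the stored order easy to check, here is a small sketch that lists the code points of that string together with their Unicode names. It is written in Python purely for convenience (Koha itself is Perl), and the variable names are only illustrative.

```python
import unicodedata

# The example string from above, written with escapes so that the
# stored (logical) order is explicit.
call_number = "\u09B6\u09C7\u0996\u09B0"

# Pair each code point with its official Unicode character name.
code_points = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in call_number]

for cp, name in code_points:
    print(cp, name)
```

The vowel sign is the second entry in the list, i.e. it is stored after the consonant it belongs to.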

Now here is the catch: \x{09B6} is the Bengali letter SHA, whereas
\x{09C7} is the Bengali vowel sign E. In the correct visual
presentation, however, the vowel sign E sits before (to the left of)
the SHA, which is not the order in which the code points are stored in
the Unicode string.
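
The mismatch between stored order and visual order can be sketched as below. This is only a toy illustration in Python and rests on several assumptions: the function name is made up, only three Bengali pre-base vowel signs are handled, and it works on code points, whereas real text shaping works on glyphs using the font's own tables. It is not meant as a fix for the bug.

```python
# Bengali vowel signs that are drawn to the left of their consonant
# (I, E, AI). Assumption: the two-part signs O and AU are ignored here.
PRE_BASE = {"\u09BF", "\u09C7", "\u09C8"}
VIRAMA = "\u09CD"  # joins consonants into a conjunct cluster


def naive_visual_order(text):
    """Return text with pre-base vowel signs moved in front of their cluster."""
    out = []
    for ch in text:
        if ch in PRE_BASE and out:
            # Walk back over "consonant + virama" pairs so the vowel sign
            # lands before the whole conjunct, not just the last consonant.
            i = len(out) - 1
            while i >= 2 and out[i - 1] == VIRAMA:
                i -= 2
            out.insert(i, ch)
        else:
            out.append(ch)
    return "".join(out)


logical = "\u09B6\u09C7\u0996\u09B0"    # SHA, E, KHA, RA (as stored)
visual = naive_visual_order(logical)    # E, SHA, KHA, RA (as drawn)
```

The stored string stays in logical order; only the drawing order differs, which is why something has to reorder (shape) the text before it is written out glyph by glyph.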

I looked through PDF::Reuse, Text::PDF::TTFont and related modules.
The root of the problem seems to me to be the unpacku() method, which
pushes the Unicode characters into an array so that they can be
written into the PDF content stream with the correct font information.
Because they are pushed in the same order in which they are stored, I
think this may be the cause of the problem, which would make this an
upstream issue rather than a Koha bug.
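
The behaviour suspected above can be sketched roughly as follows. This is a hypothetical stand-in written in Python, not the actual Perl code of unpacku() or of Text::PDF::TTFont; the function names, the toy cmap and the glyph IDs are all invented for illustration.

```python
def unpack_code_points(utf8_bytes):
    """Decode UTF-8 bytes into a list of code points, in stored order."""
    return [ord(ch) for ch in utf8_bytes.decode("utf-8")]


def naive_glyph_run(utf8_bytes, cmap):
    """Map each code point to a glyph ID one-to-one, with no reordering.

    Assumption: cmap is a plain dict from code point to glyph ID, and
    glyph 0 stands for a missing glyph.
    """
    return [cmap.get(cp, 0) for cp in unpack_code_points(utf8_bytes)]


# Invented glyph IDs for the four characters of the example string.
TOY_CMAP = {0x09B6: 101, 0x09C7: 102, 0x0996: 103, 0x09B0: 104}

run = naive_glyph_run("\u09B6\u09C7\u0996\u09B0".encode("utf-8"), TOY_CMAP)
```

With a one-to-one mapping like this, the glyph for the vowel sign E (102 here) is emitted after the glyph for SHA (101), so it would be drawn on the wrong side unless a shaping step runs before the glyphs are written to the content stream.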

Your inputs / feedback on this would be greatly appreciated.

cheers
indranil

PS. I am aware of the "CLOSED WONTFIX" status of bug id 2246 ;-)

--
Indranil Das Gupta
L2C2 Technologies

Phone : +91-98300-20971
Blog    : http://blog.l2c2.co.in
IRC     : indradg on irc://irc.freenode.net
Twitter : indradg