[Koha] Identifying Records with Non-Roman Characters

Wed Jun 9 10:39:07 NZST 2021

Hello, David!

    In our latest exchange on 8 June 2021 at 13:33 [JST], I received from
you the following:

> I'm not 100% sure what you're asking here. Are you asking to find all
> records where there is a 245 title that isn't Romanized?
>
> You could try something like this:
> SELECT *
>   FROM biblio
>  WHERE title <> CONVERT(title USING latin1);
>
> I've tried that out on one of my multilingual libraries and it had some
> decent results.
>
> However, it's worth noting that it isn't a perfect solution. Certain
> characters have no latin1 equivalent but are still "Roman". One I've
> noticed is the Unicode hyphen (U+2010, UTF-8 bytes E2 80 90), which is not
> to be confused with hyphen-minus, which is ASCII and represented by the
> byte 2D. Other examples are emoji and other symbols like ↔️.
>
> You could then tweak the query or do a visual scan through to filter out
> any results that are irrelevant.
>
> Anyway, I hope that helps you advance your work.
>

    Yes, I'm trying to get the bibliographic records whose 245 field is not
in the Latin alphabet. The 245 field in the non-Latin alphabet is
immediately followed by a 246 in Latin transliteration.
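
    To trim the false positives the latin1 comparison produces (the U+2010
hyphen, emoji, arrows), one option is to look only at the letters in each
title. What follows is a minimal sketch in Python, not a tested Koha
procedure. It assumes the titles have already been exported as Unicode
strings, and `has_non_latin_letters` is a hypothetical helper name.

```python
import unicodedata


def has_non_latin_letters(title: str) -> bool:
    """Return True if any alphabetic character in the title is not a Latin letter.

    Heuristic: a letter counts as Latin when its Unicode character name
    contains the word "LATIN". Punctuation, digits, symbols, and combining
    marks are ignored because str.isalpha() is False for them.
    """
    for ch in title:
        if ch.isalpha():
            # The second argument avoids a ValueError for unnamed code points.
            name = unicodedata.name(ch, "")
            if "LATIN" not in name:
                return True
    return False


if __name__ == "__main__":
    for sample in ["気を付けて", "Café \u2010 résumé", "Война и мир", "A \u2194 B"]:
        print(repr(sample), has_non_latin_letters(sample))
```

    Because this is a name-based heuristic, a few letter-like characters
that are not named as Latin (the ordinal indicator ª, for instance) would
still be flagged, so a visual scan of the results remains worthwhile.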

    This solution may just work, even if it generates false positives and
even if it reduces the count to 6500 records instead of the current 8766.
Then I can carefully automate the process of changing the 245 tag to 880
with $6 and the first 246 tag to 245 with $6.
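
    The retagging step could be sketched roughly as below. This is an
illustration under stated assumptions, not a finished tool: a record is
modelled as a plain list of (tag, subfields) pairs rather than with any MARC
library, `swap_title_fields` is a hypothetical name, and the occurrence
number "01" is assumed to be unused in the record. Indicators, the script
identification code in the 880 $6, and any $6 already present are left out.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]            # (subfield code, value)
Field = Tuple[str, List[Subfield]]    # (tag, subfields)


def swap_title_fields(fields: List[Field], occurrence: str = "01") -> List[Field]:
    """Move the first 245 to an 880 and promote the first 246 to 245.

    The two fields are linked through $6, which is placed first in each
    field. Records lacking either a 245 or a 246 are returned unchanged so
    they can be reviewed by hand.
    """
    tags = [tag for tag, _ in fields]
    if "245" not in tags or "246" not in tags:
        return list(fields)

    i245, i246 = tags.index("245"), tags.index("246")
    # The Latin transliteration becomes the new 245 and points at the 880.
    new_245 = ("245", [("6", f"880-{occurrence}")] + list(fields[i246][1]))
    # The original-script title becomes the 880 and points back at the 245.
    new_880 = ("880", [("6", f"245-{occurrence}")] + list(fields[i245][1]))

    result: List[Field] = []
    for i, field in enumerate(fields):
        if i == i245:
            result.append(new_245)
        elif i != i246:
            result.append(field)
    result.append(new_880)  # 880 fields go at the end of the record
    return result
```

    Running it on a copy of the data first, and comparing a handful of
records before and after, would be a sensible precaution.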

    Thank you very much.

-- 

    気を付けて。 /ki wo tukete/ = Take care.

    -- Charles.

    Charles Kelley, MLS
    PSC 704 Box 1029
    APO AP 96338

    Charles Kelley
    Tsukimino 1-Chome 5-2
    Tsukimino Gaadenia #210
    Yamato-shi, Kanagawa-ken
    〒242-0002 JAPAN

    +1-301-741-7122 [US cell]
    +81-80-4356-2178 [JPN cell]

    mnogojazyk at aol.com [h]
    cmkelleymls at gmail.com [p]

    linkedin.com/in/cmkelleymls <http://www.linkedin.com/in/cmkelleymls>
    Meeting Your Information Needs. Virtually.