[Koha] [Koha-devel] Koha Search Indexes

Barton Chittenden barton at bywatersolutions.com
Tue Feb 2 13:02:29 NZDT 2016


On Mon, Feb 1, 2016 at 5:54 PM, David Cook <dcook at prosentient.com.au> wrote:

> Hi Nicole,
>
> I keep meaning to look over and revise the search documentation, but I
> always seem preoccupied with other work.
>
> At a glance, I'm not sure whether the list at
> http://manual.koha-community.org/3.24/en/kohasearchindexes.html is
> complete. To be honest, while I think it's a valuable list, I
> think it would be more valuable for end users to have a list of CCL
> qualifiers (and their corresponding registers). While an index may exist in
> Zebra, it's the CCL qualifier that the user needs to know in order to
> access it, and sometimes the qualifier is different from the index name.
>
> There are 3 vital files for Zebra indexing and Koha searching:
> bib1.att
> biblio-zebra-indexdefs.xsl
> ccl.properties
>
> bib1.att defines which indexes may exist.
> biblio-zebra-indexdefs.xsl decides what MARC data goes into which indexes.
> ccl.properties provides a query language for accessing those indexes
> through search queries.
>
> Paul asked about the suffixes :n, :p, :w, :u, and :s. These are called
> "registers". :n is numeric, :p is phrase, :w is word, :u is URL, and :s is
> sorting.
>
> Different types of CCL qualifiers allow us to access different types of
> registers. "st-numeric" provides access to the :n register. "st-phrase" and
> "phr" access :p. "st-word", "st-word-list", and "wrdl" access :w.
> "st-urx" accesses :u. We generally don't need to access :s when
> searching, as that's a behind-the-scenes thing for Koha to worry about.
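
The qualifier-to-register mapping described above can be written down as a
small lookup table. This is only a sketch in Python: the table contents come
from the message, but the function name and the way the query string is split
are assumptions for illustration, not Koha or Zebra code.

```python
from typing import Optional, Tuple

# Structure qualifier -> register suffix, as listed in the message above.
REGISTER_FOR_QUALIFIER = {
    "st-numeric": "n",
    "st-phrase": "p",
    "phr": "p",
    "st-word": "w",
    "st-word-list": "w",
    "wrdl": "w",
    "st-urx": "u",
}


def target_register(ccl_query: str) -> Optional[Tuple[str, str]]:
    """Return (index, register) for a query such as 'title,wrdl=Harry'.

    Hypothetical helper: it only looks at the text before the first '='.
    """
    qualifiers, _, _term = ccl_query.partition("=")
    index, *structure = [q.strip() for q in qualifiers.split(",")]
    for qualifier in structure:
        register = REGISTER_FOR_QUALIFIER.get(qualifier)
        if register is not None:
            return index, register
    # No structure qualifier given; the default behaviour is not modelled here.
    return None


if __name__ == "__main__":
    print(target_register("title,wrdl=Harry"))
    print(target_register('uri,st-urx="http://koha-community.org"'))
```
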
>
> Different registers have different normalization rules.
>
> If we look at biblio-zebra-indexdefs.xsl, we can see that MARC 245 is
> indexed into Title:w and Title:p. That means "Harry Potter and the
> Philosopher's Stone" would be indexed something like so:
>
> <title:w>Harry</title:w>
> <title:w>Potter</title:w>
> <title:w>Philosopher's</title:w>
> <title:w>Stone</title:w>
> <title:p>Harry Potter and the Philosopher's Stone</title:p>
>
> So if we did a search like... "title,wrdl=Harry", we'd get a hit for that
> MARC record. If we did a search like 'title,phr="Harry Potter and the
> Philosopher's Stone"', we'd get a hit for that MARC record.
>
> I'll draw your attention now to 952$u. It's indexed as uri:u (although it
> would also show up in the Any:w and Any:p keyword indexes). In order to
> access uri:u, we'd need to search for
> 'uri,st-urx="http://koha-community.org"'. The "st-urx" maps to the ":u",
> and we see "uri" in "ccl.properties", which maps to "uri" in bib1.att.
>
> If we tried to do a search for 'uri,wrdl="http://koha-community.org"', it
> would fail, because nothing is indexed in the "uri:w" index:register combo.
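
To make the index:register idea concrete, here is a toy model in Python of
the title and uri examples above. It is a sketch only: the function names are
made up, terms are simply lower-cased and matched exactly, and none of this
reflects how Zebra really stores or normalizes data.

```python
from collections import defaultdict

# Only the structure qualifiers used in the examples above.
REGISTER_FOR = {"wrdl": "w", "phr": "p", "st-urx": "u"}


def build_toy_index(fields):
    """fields: iterable of (index, register, value) -> {(index, register): terms}."""
    index = defaultdict(set)
    for name, register, value in fields:
        key = (name.lower(), register)
        if register == "w":
            # Word register: one entry per whitespace-separated word.
            index[key].update(word.lower() for word in value.split())
        else:
            # Phrase and URL registers: the whole value is a single entry.
            index[key].add(value.lower())
    return index


def search(index, name, qualifier, term):
    """True if the term is present in the register the qualifier points at."""
    key = (name.lower(), REGISTER_FOR[qualifier])
    return term.lower() in index.get(key, set())


TITLE = "Harry Potter and the Philosopher's Stone"
URL = "http://koha-community.org"
toy = build_toy_index([
    ("Title", "w", TITLE),  # 245 -> Title:w
    ("Title", "p", TITLE),  # 245 -> Title:p
    ("uri", "u", URL),      # 952$u -> uri:u
])

if __name__ == "__main__":
    print(search(toy, "title", "wrdl", "Harry"))  # True
    print(search(toy, "title", "phr", TITLE))     # True
    print(search(toy, "uri", "st-urx", URL))      # True
    print(search(toy, "uri", "wrdl", URL))        # False: nothing in uri:w
```
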
>
> I have to run to an appointment, but hopefully that helps a bit.
>
> One day, I'd like to write a program which parses ccl.properties to
> provide a list of qualifiers that cross-references with
> biblio-zebra-indexdefs.xsl to see which registers are available for which
> qualifier/index pair. The register system is a bit complicated but it can
> be useful. I've recently started doing more with the ":u" register...
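
A rough Python sketch of such a cross-referencing program follows. The three
sample "files" are invented and heavily simplified (the use-attribute number
for uri is made up), and the sketch assumes that bib1.att uses
"att <number> <name>" lines, that ccl.properties uses "qualifier 1=<number>"
lines plus simple aliases, and that the XSL declares indexes as
<z:index name="Title:w Title:p">. The real files may differ.

```python
import re
from collections import defaultdict

# Invented, simplified stand-ins for the three files.
BIB1_ATT = """\
att 4     Title
att 9001  uri
"""

CCL_PROPERTIES = """\
# use-attribute qualifiers
ti 1=4
title ti
uri 1=9001
"""

INDEXDEFS_XSL = """\
<z:index name="Title:w Title:p">
<z:index name="uri:u">
"""


def parse_att(text):
    """Map use-attribute number -> index name."""
    names = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "att":
            names[parts[1]] = parts[2]
    return names


def parse_ccl(text):
    """Map qualifier -> use-attribute number, following one level of aliases."""
    direct, aliases = {}, {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, *spec = line.split()
        for item in spec:
            if item.startswith("1="):
                direct[name] = item[2:]
            elif "=" not in item:
                aliases[name] = item
    resolved = dict(direct)
    for name, target in aliases.items():
        if target in direct:
            resolved[name] = direct[target]
    return resolved


def parse_registers(text):
    """Map index name -> set of registers declared for it."""
    registers = defaultdict(set)
    for names in re.findall(r'<z:index\s+name="([^"]+)"', text):
        for entry in names.split():
            index, _, register = entry.partition(":")
            registers[index].add(register)
    return registers


def report():
    """Map qualifier -> (index name, sorted list of available registers)."""
    att = parse_att(BIB1_ATT)
    registers = parse_registers(INDEXDEFS_XSL)
    result = {}
    for qualifier, number in parse_ccl(CCL_PROPERTIES).items():
        index = att.get(number)
        result[qualifier] = (index, sorted(registers.get(index, ())))
    return result


if __name__ == "__main__":
    for qualifier, (index, regs) in sorted(report().items()):
        print(qualifier, index, regs)
```
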
>
>
David,

This isn't *quite* what you describe, but it does set up at least a couple
of the linkages that you'd need:

https://github.com/bywatersolutions/koha-script-zebra-config-report

I think that Marcus Enger wrote something similar, but I don't know if it
fits the current code base, and I don't remember the URL.

--Barton

