[Koha] Koha Search Indexes

David Cook dcook at prosentient.com.au
Tue Feb 2 11:54:02 NZDT 2016


Hi Nicole,

I keep meaning to look over and revise the search documentation, but I always seem preoccupied with other work.

At a glance, I'm not sure whether the list at http://manual.koha-community.org/3.24/en/kohasearchindexes.html is complete. To be honest, while I think it's a valuable list, I think it would be more valuable for end users to have a list of CCL qualifiers (and their corresponding registers). While an index may exist in Zebra, it's the CCL qualifier that the user needs to know in order to access it, and sometimes the qualifier is different from the index name.

There are 3 vital files for Zebra indexing and Koha searching:
bib1.att
biblio-zebra-indexdefs.xsl
ccl.properties

bib1.att defines which indexes may exist.
biblio-zebra-indexdefs.xsl decides what MARC data goes into which indexes.
ccl.properties provides a query language for accessing those indexes through search queries.
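To make the division of labour between the three files concrete, here is a toy model that traces a CCL qualifier back to the MARC data feeding it. It is a simplified sketch, not how Koha or Zebra actually load these files: each file is reduced to a hypothetical Python dict, the "ti" qualifier is an illustrative assumption, and the real ccl.properties and bib1.att link qualifiers to index names through attribute numbers, which are omitted here.

```python
# Hypothetical, heavily simplified stand-ins for the three files.

# bib1.att: which index names may exist (attribute numbers omitted).
BIB1_INDEXES = {"Title", "Any", "uri"}

# biblio-zebra-indexdefs.xsl: which MARC data goes into which index:register.
INDEXDEFS = {
    "245": [("Title", "w"), ("Title", "p")],
    "952$u": [("uri", "u"), ("Any", "w"), ("Any", "p")],
}

# ccl.properties: which CCL qualifier points at which index.
# Note that a qualifier name need not match the index name ("ti" -> "Title").
CCL_QUALIFIERS = {"ti": "Title", "title": "Title", "uri": "uri"}


def trace(qualifier):
    """Return the 'MARC source -> index:register' routes behind a qualifier,
    or None if the qualifier is unknown or its index is not declared."""
    index = CCL_QUALIFIERS.get(qualifier)
    if index is None or index not in BIB1_INDEXES:
        return None
    return sorted(
        f"{tag} -> {index}:{register}"
        for tag, targets in INDEXDEFS.items()
        for name, register in targets
        if name == index
    )


print(trace("ti"))   # ['245 -> Title:p', '245 -> Title:w']
print(trace("uri"))  # ['952$u -> uri:u']
```

A qualifier missing from the CCL layer (for example "au" in this toy model) returns None even if an index exists, which is the point made above: users reach indexes only through qualifiers.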

Paul asked about the suffixes :n, :p, :w, :u, and :s. These are called "registers". :n is numeric, :p is phrase, :w is word, :u is URL, and :s is sorting.

Different types of CCL qualifiers allow us to access different types of registers. "st-numeric" provides access to the :n register. "st-phrase" and "phr" access :p. "st-word", "st-word-list", and "wrdl" access :w, and "st-urx" accesses :u. Generally we don't need to access :s when searching, as that's a behind-the-scenes thing for Koha to worry about.
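The qualifier-to-register mapping in the previous paragraph can be written down as a lookup table. This sketch encodes only the qualifiers named there; the helper name `qualifiers_for` is illustrative and not part of Koha.

```python
# Structure qualifiers named in the text and the register each one reaches.
STRUCTURE_TO_REGISTER = {
    "st-numeric": "n",
    "st-phrase": "p",
    "phr": "p",
    "st-word": "w",
    "st-word-list": "w",
    "wrdl": "w",
    "st-urx": "u",
}


def qualifiers_for(register):
    """List the structure qualifiers that reach a register, e.g. 'w'."""
    return sorted(q for q, r in STRUCTURE_TO_REGISTER.items() if r == register)


for reg in "npwus":
    # :s has no user-facing qualifier here; Koha uses it for sorting.
    print(f":{reg}", qualifiers_for(reg) or "(none listed; used internally)")
```

Reading the table in reverse like this makes it easy to see that several qualifiers can be synonyms for the same register.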

Different registers have different normalization rules. 

If we look at biblio-zebra-indexdefs.xsl, we can see that MARC 245 is indexed into Title:w and Title:p. That means "Harry Potter and the Philosopher's Stone" would be indexed something like so:

<title:w>Harry</title:w>
<title:w>Potter</title:w>
<title:w>Philosopher's</title:w>
<title:w>Stone</title:w>
<title:p>Harry Potter and the Philosopher's Stone</title:p>

So if we did a search like... "title,wrdl=Harry", we'd get a hit for that MARC record. If we did a search like 'title,phr="Harry Potter and the Philosopher's Stone"', we'd get a hit for that MARC record.
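Here is a rough model of the Harry Potter example: each word of the title goes into the :w register and the whole string goes into :p, and the two searches above are checked against that. It is an assumption-laden sketch: it keeps every whitespace-separated word (the illustration above abbreviates the list) and does none of the normalization Zebra applies, so it shows only the word/phrase split, not Zebra's real matching rules.

```python
def index_title(title):
    """Return {register: set_of_terms} for one 245 title: each word goes
    to :w and the whole string to :p. No normalization is attempted."""
    return {"w": set(title.split()), "p": {title}}


def hit(entries, register, term):
    """True if the term was stored under the given title register."""
    return term in entries.get(register, set())


record = index_title("Harry Potter and the Philosopher's Stone")
print(hit(record, "w", "Harry"))  # True, like title,wrdl=Harry
print(hit(record, "p", "Harry Potter and the Philosopher's Stone"))  # True, like title,phr="..."
```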

I'll draw your attention now to 952$u. It's indexed as uri:u (although it would also show up in the Any:w and Any:p keyword indexes). In order to access uri:u, we'd need to search for 'uri,st-urx="http://koha-community.org"'. The "st-urx" maps to the ":u", and we see "uri" in "ccl.properties" which maps to "uri" in bib1.att. 

If we tried to do a search for 'uri,wrdl="http://koha-community.org"', it would fail, because nothing is indexed in the "uri:w" index:register combo. 
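Why 'uri,wrdl' fails while 'uri,st-urx' succeeds can be shown with a tiny lookup keyed by (index, register). This is a hypothetical single-record model: only the uri:u entry from 952$u is included (the Any:w/Any:p copies are left out), and the register map is just the subset of qualifiers needed here.

```python
# Subset of the structure-qualifier -> register mapping from the text.
REGISTER = {"wrdl": "w", "phr": "p", "st-urx": "u"}

# One hypothetical record: 952$u is indexed only as uri:u,
# so the (uri, w) pair has nothing in it.
INDEX = {
    ("uri", "u"): {"http://koha-community.org"},
}


def search(query_prefix, term):
    """Resolve 'index,structure' to an (index, register) pair and look the
    term up there; an unpopulated pair simply yields no hit."""
    name, _, structure = query_prefix.partition(",")
    register = REGISTER.get(structure)
    return term in INDEX.get((name, register), set())


print(search("uri,st-urx", "http://koha-community.org"))  # True
print(search("uri,wrdl", "http://koha-community.org"))    # False
```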

I have to run to an appointment, but hopefully that helps a bit.

One day, I'd like to write a program that parses ccl.properties to produce a list of qualifiers, cross-referenced with biblio-zebra-indexdefs.xsl to show which registers are available for each qualifier/index pair. The register system is a bit complicated, but it can be useful. I've recently started doing more with the :u register...
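A first step toward that program might look like the minimal sketch below. Several things are assumptions: it handles only a simplified subset of the CCL qualifier syntax (`name type=value ...` lines and `name other-name` alias lines, with `#` comments); the sample lines are a hypothetical excerpt, and the use-attribute number for "uri" is a placeholder; `AVAILABLE` stands in for the registers you would actually derive from biblio-zebra-indexdefs.xsl via bib1.att. The structure values 1, 2, 6, 104 and 109 are the standard Bib-1 phrase, word, word-list, urx and numeric-string attributes, mapped to registers as described in the text.

```python
# Bib-1 structure attribute (type 4) values -> register letter (assumed mapping).
STRUCTURE_REGISTER = {"1": "p", "2": "w", "6": "w", "104": "u", "109": "n"}

SAMPLE_CCL = """
# hypothetical excerpt in the style of ccl.properties
ti 1=4
title ti
uri 1=9999   # placeholder use attribute
phr 4=1
wrdl 4=6
st-urx 4=104
st-numeric 4=109
"""

# Hypothetical: registers populated for each use attribute.
AVAILABLE = {"4": {"w", "p"}, "9999": {"u"}}


def parse_ccl(text):
    """Return {qualifier: {attr_type: value}}, expanding one level of aliases."""
    quals, aliases = {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, *rest = line.split()
        if all("=" in tok for tok in rest):
            quals[name] = dict(tok.split("=", 1) for tok in rest)
        else:
            aliases[name] = rest
    for name, targets in aliases.items():
        merged = {}
        for target in targets:
            merged.update(quals.get(target, {}))
        quals[name] = merged
    return quals


def cross_reference(quals, available):
    """For each index qualifier (one with a 1= use attribute), list the
    structure qualifiers whose register is populated for that index."""
    structures = {
        name: STRUCTURE_REGISTER.get(attrs["4"])
        for name, attrs in quals.items()
        if "4" in attrs and "1" not in attrs
    }
    report = {}
    for name, attrs in quals.items():
        use = attrs.get("1")
        if use is None:
            continue
        registers = available.get(use, set())
        report[name] = sorted(s for s, reg in structures.items() if reg in registers)
    return report


print(cross_reference(parse_ccl(SAMPLE_CCL), AVAILABLE))
```

With the sample data, "ti" and "title" come out with phr and wrdl, while "uri" comes out with only st-urx, which matches the uri:u behaviour described above.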

David Cook
Systems Librarian
Prosentient Systems
72/330 Wattle St, Ultimo, NSW 2007


> -----Original Message-----
> Date: Mon, 1 Feb 2016 11:21:01 -0600
> From: Nicole Engard <nengard at gmail.com>
> To: Paul A <paul.a at navalmarinearchive.com>
> Cc: Koha <Koha at lists.katipo.co.nz>, koha-docs at lists.koha-community.org
> Subject: Re: [Koha] Koha Search Indexes
> 
> I'd love if someone could help me there.
> 
> Nicole
> 
> On Mon, Feb 1, 2016 at 10:35 AM, Paul A <paul.a at navalmarinearchive.com> wrote:
> 
> > At 08:42 AM 2/1/2016 -0600, Nicole Engard wrote:
> >
> >> Hello all,
> >>
> >> I updated this page in the manual
> >> http://manual.koha-community.org/3.24/en/kohasearchindexes.html because
> >> there were many indexes not listed in there.  Would index experts please
> >> take a look to make sure I caught all of the indexes?
> >>
> >
> > Many thanks. Could you possibly add an explanation of the suffixes :n, :p,
> > :w, :u (URL?) and :s?
> >
> > Tnx and br -- Paul
> >
> > ---
> > Maritime heritage and history, preservation and conservation,
> > research and education through the written word and the arts.
> > <http://NavalMarineArchive.com> and <http://UltraMarine.ca>
> >




