[Koha] Batch import-updating of existing biblios

Wed Feb 23 08:32:52 NZDT 2011

On Wed, Feb 23, 2011 at 1:50 AM,  <hansbkk at gmail.com> wrote:

> should this level of detail be reflected in the documentation? I'd be
> happy to help if it's accessible - wiki?
>
> http://lists.koha-community.org/pipermail/koha-patches/2010-October/012789.html

Turns out the above is something different, having to do with
duplicate items in a list (virtual shelf) ??

I found this in the docs

http://koha-community.org/documentation/3-2-manual/?ch=x4484#AEN4901

It looks somewhat relevant too, but could use some filling in, so
here are my ideas. What's the proper channel for docs input?

----------------------------------------------

      Choose a unique name and enter it in the 'Matching rule code' field

      'Description' can be anything you want to make it clear to you
what rule you're picking

-->      'Match threshold' - Choose a number such that enough of the
"match points" below, added together, give a total higher than it. If
you only have one match point, make this number that point's "Score"
minus 1

      Match points are set up to determine what fields to match on

      'Search index' can be found by looking at the ccl.properties
file on your system, which tells the Zebra indexing what data to search
for in the MARC data

--> above needs the name/location of the file

-->    'Score' - Choose a number to determine this point's
contribution to the total toward triggering a match. If you only have
one match point, make it the above "match threshold" +1

      Enter the MARC tag you want to match on in the 'Tag' field

-->      Enter the MARC tag subfield you want to match on in the
'Subfields' field, *or*

-->      'Offset' - for fixed-length MARC fields that are addressed by
character position rather than subfield codes, combined with

-->      'Length' - the number of characters to count from the offset

      Koha currently has only one 'Normalization rule' that removes
extra characters such as commas and semicolons. The value you enter in
this field is irrelevant to the normalization process.

      'Required match checks' - ??
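
To make the Tag/Subfields versus Offset/Length distinction concrete,
here is a rough Python sketch. It is not Koha code (Koha is Perl); the
record layout, the match_key() and normalize() helpers, and the sample
values are all made up for illustration, and normalize() only covers
the commas and semicolons mentioned above.

```python
# Hypothetical record: variable fields map subfield code -> value,
# fixed-length fields (like 008) are a plain string.
record = {
    "020": {"a": "9780596000271"},
    "245": {"a": "Programming Perl,", "b": "third edition;"},
    "008": "110223s2000    cau           000 0 eng d",
}


def match_key(record, tag, subfields="", offset=0, length=0):
    """Build the string a match point would search for (illustrative only)."""
    field = record.get(tag)
    if field is None:
        return ""
    if isinstance(field, dict):
        # Variable field: join the requested subfields in the order given.
        value = " ".join(field[code] for code in subfields if code in field)
    else:
        # Fixed-length field: there are no subfield codes to pick from.
        value = field
    if length > 0:
        # Offset/Length: take 'length' characters starting at 'offset'.
        value = value[offset:offset + length]
    return value


def normalize(value):
    """Drop commas and semicolons and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in value if ch not in ",;")
    return " ".join(cleaned.split())


isbn = match_key(record, "020", "a")
pub_year = match_key(record, "008", offset=7, length=4)
title = normalize(match_key(record, "245", "ab"))
```

Here the ISBN comes from Tag 020 / Subfields a, while the publication
year is cut out of the 008 field with Offset 7 and Length 4.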

Then I tried the online help, which it turns out explains things pretty
well - excellent!

Which brings me to a "big picture" question - wouldn't it be a good
idea to have the online help (within the Koha interface) and the
website's documentation coordinated? Even if it's basically
duplicated, that's better than having them maintained separately, with
bits missing from each or maybe getting out of sync.

----------------------------------------------
Record Matching Rules

IMPORTANT: This is an advanced feature and should not be altered
without knowing how it will affect data migration.

Use this tool to create rules to apply during the data migration
process. It will prevent duplicates from coming into the system when
importing MARC records. An import rule or matching rule consists of
one or more 'match points' and zero or more 'match checks'. Each match
point specifies a 'search index' and a MARC 'tag', 'subfield', or
'length' (fixed field position). When a record is imported, a string
is constructed for each match point from the tag specified in the
match point, and the related index is searched.

The set of matching records is assigned a score (the value of which
is determined by the match point rule). Then, the rest of the match
points are considered and the scores of each set of matches are added
up. The matching records whose total score is over a threshold
value defined in the matching rule are candidate matches.

Match checks are applied for all candidate matches. Each match check
specifies a tag in the incoming record and a tag in the possible
matching record. The values must be the same for a match to be
considered good (e.g., doing a match check on title, or publication
date).

----------------------------------------------
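
To check my reading of the help text above, here is a small Python
sketch of the scoring as described. It is only an illustration, not
Koha's implementation: the function names, the fake index, and the
scores are invented, and I'm taking "over a threshold" literally as a
strict greater-than, which may not be what Koha actually does.

```python
from collections import defaultdict

# Fake search index: (index name, search string) -> matching record ids.
fake_index = {
    ("isbn", "9780596000271"): [101, 102],
    ("title", "programming perl"): [101],
}


def search(index_name, value):
    """Stand-in for a Zebra index lookup (hypothetical)."""
    return fake_index.get((index_name, value), [])


def candidate_matches(match_points, threshold, search):
    """Add up match-point scores per record; keep totals over the threshold."""
    totals = defaultdict(int)
    for point in match_points:
        for record_id in search(point["index"], point["value"]):
            totals[record_id] += point["score"]
    return {rid: total for rid, total in totals.items() if total > threshold}


def passes_match_checks(incoming, existing, match_checks):
    """Every checked tag must hold the same value in both records."""
    return all(incoming.get(tag) == existing.get(tag) for tag in match_checks)


points = [
    {"index": "isbn", "value": "9780596000271", "score": 1000},
    {"index": "title", "value": "programming perl", "score": 500},
]

both_needed = candidate_matches(points, 1200, search)  # ISBN and title required
isbn_enough = candidate_matches(points, 999, search)   # threshold = score minus 1
isbn_equal = candidate_matches(points, 1000, search)   # threshold = score
```

Under this reading, a threshold equal to a lone match point's score
would never be exceeded by that point alone, which is why I suggested
"Score minus 1" for the threshold above.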

And finally, does this functionality kick in when bringing records
into the reservoir, or from there into the database itself?