[Koha] MARC Support

Thu Jun 21 07:01:06 NZST 2001

I never realized what a hassle it is to sort lists where numbers come
after letters (as is the norm with MARC subfield marks).  MySQL will only
sort with numbers before the letters.  I've managed to get the MARC
records sorting in the right order now, and it can optionally display both
tag names and subfield names for those of us for whom MARC is somewhat
greek.

When adding new subfields, there is a drop down box with subfield marks
and names, or a simple text entry box where you can just type in the
subfield mark.

Now I really need to start thinking about how to tie this in to Koha.  I
don't think it should be too difficult to generate MARC records from the
existing biblio, biblioitems, items table data.  I suppose the easiest
thing to do would be to insert some kind of API everywhere that modifies
those tables so that they also modify the MARC records.  This would be a
first step towards adding MARC support.  Second step would be inserting an
API for every routine that _reads_ from those tables.  At that point, all
reads could gather data from the MARC records, and all writes could update
both the old tables and the new MARC tables.

MARC records would be linked at the biblioitems level.  The items table
data would, I believe, mostly be recorded in an 852 - Location tag in the
Marc data.  Links to the biblio table could be stored in a local 9XX tag. 
For example, if there is a biblio entry for "King Lear" with biblionumber
5214, and there are 4 biblioitems for that biblio (hardcover, paberback,
cassette, large print), the MARC records for each of those four would have
a tag like:

944 ## a 5214

When doing searches on MARC records, all four results with the same 944 a
value could be lumped together under "King Lear" in the results output.  I
believe this would result in the same behaviour obtained with the three
split tables?

------------------------
Technical Stuff Below!!!
------------------------

Rough outline of the Schema I used:

I used the schema developed for the Pytheas project.  It is fully capable
of storing all of the information in any MARC record, and is not overly
difficult to search.  One feature I'd like to add is a keyword index so
that soundex searches would work (ie soundex(elefant) =
soundex(elephant)). 

The Resource_Table contains an entry for every MARC record.

Resource_Table
   Resource_ID  int auto_increment
   Record_ID    int

The Bib_Table contains an entry for every tag in a record.  The Record_ID
is used to link all tags in a MARC record together.  Each entry in
Bib_Table will have one of the Tag_#XX_ID fields filled out with a number
that links to the appropriate #XX_Tag_Table.  The Control_ID field links
to a Control_Table that is used for the fixed length data elements like
008. 

Bib_Table
   Bib_ID     int auto_increment
   Record_ID  int
   Tag_0XX_ID int
   Tag_1XX_ID int
   Tag_2XX_ID int
   Tag_3XX_ID int
   Tag_4XX_ID int
   Tag_5XX_ID int
   Tag_6XX_ID int
   Tag_7XX_ID int
   Tag_8XX_ID int
   Tag_9XX_ID int
   Control_ID int

The 2XX_Tag_Table is representative of 10 tables (0XX_Tag_Table through
9XX_Tag_Table).  It contains one entry for each tag of a record.  The
Tag_ID value is linked back to the Bib_Table Tag_2XX_ID field. The
Subfield_ID is used to link to the 2XX_Subfield_Table which contains all
of the subfields for this tag. 

2XX_Tag_Table
   Tag_Key     int auto_increment
   Tag_ID      int
   Indicator1  int
   Indicator2  int
   Tag         char(3)
   Subfield_ID int

The 2XX_Subfield_Table is again representative of 10 such tables.  The
Subfield_ID is used to link all of the subfields of a given tag together
and back to the 2XX_Tag_Table entry.  Small values can be stored directly
in the Subfield_Value field.  Larger values are stored in separate blob
tables using the Storage_ID field as the link.

2XX_Subfield_Table
   Subfield_Key   int auto_increment
   Subfield_ID    int
   Subfield_Mark  char(1)
   Subfield_Value char(255)
   Storage_ID     int

The Storage_Table and *Blob_Table is where the data elements themselves
are stored.  Depending on the size of the data, they can be stored in the
Blob_Table, Med_Blob_Table, or the Long_Blob_Table, based on the value of
Storage_Type. 

Storage_Table
   Storage_Key   int auto_increment
   Storage_ID    int
   Blob_ID       int
   Med_Blob_ID   int
   Long_Blob_ID  int
   Storage_Type  enum('B', 'MB', 'LB');

The Med_Blob_Table and Long_Blob_Table are identical to Blob_Table except
that the Blob_Data field is mediumblob or longblob respectively.  The
Seq_No field can be used to break a huge record into multiple longblob's
if necessary.

Blob_Table
   Blob_Key	int auto_increment
   Blob_ID	int
   Blob_Data    blob
   Seq_No	int

If I wanted to find the title and author of all records that had "bear" in
any subject (6XX) tag, I would use:

  select B.Record_ID
  from Bib_Table B, 6XX_Tag_Table T, 6XX_Subfield_Table S
  where B.Tag_6XX_ID=T.Tag_ID
    and T.Subfield_ID=S.Subfield_ID
    and S.Subfield_Mark='a'
    and S.Subfield_Value like '%bear%';

for each Record_ID returned, run the following to get the author:

  select S.Subfield_Value
  from Bib_Table B, 1XX_Tag_Table T, 1XX_Subfield_Table S
  where B.Record_ID=$Record_ID
    and B.Tag_1XX_ID=T.Tag_ID
    and T.Subfield_ID=S.Subfield_ID
    and T.Tag='100';

and the following to get the title:

  select S.Subfield_Value
  from Bib_Table B, 2XX_Tag_Table T, 2XX_Subfield_Table S
  where B.Record_ID=$Record_ID
    and B.Tag_2XX_ID=T.Tag_ID
    and T.Subfield_ID=S.Subfield_ID
    and T.Tag='245';