[Koha] MARC Support
Tonnesen Steve
tonnesen at cmsd.bc.ca
Thu Jun 21 07:01:06 NZST 2001
I never realized what a hassle it is to sort lists where numbers come
after letters (as is the norm with MARC subfield marks). MySQL will only
sort with numbers before the letters. I've managed to get the MARC
records sorting in the right order now, and it can optionally display both
tag names and subfield names for those of us for whom MARC is somewhat
greek.
When adding new subfields, there is a drop down box with subfield marks
and names, or a simple text entry box where you can just type in the
subfield mark.
Now I really need to start thinking about how to tie this in to Koha. I
don't think it should be too difficult to generate MARC records from the
existing biblio, biblioitems, items table data. I suppose the easiest
thing to do would be to insert some kind of API everywhere that modifies
those tables so that they also modify the MARC records. This would be a
first step towards adding MARC support. Second step would be inserting an
API for every routine that _reads_ from those tables. At that point, all
reads could gather data from the MARC records, and all writes could update
both the old tables and the new MARC tables.
MARC records would be linked at the biblioitems level. The items table
data would, I believe, mostly be recorded in an 852 - Location tag in the
Marc data. Links to the biblio table could be stored in a local 9XX tag.
For example, if there is a biblio entry for "King Lear" with biblionumber
5214, and there are 4 biblioitems for that biblio (hardcover, paberback,
cassette, large print), the MARC records for each of those four would have
a tag like:
944 ## a 5214
When doing searches on MARC records, all four results with the same 944 a
value could be lumped together under "King Lear" in the results output. I
believe this would result in the same behaviour obtained with the three
split tables?
------------------------
Technical Stuff Below!!!
------------------------
Rough outline of the Schema I used:
I used the schema developed for the Pytheas project. It is fully capable
of storing all of the information in any MARC record, and is not overly
difficult to search. One feature I'd like to add is a keyword index so
that soundex searches would work (ie soundex(elefant) =
soundex(elephant)).
The Resource_Table contains an entry for every MARC record.
Resource_Table
Resource_ID int auto_increment
Record_ID int
The Bib_Table contains an entry for every tag in a record. The Record_ID
is used to link all tags in a MARC record together. Each entry in
Bib_Table will have one of the Tag_#XX_ID fields filled out with a number
that links to the appropriate #XX_Tag_Table. The Control_ID field links
to a Control_Table that is used for the fixed length data elements like
008.
Bib_Table
Bib_ID int auto_increment
Record_ID int
Tag_0XX_ID int
Tag_1XX_ID int
Tag_2XX_ID int
Tag_3XX_ID int
Tag_4XX_ID int
Tag_5XX_ID int
Tag_6XX_ID int
Tag_7XX_ID int
Tag_8XX_ID int
Tag_9XX_ID int
Control_ID int
The 2XX_Tag_Table is representative of 10 tables (0XX_Tag_Table through
9XX_Tag_Table). It contains one entry for each tag of a record. The
Tag_ID value is linked back to the Bib_Table Tag_2XX_ID field. The
Subfield_ID is used to link to the 2XX_Subfield_Table which contains all
of the subfields for this tag.
2XX_Tag_Table
Tag_Key int auto_increment
Tag_ID int
Indicator1 int
Indicator2 int
Tag char(3)
Subfield_ID int
The 2XX_Subfield_Table is again representative of 10 such tables. The
Subfield_ID is used to link all of the subfields of a given tag together
and back to the 2XX_Tag_Table entry. Small values can be stored directly
in the Subfield_Value field. Larger values are stored in separate blob
tables using the Storage_ID field as the link.
2XX_Subfield_Table
Subfield_Key int auto_increment
Subfield_ID int
Subfield_Mark char(1)
Subfield_Value char(255)
Storage_ID int
The Storage_Table and *Blob_Table is where the data elements themselves
are stored. Depending on the size of the data, they can be stored in the
Blob_Table, Med_Blob_Table, or the Long_Blob_Table, based on the value of
Storage_Type.
Storage_Table
Storage_Key int auto_increment
Storage_ID int
Blob_ID int
Med_Blob_ID int
Long_Blob_ID int
Storage_Type enum('B', 'MB', 'LB');
The Med_Blob_Table and Long_Blob_Table are identical to Blob_Table except
that the Blob_Data field is mediumblob or longblob respectively. The
Seq_No field can be used to break a huge record into multiple longblob's
if necessary.
Blob_Table
Blob_Key int auto_increment
Blob_ID int
Blob_Data blob
Seq_No int
If I wanted to find the title and author of all records that had "bear" in
any subject (6XX) tag, I would use:
select B.Record_ID
from Bib_Table B, 6XX_Tag_Table T, 6XX_Subfield_Table S
where B.Tag_6XX_ID=T.Tag_ID
and T.Subfield_ID=S.Subfield_ID
and S.Subfield_Mark='a'
and S.Subfield_Value like '%bear%';
for each Record_ID returned, run the following to get the author:
select S.Subfield_Value
from Bib_Table B, 1XX_Tag_Table T, 1XX_Subfield_Table S
where B.Record_ID=$Record_ID
and B.Tag_1XX_ID=T.Tag_ID
and T.Subfield_ID=S.Subfield_ID
and T.Tag='100';
and the following to get the title:
select S.Subfield_Value
from Bib_Table B, 2XX_Tag_Table T, 2XX_Subfield_Table S
where B.Record_ID=$Record_ID
and B.Tag_2XX_ID=T.Tag_ID
and T.Subfield_ID=S.Subfield_ID
and T.Tag='245';
More information about the Koha
mailing list