[Koha] Multiple bulk imports

Elaine Bradtke eb at efdss.org
Wed Sep 29 23:19:55 NZDT 2010


Thanks Ian, very helpful.  A few more questions, though.

On Tue, Sep 28, 2010 at 5:51 PM, Ian Bays <ian.bays at ptfs-europe.com> wrote:
>  Hi Elaine,
>
> You will need a field to match on.  If ISBN does not work for you then
> some sort of control number should do.  I often use a control number in
> the 001 and set up a matching rule on that field.
> If you have no match field it becomes a manual job and others might have
> a better idea how to merge two bib records while keeping items from both
> records along with any other associated loan data etc...

That's what I was afraid of.  It may have to be a manual job.  No
ISBNs and the 001 control numbers will be different, because they're
coming from two different databases.  The way we find duplicates is a
combination of author, title and publication information. A big chunk
of the first collection was published before 1900, and even the dates
are estimates a lot of the time.
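
To show what I mean by matching on a combination of fields, here is a
minimal Python sketch.  The function names, the choice of publisher as
the "publication information", and the normalisation rules are all my
own assumptions for illustration, not anything Koha provides.  Dates
are left out of the key on purpose, since ours are often estimates.

```python
import re
import unicodedata


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    # Drop combining marks so accented and unaccented spellings compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a lower-case letter, digit or space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(author, title, publisher):
    """Hypothetical duplicate-detection key (not a Koha matching rule)."""
    return "|".join(normalise(part) for part in (author, title, publisher))
```

Two records that differ only in case, accents or punctuation would then
get the same key; anything fuzzier than that would still need checking
by eye.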

Luckily, the first collection is reference only, so there is no loan
data to worry about there. A very small percentage of items from the
second collection are available for loan.

Would this work? For the second collection, the one which will have
overlapping items, I could combine it with the first file and
de-duplicate offline.  If I transfer all the duplicate item
information from the second batch to the record with the control
number from the first batch, would it overwrite the biblio records in
Koha?
Or would I have to clear out the Koha database and import the new
combined one?  The first batch of material does not circulate, so
there won't be any problems with items out on loan.
After that, the subsequent collections won't have overlapping items,
as they're different formats.  Phew.
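
The offline de-duplication step I have in mind would look roughly like
this Python sketch.  It assumes each record has already been reduced to
a plain dict with made-up keys: 'control_number', 'key' (whatever
duplicate-detection key we settle on) and 'items'.  It only covers the
offline merge; it says nothing about what Koha does with the result on
import, which is the part I'm asking about.

```python
def merge_batches(first_batch, second_batch):
    """Fold items from duplicate records into the first record seen.

    Records are plain dicts with hypothetical keys 'control_number',
    'key' and 'items' (a list).  The first batch is processed first, so
    its control number is the one that survives for any duplicate.
    """
    merged = {}
    for rec in list(first_batch) + list(second_batch):
        if rec["key"] in merged:
            # Duplicate: keep the earlier record, just add these items to it.
            merged[rec["key"]]["items"].extend(rec["items"])
        else:
            # New title: copy the record so the input lists are not modified.
            merged[rec["key"]] = {**rec, "items": list(rec["items"])}
    return list(merged.values())
```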

> Remember that the biblio matching uses the zebra search indexes so the
> first batch must be loaded and indexed before loading the second batch.

There will probably be a gap of a few months between imports.
>
> Also (I recall someone had this issue) if you have a batch which itself
> has duplicates by (say) ISBN and you try to load it on an empty
> database, and you want it to de-duplicate the bibs and add the items,
> that won't work.  This is because the matching looks in the zebra
> indexes which are not built at that point.  There are ways to overcome
> this (extract a unique list of ISBNs from the data and make minimal bib
> records with just ISBN, build zebra indexes then load the real data
> matching on ISBN).
>
> Hope that makes sense...
>
> Ian
> On 28/09/2010 17:19, Elaine Bradtke wrote:
>> Has anyone done their bulk imports in batches?  We've got different
>> collections that are in separate databases.  I was planning to import
>> each collection separately, because there are slightly different
>> things that have to be done to the data in each collection.
>> But there's a catch (isn't there always).  There are duplicate copies
>> of some of the same items across the collections.  It seems logical that
>> they should share the same biblio, and have the location, and
>> collection information at the item level. How does this work if you
>> are importing the different collections in sequence?  A lot of our
>> stuff pre-dates ISBN so even identifying them may be a trick.
>> However, I was wondering how / if anyone has dealt with this problem.
>>
>
>
> --
> Ian Bays
> Director of Projects
> PTFS Europe.com
> mobile: +44 (0) 7774995297
> phone: +44 (0) 800 756 6803
> skype: ian.bays
> email: ian.bays at ptfs-europe.com
>
> _______________________________________________
> Koha mailing list
> Koha at lists.katipo.co.nz
> http://lists.katipo.co.nz/mailman/listinfo/koha
>



-- 
Elaine Bradtke
Data Wrangler
VWML
English Folk Dance and Song Society | http://www.efdss.org
Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY
Tel    +44 (0) 20 7485 2206 ext 36
Mob +44 (0) 7789 373982
--------------------------------------------------------------------------
Registered Company No. 297142
Charity Registered in England and Wales No. 305999
---------------------------------------------------------------------------
"Writing about music is like dancing about architecture"
--Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)

