Thanks Ian, very helpful. A few more questions, though. On Tue, Sep 28, 2010 at 5:51 PM, Ian Bays <ian.bays@ptfs-europe.com> wrote:
Hi Elaine,
You will need a field to match on. If ISBN does not work for you, then some sort of control number should do. I often use a control number in the 001 and set up a matching rule on that field. If you have no match field, it becomes a manual job, and others might have a better idea of how to merge two bib records while keeping the items from both records, along with any other associated loan data, etc.
That's what I was afraid of. It may have to be a manual job. There are no ISBNs, and the 001 control numbers will be different because they're coming from two different databases. The way we find duplicates is a combination of author, title, and publication information. A big chunk of the first collection was published before 1900, and even the dates are estimates a lot of the time. Luckily, the first collection is reference only, so there is no loan data to worry about there. Only a very small percentage of items from the second collection are available for loan.

Would this work? For the second collection, the one which will have overlapping items, I could combine it with the first file and de-duplicate offline. If I transfer all the duplicate item information from the second batch to the record with the control number from the first batch, would it overwrite the biblio records in Koha? Or would I have to clear out the Koha database and import the new combined file? The first batch of material does not circulate, so there won't be any problems with items out on loan. After that, the subsequent collections won't have overlapping items, as they're different formats. Phew.
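(For what it's worth, the offline de-duplication step above can be roughed out with a small script. This is only a sketch under simplifying assumptions: records are plain Python dicts rather than MARC, the field names `author`, `title`, and `pubinfo` are made up for illustration, and the normalisation is deliberately crude — with pre-1900 material and estimated dates, any candidate groups it finds would still need manual review.)

```python
import re

def match_key(record):
    """Build a composite de-duplication key from author, title, and
    publication info (the fields described above), since there is no
    ISBN or shared control number to match on.
    Normalisation here is illustrative: lowercase, strip punctuation,
    collapse whitespace."""
    def norm(value):
        value = (value or "").lower()
        value = re.sub(r"[^\w\s]", "", value)   # drop punctuation
        return re.sub(r"\s+", " ", value).strip()
    return (norm(record.get("author")),
            norm(record.get("title")),
            norm(record.get("pubinfo")))

def find_duplicates(records):
    """Group records by composite key; any group with more than one
    record is a candidate duplicate pair for manual review."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}
```

Because the key is exact-match after normalisation, variant spellings or differing date estimates will not collapse together; that conservatism is probably a feature here, since a false merge is worse than a missed one.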
Remember that the biblio matching uses the zebra search indexes so the first batch must be loaded and indexed before loading the second batch.
There will probably be a gap of a few months between imports.
Also (I recall someone had this issue): if you have a batch which itself contains duplicates by (say) ISBN, and you try to load it into an empty database expecting it to de-duplicate the bibs and add the items, that won't work. This is because the matching looks in the zebra indexes, which are not built at that point. There are ways around this: extract a unique list of ISBNs from the data, make minimal bib records with just the ISBN, build the zebra indexes, then load the real data matching on ISBN.
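(The first step of that two-pass workaround — extracting the unique ISBNs and generating stub records — might look roughly like this. A sketch only: records are plain dicts rather than MARC, and in practice the stubs would have to be written out as real MARC records and loaded and indexed with Koha's usual import tools before the second pass.)

```python
def unique_isbns(records):
    """Pass 1: collect the unique ISBNs from a batch that contains
    internal duplicates, in first-seen order. Hyphens and surrounding
    whitespace are stripped so formatting variants compare equal."""
    seen = []
    for rec in records:
        isbn = (rec.get("isbn") or "").replace("-", "").strip()
        if isbn and isbn not in seen:
            seen.append(isbn)
    return seen

def stub_records(isbns):
    """Minimal bib records carrying only an ISBN (020$a in MARC
    terms). Once these are loaded and zebra-indexed, the real batch
    can be loaded with a matching rule on ISBN, so each duplicate
    attaches its items to the existing stub bib."""
    return [{"isbn": isbn} for isbn in isbns]
```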
Hope that makes sense...
Ian On 28/09/2010 17:19, Elaine Bradtke wrote:
Has anyone done their bulk imports in batches? We've got different collections that are in separate databases. I was planning to import each collection separately, because slightly different things have to be done to the data in each collection. But there's a catch (isn't there always): there are duplicate copies of some of the same items across the collections. It seems logical that they should share the same biblio, with the location and collection information at the item level. How does this work if you are importing the different collections in sequence? A lot of our stuff pre-dates ISBN, so even identifying the duplicates may be a trick. I was wondering how, or if, anyone has dealt with this problem.
-- Ian Bays Director of Projects PTFS Europe.com mobile: +44 (0) 7774995297 phone: +44 (0) 800 756 6803 skype: ian.bays email: ian.bays@ptfs-europe.com
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
-- Elaine Bradtke Data Wrangler VWML English Folk Dance and Song Society | http://www.efdss.org Cecil Sharp House, 2 Regent's Park Road, London NW1 7AY Tel +44 (0) 20 7485 2206 ext 36 Mob +44 (0) 7789 373982 -------------------------------------------------------------------------- Registered Company No. 297142 Charity Registered in England and Wales No. 305999 --------------------------------------------------------------------------- "Writing about music is like dancing about architecture" --Elvis Costello (Musician magazine No. 60 (October 1983), p. 52)