Cleaning up lots of duplicates
I've discovered that, due to a mistake a long time ago, we have a large number of duplicate biblio records in our system... as much as a third of our database, or possibly even more. This is old enough that the imported MARC batches have long since been cleaned up.

The good news is that these duplicates are mainly for e-books licensed through EBSCO, where we have just the bibliographic record and an 856$u field, with no items or circulation.

If I could construct an SQL query to identify the duplicate biblionumbers (excluding one original record for each title), would it be enough to delete the matching rows from the items, biblioitems, and biblios tables, and then fully re-index Zebra to clean these up?

Joel Coehoorn
Director of Information Technology
402.363.5603
jcoehoorn@york.edu

The mission of York College is to transform lives through Christ-centered education and to equip students for lifelong service to God, family, and society
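One possible shape for the duplicate-finding query described above, as a hedged sketch: it assumes Koha's standard MySQL schema and that duplicates can be matched on the ISBN column of biblioitems. The real matching criteria (title, the 856$u URL, a control number) would have to fit the actual data, and the query should be tested read-only before anything is deleted.

```sql
-- Sketch: list duplicate biblionumbers, keeping the lowest biblionumber
-- of each group as the "original". Matching on ISBN is an assumption;
-- adjust the GROUP BY / JOIN criteria to whatever defines a duplicate
-- in your data.
SELECT bi.biblionumber
FROM biblioitems bi
JOIN (
    SELECT isbn, MIN(biblionumber) AS keep_biblionumber
    FROM biblioitems
    WHERE isbn IS NOT NULL AND isbn <> ''
    GROUP BY isbn
    HAVING COUNT(*) > 1
) dup ON dup.isbn = bi.isbn
WHERE bi.biblionumber <> dup.keep_biblionumber;
```

Running this as a SELECT first gives a candidate list to review before any DELETE or batch operation touches it.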
For e-books, more than one record is like having more than one copy of a title: it allows more than one person to use that title at the same time. For print titles we can add all of the copies to the same bibliographic record, but it takes two bibliographic records for two people to use the same e-book title at the same time. Accessing an e-book requires a URL, and a URL can only be used by one person at a time. I assume that is the reason you are finding duplicate bib records for e-books.

_______________________________________________
Koha mailing list  http://koha-community.org
Koha@lists.katipo.co.nz
https://lists.katipo.co.nz/mailman/listinfo/koha
Even if you buy into the artificial idea that publishers enforce that bits are scarce and two people can't be using the same resource at the same time (which, as libraries, we should be pointing out is absurd at every possible opportunity), why wouldn't you just put the URL in 952$u? You can then have multiple items, each with a different URL, attached to the same bibliographic record.

Chris
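In MARC terms, the suggestion above looks roughly like this (all field values are hypothetical; in Koha's default framework each 952 field is one item record and its $u subfield holds that item's URI):

```
245 10 $a An example e-book title
856 40 $u https://platform.example.com/ebook     <- bib-level link (one per bib)
952    $u https://platform.example.com/copy1     <- item 1, its own access URL
952    $u https://platform.example.com/copy2     <- item 2, a second simultaneous-use copy
```

One bib with two 952 items replaces the two duplicate bibs, while still exposing two distinct access URLs.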
--
Chris Cormack
Catalyst IT Ltd.
+64 4 803 2238
PO Box 11-053, Manners St, Wellington 6142, New Zealand
> If I could construct an SQL query to identify the duplicate biblionumbers (excluding an original record for each item), would it be enough to delete database records from the items, biblioitems, and biblios tables, and then fully re-index Zebra to clean these up?
Yes, it would be enough, provided you don't have subscriptions or holdings linked to the deleted biblio records. Alternatively, you can build a query which produces a list of biblionumbers, then use that list in Tools > Batch record deletion (no reindexing required).

Kind regards,
--
Frédéric DEMIANS
http://www.tamil.fr/fdemians
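The caveat above can be turned into a pre-deletion safety check. This is a sketch under assumptions: it uses Koha's mainline table names (items, subscription, reserves), and the biblionumbers in the IN list are hypothetical placeholders for the output of your duplicate-finding query.

```sql
-- Sketch: for candidate duplicate biblionumbers, confirm nothing is
-- attached before batch-deleting. Any row with a nonzero count should
-- be excluded from the deletion list.
SELECT b.biblionumber,
       (SELECT COUNT(*) FROM items i        WHERE i.biblionumber = b.biblionumber) AS item_count,
       (SELECT COUNT(*) FROM subscription s WHERE s.biblionumber = b.biblionumber) AS subscription_count,
       (SELECT COUNT(*) FROM reserves r     WHERE r.biblionumber = b.biblionumber) AS hold_count
FROM biblio b
WHERE b.biblionumber IN (101, 102, 103);  -- hypothetical candidate list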
participants (4)
- Carlock, Ruth
- Chris Cormack
- Coehoorn, Joel
- Frédéric Demians