[Koha] Duplicate Biblio Records

Fri Jul 21 13:13:50 NZST 2017

I have occasionally needed to "dedupe" bib records. We have a script that looks for dupes using ISBN as the key. We then decide which record is "best" to keep by seeing which bib record is longer (longer implies more tags, ergo better cataloging). We then move all items to the best record, along with any outstanding holds, and our script deletes the "losing" bib. It is always a balance between getting rid of dupes (makes Cataloger happy) and retaining good bibs (blindly choosing by length alone makes Cataloger sad).
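For anyone wanting to try the same thing, here is a minimal Python sketch of the steps above, working on records already loaded into memory. The `Bib` class, its fields, and `normalize_isbn` are hypothetical illustrations, not Koha's schema or API; a real script would read from and write to the Koha database and reindex afterward. Stripping hyphens and spaces from the ISBN is my own assumption, and ISBN-10 and ISBN-13 forms of the same book are NOT treated as matches here.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bib:
    # Hypothetical in-memory stand-in for a Koha bib record.
    biblionumber: int
    isbn: str
    marc: str  # serialized record; its length is the "quality" proxy
    items: list = field(default_factory=list)
    holds: list = field(default_factory=list)


def normalize_isbn(isbn: str) -> str:
    # Assumption: ignore hyphens, spaces, and case when matching ISBNs.
    return isbn.replace("-", "").replace(" ", "").upper()


def dedupe(bibs):
    """Group bibs by ISBN, keep the longest record in each group,
    move items and holds onto it, and report which bibs to delete."""
    groups = defaultdict(list)
    for bib in bibs:
        if bib.isbn:  # bibs without an ISBN are left alone
            groups[normalize_isbn(bib.isbn)].append(bib)

    deleted = []
    for group in groups.values():
        if len(group) < 2:
            continue  # no duplicates for this ISBN
        # Longest record wins; on a tie, max() keeps the first one seen.
        winner = max(group, key=lambda b: len(b.marc))
        for loser in group:
            if loser is winner:
                continue
            winner.items.extend(loser.items)
            winner.holds.extend(loser.holds)
            loser.items.clear()
            loser.holds.clear()
            deleted.append(loser.biblionumber)

    gone = set(deleted)
    survivors = [b for b in bibs if b.biblionumber not in gone]
    return survivors, deleted
```

Length alone is a crude tiebreaker, so it is worth having a cataloger review the `deleted` list before anything is actually removed.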

I am not familiar with the script you reference (sync_items_in_marc_bib.pl). I'd have to read it to see if it would work well in deduplication efforts.

Joy

> On Jul 20, 2017, at 3:27 PM, Tim Young <Tim.Young at LightSys.org> wrote:
> 
> Every once in a while I help out with a Koha installation.
> 
> At the moment, they have asked me to fix a bunch of duplicate biblio records.  I have found two main ways to do this.
> 
> A manual way where you are in control every step of the way: http://manual.koha-community.org/3.2/en/stafflists.html#mergebibrecs
> 
> And a potentially destructive way where you just guess and see what happens: https://saturn.ffzg.hr/koha/index.cgi?action=revision_view;page_name=removing_duplicate_records;revision_id=20091114221320
> 
> 
> As I understand it, the latter basically has you choose one of the biblio records, point all your identical items to the one biblio record, and delete the other biblio records.  Then, one runs a script (sync_items_in_marc_bib.pl) to add any missing data to the biblio record by pulling the data from the items.
> 
> Being a MySQL guy and a scripting guy, I find this latter approach seems to be the "easy" way to do it.  If I were a librarian and understood the biblio data, I might be howling in anguish at the thought of randomly selecting the biblio record.  But I have no idea what the information means.  I am a sysadmin and have no real understanding or ownership of the data.  And I am not sure the people asking me to do this job completely understand the nuances of this either.
> 
> So, I ask the Koha community for advice.  Should I write a little script that runs the duplicate biblio SQL script, selects one of the biblio records to point all the items to, and deletes all the other biblio records?  One would need to sync the MARC items and reindex when done.  Or is that basically a terrible thing to do?
> 
> And, if I make such a script, is there a place where I should put the script so others do not need to make the same thing?
> 
>    - Tim Young
> 
> 
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha