[Koha] SQL Report - Authorities. List duplicate values based on 010$a

Craig Butosi cbutosi at gmail.com
Wed May 15 02:44:26 NZST 2019


Hi Stephano,

Thanks very much for this. I will test later tonight. For some background,
I recently uploaded the Library Of Congress Subject authority file (2014
release), which is now available in Marc8, UTF-8, and XML formats as an
open access resource: https://www.loc.gov/cds/products/marcDist.php (about
415,000 records). For some reason, about 16,600 duplicates were also
uploaded (minus any Genre/Form terms which can be literal duplicates of
Topical Terms). I needed a way to isolate these duplicate IDs and remove
them.

I actually ran the following report from SQL reports list:
https://wiki.koha-community.org/wiki/SQL_Reports_Library#Duplicate_authorities,
and simply changed 035 to 010. The report generated a column of all IDs
that have been duplicated. I then took these values, went into phpmyadmin
and ran "DELETE FROM auth_header WHERE authid IN ({all duplicate tag 010
IDs separated by commas}), then reindexed Zebra. I then ran the
link_bibs_to_authorities.pl script that comes with Koha to ensure any
unlinked bibs were relinked to authorities.

My authority file is clean now.

I still don't have an explanation as to why or how the dupes entered into
the upload.

I'll bookmark your report though and tinker with it when I get some quiet
time. I appreciate your help!

Craig Butosi, MA, MLIS, B Mus (Hons)
Library: library.craigbutosi.ca


On Tue, May 14, 2019 at 4:55 AM Stefano Bargioni <bargioni at pusc.it> wrote:

> Maybe a self join can help you.
> See https://www.w3schools.com/sql/sql_join_self.asp <
> https://www.w3schools.com/sql/sql_join_self.asp>
> However, your query could be very slow, since the match will not work on
> indexed data. If not sure, but try
>
> SELECT A.authid AS authid1, B.authid AS authid2,
> ExtractValue(A.marcxml,'/record/datafield[@tag="010"]/subfield[@code="a"]')
> AS dup
> FROM auth_header A, auth_header B
> WHERE A.authid <> B.authid
> AND
> ExtractValue(A.marcxml,'/record/datafield[@tag="010"]/subfield[@code="a"]')
> <> ''
> AND
> ExtractValue(A.marcxml,'/record/datafield[@tag="010"]/subfield[@code="a"]')
> =
> ExtractValue(B.marcxml,'/record/datafield[@tag="010"]/subfield[@code="a"]')
> ORDER BY A.authid;
>
> HTH. Stefano
>
> > Subject: [Koha] SQL Report - Authorities. List duplicate values based
> >       on      010$a
> > Message-ID:
> >       <
> CAP4tGF9Em5-4xiMHerf9B6sS+9ogzmh8_iMgKz-zJdtDJeBh6A at mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi all,
> >
> > Koha 18.11 on Ubunti 16.04
> >
> > May I request help for an SQL report that does the following, please:
> >
> > Identify and list by authid and main heading all authority records that
> > have a tag 010$a with duplicate values?
> >
> > I am attempting to located all authority records that can be merged (or
> > de-duped via merge).
> >
> > Many thanks!
> >
> > Craig Butosi, MA, MLIS, B Mus (Hons)
> > Library: library.craigbutosi.ca
> >
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Sun, 12 May 2019 01:49:55 +0530
> > From: vijay kumar <vijcrjbhu at gmail.com>
> > To: koha at lists.katipo.co.nz
> > Subject: [Koha] upgrade from koha-17.05.06 to koha-latest
> > Message-ID:
> >       <
> CACNt1H6pKRZsRJiQQSeoevwDbuMYRwDmxCOiMBksweSfPUbGCg at mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Dear all,
> > We have koha-17.05.06, installed using tarball method.
> >
> > I have followed the following methods for upgrading from koha-17.05.06 to
> > koha-latest
> > 1. Updated koha-list source accordingly koha-stable
> > 2. Update all perl dependencies and cpan module
> >
> > vijay at debian:/usr/share/koha-18.11.04$ perl Makefile.PL
> > /usr/share/koha-17.05.06/misc/koha-install-log
> > /usr/share/koha/misc/koha-install-log
> >
> > and also
> >
> > vijay at debian:/usr/share/koha-18.11.04$ sudo perl Makefile.PL
> > /usr/share/koha-17.05.06/misc/koha-install-log
> > /usr/share/koha-18.11.04/misc/koha-install-log
> >
> > vijay at debian:/usr/share/koha-18.11.04$ sudo make
> >
> > vijay at debian:/usr/share/koha-18.11.04$ sudo make upgrade
> >
> ------------------------------------------------------------------------------------
> >
> > Getting the following error during upgrade
> > Kindly suggest the proper method for up-gradation of koha-latest
> >
> > With regards,
> > Vijay Kumar
> >
> >
> > ::::error::::
> >
> > vijay at debian:/usr/share/koha-18.11.04$ sudo make upgrade
> > Can't locate C4/Installer/UpgradeBackup.pm in @INC (you may need to
> install
> > the C4::Installer::UpgradeBackup module) (@INC contains: /etc/perl
> > /usr/local/lib/x86_64-linux-gnu/perl/5.24.1 /usr/local/share/perl/5.24.1
> > /usr/lib/x86_64-linux-gnu/perl5/5.24 /usr/share/perl5
> > /usr/lib/x86_64-linux-gnu/perl/5.24 /usr/share/perl/5.24
> > /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base).
> > BEGIN failed--compilation aborted.
> > Makefile:22908: recipe for target 'make_upgrade_backup' failed
> > make: *** [make_upgrade_backup] Error 2
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 12 May 2019 08:27:33 +1200
> > From: David Nind <david.nind at gmail.com>
> > To: Craig Butosi <cbutosi at gmail.com>
> > Cc: koha <koha at lists.katipo.co.nz>
> > Subject: Re: [Koha] SQL Report - Authorities. List duplicate values
> >       based on        010$a
> > Message-ID:
> >       <
> CAK5OMPcJCLGVyOHFBubZ2-Et9-Tf-cTWyk+Ui21eYPXhxE-z6g at mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi Craig.
> >
> > I'm not that great at reports, but you might be able to find something in
> > the SQL reports library that could help:
> > https://wiki.koha-community.org/wiki/SQL_Reports_Library
> >
> >
> > David Nind | david.nind at gmail.com
> > PO Box 12367, Thorndon, Wellington, New Zealand 6144
> > m. +64 21 0537 847
> >
> >
> > On Sun, 12 May 2019 at 05:46, Craig Butosi <cbutosi at gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> Koha 18.11 on Ubunti 16.04
> >>
> >> May I request help for an SQL report that does the following, please:
> >>
> >> Identify and list by authid and main heading all authority records that
> >> have a tag 010$a with duplicate values?
> >>
> >> I am attempting to located all authority records that can be merged (or
> >> de-duped via merge).
> >>
> >> Many thanks!
> >>
> >> Craig Butosi, MA, MLIS, B Mus (Hons)
> >> Library: library.craigbutosi.ca
> >> _______________________________________________
> >> Koha mailing list  http://koha-community.org
> >> Koha at lists.katipo.co.nz
> >> https://lists.katipo.co.nz/mailman/listinfo/koha
> >>
> >
> >
> > ------------------------------
> >
> > Subject: Digest Footer
> >
> > _______________________________________________
> > Koha mailing list
> > Koha at lists.katipo.co.nz
> > https://lists.katipo.co.nz/mailman/listinfo/koha
> >
> >
> > ------------------------------
> >
> > End of Koha Digest, Vol 163, Issue 12
> > *************************************
>
> _______________________________________________
> Koha mailing list  http://koha-community.org
> Koha at lists.katipo.co.nz
> https://lists.katipo.co.nz/mailman/listinfo/koha
>


More information about the Koha mailing list