[Koha] Bare Bones Import and Merge with Card Data
Paul Hoffman
paul at flo.org
Fri Jan 26 08:34:34 NZDT 2018
On Thu, Jan 25, 2018 at 11:09:47AM -0500, Alvin W. wrote:
> I've set up a new Koha on an Ubuntu server and am preparing my initial
> upload of data to populate it. I have a file of bare-bones data
>
> ISBN
> Title
> Author
>
> I've been working on the assumption that by using ISBN I can collect the
> rest of the data from z39.50 servers.
Maybe for most records, but the hard part may be identifying the ones
that you don't find a match for and then dealing with them.
> I have figured out how to run a z39.50 search from inside Koha and I have
> been able to merge that with my items. But I haven't seen any way to
> automate that in some sort of batch run for all the items.
>
> Is it possible to automate the z39.50 search & merge inside Koha?
That would probably be a *lot* of work.
> Or would the better approach be to use a z39.50 client --before-- loading
> the data into Koha and when I build full-content records for the import.
Yes, definitely!
> I've looked at z39.50 client software and almost all of it is either not
> supported, not found, or very, very old -- one link said it was for Windows
> 95!
Index Data's YAZ toolkit is the gold standard for Z39.50 client software:
http://www.indexdata.com/yaz/
I use the Perl module ZOOM that's built on top of YAZ:
https://metacpan.org/pod/distribution/Net-Z3950-ZOOM/lib/ZOOM.pod
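For a sense of what ZOOM looks like in practice, here's a minimal sketch of a
single ISBN lookup -- the host, port, and database name are placeholders, so
substitute your target server's details:

```perl
use strict;
use warnings;
use ZOOM;

# Placeholder connection details -- substitute a real Z39.50 server.
my $conn = ZOOM::Connection->new(
    'z3950.example.org', 210,
    databaseName => 'bibs',
);
$conn->option(preferredRecordSyntax => 'usmarc');

# PQN query: Bib-1 use attribute 1=7 means "search by ISBN".
my $rs = $conn->search_pqf('@attr 1=7 0306406152');
printf STDERR "%d hit(s)\n", $rs->size;

# Write the first matching record to stdout as raw MARC (ISO 2709).
print $rs->record(0)->raw if $rs->size > 0;
```

Connection failures and server diagnostics are thrown as ZOOM::Exception
objects, which is why the attached script wraps its ZOOM calls in eval.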
> So -- to add card catalog/MARC data to complete my bare bones list -- can
> it be done in Koha or should I build those records before the import?
>
> If before the import, can anyone recommend a z39.50 client?
If you can install YAZ and ZOOM then I have a Perl script (zsearch,
attached) that you can feed a file of PQN-style (Prefix Query Notation)
queries and get (some) MARC records back. Since each query is a simple
ISBN search, you can do it all from the command line -- something like
this:
### 1. Prepare a file of PQN queries:
$ awk '{print $1}' isbns.txt | tr -dc '0-9Xx\n' | sed 's/^/@attr 1=7 /' > searches.pqn
### (the \n in the tr set keeps each ISBN on its own line)
### 2. Run searches on your favorite Z39.50 server and save the first
### matching MARC bib record for each ISBN:
$ zsearch -h HOST -p PORT -d BASE -m 1 < searches.pqn > found.mrc 2> search.log
### 3. Sanity check -- compare the number of MARC records to the number
### of queries:
$ perl -0x1d -ne 'END { print "$. records\n" }' found.mrc
$ wc -l searches.pqn
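To sanity-check step 1 itself, here is essentially the same pipeline run on a
single hand-made sample line (the whitespace-separated ISBN/title/author
layout is an assumption about your bare-bones file):

```shell
# One sample bare-bones line: ISBN first, then title and author.
# Keep only the ISBN column, strip hyphens and anything else that isn't
# part of the ISBN (the \n in the tr set keeps each ISBN on its own
# line), then prepend the Bib-1 ISBN attribute to form a PQN query.
printf '0-306-40615-2 Some Title Some Author\n' \
    | awk '{print $1}' \
    | tr -dc '0-9Xx\n' \
    | sed 's/^/@attr 1=7 /'
# -> @attr 1=7 0306406152
```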
Whichever way you end up doing this, good luck!
Paul.
--
Paul Hoffman <paul at flo.org>
Software Manager
Fenway Library Organization
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)
-------------- next part --------------
#!/m1/shared/bin/perl
use strict;
use warnings;
use ZOOM;
use Time::HiRes qw(sleep);
use Getopt::Long
qw(:config posix_default gnu_compat require_order bundling no_ignore_case);
my $host = 'localhost';
my $port = 7091;
my $dbname;
my $preferred_record_syntax = 'usmarc';
my $charset = 'UTF-8';
my $delay = 0.1;
my $randomize;
my ($verbose, $silent);
my $fetch_limit = 1;
my $count_only;
my $die_if_over_limit;
my $die_on_error;
my $die_if_not_found;
my $esn;
my $err;
my %options;
GetOptions(
    'h|host=s' => sub {
        $host = $_[1];
        $dbname = $1 if $host =~ s{/(.+)$}{};
        $port = $1 if $host =~ s/:(\d+)$//;
    },
    'p|port=s' => \$port,
    'A|authenticate=s' => sub { set_authentication($_[1]) },
    'd|database=s' => \$dbname,
    'x|preferred-record-syntax=s' => \$preferred_record_syntax,
    'c|charset=s' => \$charset,
    'f|from-file=s' => sub { open STDIN, '<', $_[1] or die "Can't open $_[1]: $!" },
    'v|verbose' => \$verbose,
    's|silent' => \$silent,
    'k|delay=f' => \$delay,
    'r|randomize-delay' => \$randomize,
    'K|no-delay' => sub { $delay = 0 },
    'm|fetch-limit=i' => \$fetch_limit,
    'M|hard-fetch-limit=i' => sub { $fetch_limit = $_[1]; $die_if_over_limit = 1 },
    'n|count-only' => \$count_only,
    'E|die-on-error' => \$die_on_error,
    'N|die-if-not-found' => \$die_if_not_found,
    'O|worldcat=s' => sub {
        ($host, $port, $dbname) = ('zcat.oclc.org', 210, 'OLUCWorldCat');
        set_authentication('@'.glob('~/.oclcauth/'.$_[1]));
    },
    'e|element-set-name=s' => \$esn,
) or exit usage();
$dbname = 'voyager' if $host eq 'localhost' && !defined $dbname;
my $get_next_query;
if (@ARGV) {
    $get_next_query = sub { shift @ARGV };
}
else {
    $get_next_query = sub { my $b = <STDIN>; chomp $b if defined $b; $b };
}
binmode STDOUT;
my $conn = ZOOM::Connection->new(
    $host,
    $port,
    defined($dbname) ? (databaseName => $dbname) : (),
    %options,
);
$conn->option('preferredRecordSyntax' => $preferred_record_syntax);
$conn->option('charset' => $charset);
$conn->option('elementSetName' => $esn) if defined $esn;
my $found = 0;
my $not_found = 0;
my $truncated = 0;
my $errors = 0;
my $skipped = 0;
my $exiting;
my $i = 0;
while (defined(my $query = $get_next_query->())) {
    if ($query =~ /^\s*(#.*)?$/) {
        # Skip blank lines and comments
        next;
    }
    $i++;
    if ($exiting) {
        $skipped++;
        next;
    }
    my ($rs, $n) = search($conn, $query);
    if (!defined $n) {
        print STDERR "E $i SEARCH ERROR { $query } ", $err->code, ": ", $err->message, "\n" if !$silent;
        $errors++;
        $exiting = 1 if $die_on_error;
        next;
    }
    elsif ($count_only) {
        print STDERR "+ $n FOUND $i { $query }\n";
        $found++;
        next;
    }
    elsif ($n == 0) {
        print STDERR "- $i NOT FOUND { $query }\n" if $verbose;
        $not_found++;
        $exiting = 1 if $die_if_not_found;
        next;
    }
    elsif ($fetch_limit && $n > $fetch_limit) {
        if ($die_if_over_limit) {
            # Too many matches
            print STDERR "E $i LIMIT EXCEEDED { $query }\n" if !$silent;
            $errors++;
            $exiting = 1 if $die_on_error;
            next;
        }
        else {
            print STDERR "W $i SEARCH TRUNCATED { $query }\n" if !$silent;
            $truncated++;
        }
    }
    my $count = $fetch_limit || $n;
    $count = $n if $n < $count;
    my @records = fetch($rs, $count);
    if (!@records) {
        # Couldn't fetch
        print STDERR "E $i FETCH ERROR { $query } ", $err->code, ": ", $err->message, "\n" if $verbose;
        $errors++;
        $exiting = 1 if $die_on_error;
        next;
    }
    print STDERR "+ $n FOUND $i { $query }\n" if $verbose;
    $found++;
    foreach my $rec (@records) {
        print $rec->raw;
    }
    if ($delay > 0) {
        my $d = $delay;
        if ($randomize) {
            $d = $d / 2 + rand $d;
        }
        sleep $d;
    }
}
# --- Summarize results
if ($die_on_error && $errors > 0) {
    print STDERR "Errors -- exiting\n" if !$silent;
}
if ($die_if_not_found && $not_found > 0) {
    print STDERR "Not found -- exiting\n" if !$silent;
}
printf STDERR <<'EOS', $found, $truncated, $not_found, $errors, $skipped, $i if $verbose;
Summary:
%5d found (%d truncated)
%5d not found
%5d errors
%5d skipped
----- ---------
%5d total
EOS
exit 2 if $errors > 0;
sub search {
    my ($conn, $query) = @_;
    my ($rs, $n);
    eval {
        $rs = $conn->search_pqf($query);
        $n = $rs->size;
    };
    $err = $@;
    return ($rs, $n);
}
sub fetch {
    my ($rs, $count) = @_;
    my @records;
    my $i = 0;
    eval {
        while ($i < $count) {
            push @records, $rs->record($i++);
        }
    };
    $err = $@;
    return unless @records == $count;
    return @records;
}
sub set_authentication {
    my ($str) = @_;
    my ($user, $pass);
    if ($str =~ s/^\@//) {
        open my $fh, '<', $str or die "Can't open authentication file $str: $!";
        my @lines;
        while (<$fh>) {
            next if /^\s*(?:#.*)?$/;  # Skip blank lines and comments
            $user = $1, next if /^user\s+(.+)/i;
            $pass = $1, next if /^pass(?:word)?\s+(.+)/i;
            chomp;
            push @lines, $_;
        }
        if (!defined $user) {
            die "No user in auth file $str" if defined $pass;
            die "Malformed auth file $str" if @lines != 2;
            ($user, $pass) = @lines;
        }
        elsif (!defined $pass) {
            die "No password for user $user in auth file $str";
        }
    }
    else {
        ($user, $pass) = split m{/}, $str, 2;
    }
    $options{'user'} = $user if defined $user;
    $options{'password'} = $pass if defined $pass;
}

sub usage {
    # Called when option parsing fails; the return value becomes the exit status.
    print STDERR "usage: zsearch [OPTIONS] [QUERY...] < searches.pqn > out.mrc\n";
    return 1;
}