[Koha] Bare Bones Import and Merge with Card Data
Paul Hoffman
paul at flo.org
Fri Jan 26 08:34:34 NZDT 2018
On Thu, Jan 25, 2018 at 11:09:47AM -0500, Alvin W. wrote:
> I've set up a new Koha on an Ubuntu server and am preparing my initial
> upload of data to populate it. I have a file of bare-bones data
>
> ISBN
> Title
> Author
>
> I've been working on the assumption that by using ISBN I can collect the
> rest of the data from z39.50 servers.
Maybe for most records, but the hard part may be identifying the ones
that you don't find a match for and then dealing with them.
> I have figured out how to run a z39.50 search from inside Koha and I have
> been able to merge that with my items. But I haven't seen any way to
> automate that in some sort of batch run for all the items.
>
> Is it possible to automate the z39.50 search & merge inside Koha?
That would probably be a *lot* of work.
> Or would the better approach be to use a z39.50 client --before-- loading
> the data into Koha and when I build full-content records for the import.
Yes, definitely!
> I've looked at z39.50 client software and almost all of it is either not
> supported, not found, or very, very old -- one link said it was for Windows
> 95!
Index Data's YAZ toolkit is the gold standard for Z39.50 client software:
http://www.indexdata.com/yaz/
I use the Perl module ZOOM that's built on top of YAZ:
https://metacpan.org/pod/distribution/Net-Z3950-ZOOM/lib/ZOOM.pod
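For a sense of what ZOOM looks like in practice, here's a minimal sketch of a
single ISBN lookup -- the host, port, and database name are placeholders, so
substitute your target server's details:

```perl
use strict;
use warnings;
use ZOOM;

# Placeholder connection details -- substitute a real Z39.50 server.
my $conn = ZOOM::Connection->new(
    'z3950.example.org', 210,
    databaseName => 'bibs',
);
$conn->option(preferredRecordSyntax => 'usmarc');

# PQN query: Bib-1 use attribute 1=7 means "search by ISBN".
my $rs = $conn->search_pqf('@attr 1=7 0306406152');
printf STDERR "%d hit(s)\n", $rs->size;

# Write the first matching record to stdout as raw MARC (ISO 2709).
print $rs->record(0)->raw if $rs->size > 0;
```

Connection failures and server diagnostics are thrown as ZOOM::Exception
objects, which is why the attached script wraps its ZOOM calls in eval.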
> So -- to add card catalog/MARC data to complete my bare bones list -- can
> it be done in Koha or should I build those records before the import?
>
> If before the import, can anyone recommend a z39.50 client?
If you can install YAZ and ZOOM then I have a Perl script (zsearch,
attached) that you can feed a file of PQN-style (Prefix Query Notation)
queries and get (some) MARC records back. Since each query is a simple
ISBN search, you can do it all from the command line -- something like
this:
### 1. Prepare a file of PQN queries:
$ awk '{print $1}' isbns.txt | tr -dc '0-9Xx\n' | sed 's/^/@attr 1=7 /' > searches.pqn
### (the \n in the tr set keeps each ISBN on its own line)
### 2. Run searches on your favorite Z39.50 server and save the first
### matching MARC bib record for each ISBN:
$ zsearch -h HOST -p PORT -d BASE -m 1 < searches.pqn > found.mrc 2> search.log
### 3. Sanity check -- compare the number of MARC records to the number
### of queries:
$ perl -0x1d -ne 'END { print "$. records\n" }' found.mrc
$ wc -l searches.pqn
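To sanity-check step 1 itself, here is essentially the same pipeline run on a
single hand-made sample line (the whitespace-separated ISBN/title/author
layout is an assumption about your bare-bones file):

```shell
# One sample bare-bones line: ISBN first, then title and author.
# Keep only the ISBN column, strip hyphens and anything else that isn't
# part of the ISBN (the \n in the tr set keeps each ISBN on its own
# line), then prepend the Bib-1 ISBN attribute to form a PQN query.
printf '0-306-40615-2 Some Title Some Author\n' \
    | awk '{print $1}' \
    | tr -dc '0-9Xx\n' \
    | sed 's/^/@attr 1=7 /'
# -> @attr 1=7 0306406152
```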
Whichever way you end up doing this, good luck!
Paul.
--
Paul Hoffman <paul at flo.org>
Software Manager
Fenway Library Organization
550 Huntington Ave.
Boston, MA 02115
(617) 442-2384 (FLO main number)
-------------- next part --------------
#!/m1/shared/bin/perl
use strict;
use warnings;
use ZOOM;
use Time::HiRes qw(sleep);
use Getopt::Long
qw(:config posix_default gnu_compat require_order bundling no_ignore_case);
my $host = 'localhost';
my $port = 7091;
my $dbname;
my $preferred_record_syntax = 'usmarc';
my $charset = 'UTF-8';
my $delay = 0.1;
my $randomize;
my ($verbose, $silent);
my $fetch_limit = 1;
my $count_only;
my $die_if_over_limit;
my $die_on_error;
my $die_if_not_found;
my $esn;
my $err;
my %options;
GetOptions(
    'h|host=s' => sub {
        $host = $_[1];
        $dbname = $1 if $host =~ s{/(.+)$}{};
        $port = $1 if $host =~ s/:(\d+)$//;
    },
    'p|port=s' => \$port,
    'A|authenticate=s' => sub { set_authentication($_[1]) },
    'd|database=s' => \$dbname,
    'x|preferred-record-syntax=s' => \$preferred_record_syntax,
    'c|charset=s' => \$charset,
    'f|from-file=s' => sub { open STDIN, '<', $_[1] or die "Can't open $_[1]: $!" },
    'v|verbose' => \$verbose,
    's|silent' => \$silent,
    'k|delay=f' => \$delay,
    'r|randomize-delay' => \$randomize,
    'K|no-delay' => sub { $delay = 0 },
    'm|fetch-limit=i' => \$fetch_limit,
    'M|hard-fetch-limit=i' => sub { $fetch_limit = $_[1]; $die_if_over_limit = 1 },
    'n|count-only' => \$count_only,
    'E|die-on-error' => \$die_on_error,
    'N|die-if-not-found' => \$die_if_not_found,
    'O|worldcat=s' => sub {
        ($host, $port, $dbname) = ('zcat.oclc.org', 210, 'OLUCWorldCat');
        set_authentication('@'.glob('~/.oclcauth/'.$_[1]));
    },
    'e|element-set-name=s' => \$esn,
) or exit usage();
$dbname = 'voyager' if $host eq 'localhost' && !defined $dbname;
my $get_next_query;
if (@ARGV) {
    $get_next_query = sub { shift @ARGV };
}
else {
    $get_next_query = sub { my $b = <STDIN>; chomp $b if defined $b; $b };
}
binmode STDOUT;
my $conn = ZOOM::Connection->new(
    $host,
    $port,
    defined($dbname) ? (databaseName => $dbname) : (),
    %options,
);
$conn->option('preferredRecordSyntax' => $preferred_record_syntax);
$conn->option('charset' => $charset);
$conn->option('elementSetName' => $esn) if defined $esn;
my $found = 0;
my $not_found = 0;
my $truncated = 0;
my $errors = 0;
my $skipped = 0;
my $exiting;
my $i = 0;
while (defined(my $query = $get_next_query->())) {
    if ($query =~ /^\s*(#.*)?$/) {
        # Skip blank lines and comments
        next;
    }
    $i++;
    if ($exiting) {
        $skipped++;
        next;
    }
    my ($rs, $n) = search($conn, $query);
    if (!defined $n) {
        print STDERR "E $i SEARCH ERROR { $query } ", $err->code, ": ", $err->message, "\n" if !$silent;
        $errors++;
        $exiting = 1 if $die_on_error;
        next;
    }
    elsif ($count_only) {
        print STDERR "+ $n FOUND $i { $query }\n";
        $found++;
        next;
    }
    elsif ($n == 0) {
        print STDERR "- $i NOT FOUND { $query }\n" if $verbose;
        $not_found++;
        $exiting = 1 if $die_if_not_found;
        next;
    }
    elsif ($fetch_limit && $n > $fetch_limit) {
        if ($die_if_over_limit) {
            # Too many matches
            print STDERR "E $i LIMIT EXCEEDED { $query }\n" if !$silent;
            $errors++;
            $exiting = 1 if $die_on_error;
            next;
        }
        else {
            print STDERR "W $i SEARCH TRUNCATED { $query }\n" if !$silent;
            $truncated++;
        }
    }
    my $count = $fetch_limit || $n;
    $count = $n if $n < $count;
    my @records = fetch($rs, $count);
    if (!@records) {
        # Couldn't fetch
        print STDERR "E $i FETCH ERROR { $query } ", $err->code, ": ", $err->message, "\n" if $verbose;
        $errors++;
        $exiting = 1 if $die_on_error;
        next;
    }
    print STDERR "+ $n FOUND $i { $query }\n" if $verbose;
    $found++;
    foreach my $rec (@records) {
        print $rec->raw;
    }
    if ($delay > 0) {
        my $d = $delay;
        if ($randomize) {
            $d = $d / 2 + rand $d;
        }
        sleep $d;
    }
}
# --- Summarize results
if ($die_on_error && $errors > 0) {
    print STDERR "Errors -- exiting\n" if !$silent;
}
if ($die_if_not_found && $not_found > 0) {
    print STDERR "Not found -- exiting\n" if !$silent;
}
printf STDERR <<'EOS', $found, $truncated, $not_found, $errors, $skipped, $i if $verbose;
Summary:
%5d found (%d truncated)
%5d not found
%5d errors
%5d skipped
----- ---------
%5d total
EOS
exit 2 if $errors > 0;
sub search {
    my ($conn, $query) = @_;
    my ($rs, $n);
    eval {
        $rs = $conn->search_pqf($query);
        $n = $rs->size;
    };
    $err = $@;
    return ($rs, $n);
}
sub fetch {
    my ($rs, $count) = @_;
    my @records;
    my $i = 0;
    eval {
        while ($i < $count) {
            push @records, $rs->record($i++);
        }
    };
    $err = $@;
    return unless @records == $count;
    return @records;
}
sub set_authentication {
    my ($str) = @_;
    my ($user, $pass);
    if ($str =~ s/^\@//) {
        open my $fh, '<', $str or die "Can't open authentication file $str: $!";
        my @lines;
        while (<$fh>) {
            next if /^\s*(?:#.*)?$/;  # Skip blank lines and comments
            $user = $1, next if /^user\s+(.+)/i;
            $pass = $1, next if /^pass(?:word)?\s+(.+)/i;
            chomp;
            push @lines, $_;
        }
        if (!defined $user) {
            die "No user in auth file $str" if defined $pass;
            die "Malformed auth file $str" if @lines != 2;
            ($user, $pass) = @lines;
        }
        elsif (!defined $pass) {
            die "No password for user $user in auth file $str";
        }
    }
    else {
        ($user, $pass) = split m{/}, $str, 2;
    }
    $options{'user'} = $user if defined $user;
    $options{'password'} = $pass if defined $pass;
}

sub usage {
    # Called when option parsing fails; the return value becomes the exit status.
    print STDERR "usage: zsearch [OPTIONS] [QUERY...] < searches.pqn > out.mrc\n";
    return 1;
}