[Koha] Help needed with zombie background_jobs processes

Thu Apr 20 04:49:12 NZST 2023

That's an interesting thought.  I *am* missing the <background_jobs_worker>
section in my config file, but am I interpreting the patch correctly that
it defaults to "1" if it's not present?
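
To make the question concrete, here is a minimal sketch of the fallback behaviour I am asking about. It is Python rather than Koha's Perl, and it is not taken from the patch: the element name max_processes, the function name worker_max_processes, and the sample XML are all assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def worker_max_processes(config_xml: str, default: int = 1) -> int:
    """Return the configured worker limit, or `default` if it is absent."""
    root = ET.fromstring(config_xml)
    # Assumed layout (hypothetical, for illustration):
    # <background_jobs_worker><max_processes>N</max_processes></background_jobs_worker>
    node = root.find(".//background_jobs_worker/max_processes")
    if node is None or not (node.text or "").strip():
        return default  # section or value missing: fall back to the default
    return int(node.text)


# A config with no <background_jobs_worker> section at all, like mine.
without_section = "<config><message_broker/></config>"

# A config that sets the limit explicitly.
with_section = (
    "<config><background_jobs_worker>"
    "<max_processes>4</max_processes>"
    "</background_jobs_worker></config>"
)

print(worker_max_processes(without_section))
print(worker_max_processes(with_section))
```

If the real code behaves like this sketch, a missing section would be equivalent to an explicit limit of 1.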

-----------------------------------------------------------
Cindy Murdock Ames
IT Services Director
Meadville Public Library | CCFLS
https://meadvillelibrary.org | https://ccfls.org

Please report tech support issues in Mantis:  https://mantis.ccfls.org

On Wed, Apr 19, 2023 at 12:24 PM Jonathan Druart <
jonathan.druart at bugs.koha-community.org> wrote:

> It would be interesting to revert the changes from 32558 that have
> been backported into 22.11.04 and see if it helps.
>
> On Wed, 19 Apr 2023 at 18:01, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
> >
> > Hi Jonathan,
> >
> > I just tried sending SIGCHLD to the parent processes, but it didn't have any
> > effect.  The parents are
> > "/usr/bin/perl /usr/share/koha/bin/background_jobs_worker.pl --queue default"
> > and
> > "/usr/bin/perl /usr/share/koha/bin/background_jobs_worker.pl --queue long_tasks".
> >
> > worker-error.log has a few entries like these three from today:
> > 20230419 08:44:53 ccfls-koha-worker-long_tasks: client (pid 12169) killed by signal 13, respawning
> > 20230419 09:36:06 ccfls-koha-worker: client (pid 14398) killed by signal 13, respawning
> > 20230419 09:59:35 ccfls-koha-worker: client (pid 29935) killed by signal 13, respawning
> >
> > Those timestamps correspond to three jobs in the jobs queue that didn't
> > complete and show a progress of "null/n" (n being a number that I think
> > corresponds to the number of things in the batch).  The first is a batch item
> > record modification and the other two are holds queue updates.
> >
> > I cancelled these three jobs and the zombies remained.
> >
> > worker-output.log has a number of entries like these, but unfortunately
> > there are no timestamps so I can't link them to anything, although the
> > timestamp on the file itself is from yesterday at 13:08, which I think
> > corresponded to a successful staging and import of records.
> >
> > Use of uninitialized value $subfield_value in pattern match (m//) at /usr/share/koha/lib/Koha/SimpleMARC.pm line 435.
> > Use of uninitialized value $subfield_value in string eq at /usr/share/koha/lib/Koha/SimpleMARC.pm line 435.
> >
> > I did try something else.  The parent process for the long_tasks queue had
> > apparently already respawned, but the one for the default queue hadn't, so I
> > killed it with -9.  The two zombies that had been there went away and the
> > default queue restarted.  Before I did that I tried a MARC upload, and it was
> > stuck at 0%.  I cancelled the job and retried it after killing the default
> > queue, and it worked, but it spawned a new zombie which was a child of the
> > long_tasks queue.  Yesterday it seemed to work if there was only one
> > zombie, but not two.  No new entries in either of the worker-*.log files.
> >
> > Thanks for your help.
> >
> > c.
> >
> > On Wed, Apr 19, 2023 at 2:44 AM Jonathan Druart <jonathan.druart at bugs.koha-community.org> wrote:
> >>
> >> Did you have a look at worker-*.log? Nothing useful there?
> >>
> >> You can try sending SIGCHLD to the parent so that it reaps the zombie.
> >>
> >> On Tue, 18 Apr 2023 at 22:09, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
> >> >
> >> > A few other things I've noticed:
> >> >
> >> > - Sometimes the zombie processes will go away on their own, sometimes,
> >> >   it seems, when you retry the MARC import or whatever it was that failed.
> >> >   This one is really weird to me, as in all my years as a sysadmin I thought
> >> >   it was not possible for zombie processes to go away without a reboot.  But
> >> >   maybe that's changed and now zombies can rise from the dead.  Lol.
> >> >
> >> > - In looking at the jobs list in Koha, it seems that holds queue
> >> >   updates are especially prone to getting stuck at a progress of null/1.
> >> >
> >> > - If you reattempt a job that is stuck (i.e., reattempting a MARC file
> >> >   upload or whatnot) it will often succeed.  The original failed job remains
> >> >   with a progress of null.
> >> >
> >> > c.
> >> >
> >> > On Tue, Apr 18, 2023 at 3:55 PM Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
> >> >>
> >> >> Yes, it's 22.11.04, package version.
> >> >>
> >> >> On Tue, Apr 18, 2023 at 2:59 PM Jonathan Druart <jonathan.druart at bugs.koha-community.org> wrote:
> >> >>>
> >> >>> Hi Cindy,
> >> >>> Which exact version of Koha 22.11.xx? It should be the latest one.
> >> >>> Regards,
> >> >>> Jonathan
> >> >>>
> >> >>> On Tue, 18 Apr 2023 at 19:13, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
> >> >>> >
> >> >>> > Hi all,
> >> >>> >
> >> >>> > A couple weekends ago I upgraded our Koha instance from 22.05 to 22.11, and
> >> >>> > I'm having trouble with the background_jobs processes becoming zombies
> >> >>> > after a very short amount of time, necessitating a reboot.  I suspect it's
> >> >>> > a misconfiguration on my part, so if someone can shed some light I'd really
> >> >>> > appreciate it!
> >> >>> >
> >> >>> > The first symptom was our MARC imports getting stuck at "import queued",
> >> >>> > and after some digging (and thanks to the thread on this list with the
> >> >>> > subject "Background job / Staging MARC import stuck at 0%") I found I was
> >> >>> > entirely missing the <message_broker> section in our config, so I added
> >> >>> > this:
> >> >>> >
> >> >>> >  <message_broker>
> >> >>> >    <hostname>localhost</hostname>
> >> >>> >    <port>61613</port>
> >> >>> >    <username>guest</username>
> >> >>> >    <password>guest</password>
> >> >>> >    <vhost></vhost>
> >> >>> >  </message_broker>
> >> >>> >
> >> >>> > Which seemed to resolve it, but now I find that the background_jobs
> >> >>> > processes are going zombie after processing only a few jobs.  Here's some
> >> >>> > info from the rabbitmq log after restarting the server:
> >> >>> >
> >> >>> > =INFO REPORT==== 18-Apr-2023::12:23:46 ===
> >> >>> > node           : rabbit at ccflskoha
> >> >>> > home dir       : /var/lib/rabbitmq
> >> >>> > config file(s) : /etc/rabbitmq/rabbitmq.config (not found)
> >> >>> > cookie hash    : ojvkUE6eUtku7kHlx3uiFg==
> >> >>> > log            : /var/log/rabbitmq/rabbit at ccflskoha.log
> >> >>> > sasl log       : /var/log/rabbitmq/rabbit at ccflskoha-sasl.log
> >> >>> > database dir   : /var/lib/rabbitmq/mnesia/rabbit at ccflskoha
> >> >>> >
> >> >>> > Is it problematic that /etc/rabbitmq/rabbitmq.config is missing?  Anything
> >> >>> > else I should be looking at?  We're running on Ubuntu SE 18.04 if that is
> >> >>> > helpful.
> >> >>> >
> >> >>> > Thanks much!
> >> >>> > Cindy
> >> >>> >
> >> >>> > _______________________________________________
> >> >>> >
> >> >>> > Koha mailing list  http://koha-community.org
> >> >>> > Koha at lists.katipo.co.nz
> >> >>> > Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
>
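
On the zombie behaviour discussed in the thread: a zombie is a child that has exited but whose parent has not yet collected its exit status with wait(). It disappears as soon as the parent does so, or when the parent itself dies and init adopts and reaps it, which would be consistent with the zombies clearing after the parent was killed with -9. Sending SIGCHLD only helps if the parent has a handler that calls wait(). The following is a generic, Linux-only Python sketch of that lifecycle; it says nothing about what background_jobs_worker.pl does internally, and the function names are made up for illustration.

```python
import os
import time
from typing import Tuple


def child_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The state field is the first field after the parenthesised command name.
        return f.read().rsplit(")", 1)[1].split()[0]


def zombie_demo() -> Tuple[str, bool]:
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # Parent: the child stays in the process table until we wait() on it.
    state = ""
    for _ in range(100):
        state = child_state(pid)
        if state == "Z":
            break
        time.sleep(0.05)

    os.waitpid(pid, 0)  # reap the child; its process table entry is released
    gone = not os.path.exists(f"/proc/{pid}")
    return state, gone


if __name__ == "__main__":
    state_before_wait, gone_after_wait = zombie_demo()
    print("state before wait():", state_before_wait)
    print("entry gone after wait():", gone_after_wait)
```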