[Koha] Help needed with zombie background_jobs processes

Jonathan Druart jonathan.druart at bugs.koha-community.org
Thu Apr 20 04:24:02 NZST 2023


It would be interesting to revert the changes from bug 32558, which were
backported into 22.11.04, and see if that helps.
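
To check whether a revert changes anything, it helps to see which parent
owns each zombie. A zombie only disappears once its parent reaps it with
wait(), or once the parent exits and init reaps it instead, which would
match the zombies vanishing after the default worker was killed with -9.
Below is a minimal Python sketch (Linux only, it reads /proc/<pid>/stat).
The function names parse_stat and list_zombies are made up for
illustration and are not part of Koha.

```python
import os


def parse_stat(stat_text):
    """Parse the first fields of a /proc/<pid>/stat line.

    Returns (pid, comm, state, ppid). The command name is wrapped in
    parentheses and may itself contain spaces or ')', so we split on the
    first '(' and the last ')'.
    """
    lpar = stat_text.index("(")
    rpar = stat_text.rindex(")")
    pid = int(stat_text[:lpar].strip())
    comm = stat_text[lpar + 1:rpar]
    fields = stat_text[rpar + 1:].split()
    state = fields[0]
    ppid = int(fields[1])
    return pid, comm, state, ppid


def list_zombies(proc_root="/proc"):
    """Return a list of (pid, comm, ppid) for processes in state 'Z'."""
    zombies = []
    if not os.path.isdir(proc_root):
        return zombies  # not a Linux-style /proc, nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable while scanning
        if state == "Z":
            zombies.append((pid, comm, ppid))
    return zombies


if __name__ == "__main__":
    for pid, comm, ppid in list_zombies():
        print(f"zombie pid={pid} comm={comm} parent={ppid}")
```

Running it before and after restarting a worker should show whether the
zombies' parent PIDs are the background_jobs_worker.pl processes.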

On Wed, Apr 19, 2023 at 18:01, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
>
> Hi Jonathan,
>
> I just tried sending SIGCHLD to the parent processes, but it didn't have any effect.  The parents are "/usr/bin/perl /usr/share/koha/bin/background_jobs_worker.pl --queue default" and "/usr/bin/perl /usr/share/koha/bin/background_jobs_worker.pl --queue long_tasks".
>
> worker-error.log has a few entries like these three from today:
> 20230419 08:44:53 ccfls-koha-worker-long_tasks: client (pid 12169) killed by signal 13, respawning
> 20230419 09:36:06 ccfls-koha-worker: client (pid 14398) killed by signal 13, respawning
> 20230419 09:59:35 ccfls-koha-worker: client (pid 29935) killed by signal 13, respawning
>
> Those timestamps correspond to three jobs in the jobs queue that didn't complete and show a progress of "null/n" (n being a number that I think corresponds to the number of items in the batch).  The first is a batch item record modification and the other two are holds queue updates.
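
Inline note on the log entries above: on Linux, signal 13 is SIGPIPE,
which is delivered when a process writes to a pipe or socket whose other
end has been closed. That only names the signal and does not by itself
say which connection was closed. A small Python sketch for decoding such
lines follows. The regular expression is an assumption based on the three
entries quoted above, and decode_worker_error is a hypothetical helper
name.

```python
import re
import signal

# Pattern inferred from the three worker-error.log lines in this thread:
# "<YYYYMMDD> <HH:MM:SS> <worker name>: client (pid N) killed by signal N, ..."
LINE_RE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<worker>[^:]+): "
    r"client \(pid (?P<pid>\d+)\) killed by signal (?P<signum>\d+)"
)


def decode_worker_error(line):
    """Return a dict describing a 'killed by signal' line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    signum = int(match.group("signum"))
    try:
        # Signal numbering is platform specific; this uses the local table.
        signame = signal.Signals(signum).name
    except ValueError:
        signame = "UNKNOWN"
    return {
        "date": match.group("date"),
        "time": match.group("time"),
        "worker": match.group("worker"),
        "pid": int(match.group("pid")),
        "signal": signum,
        "signal_name": signame,
    }


if __name__ == "__main__":
    sample = (
        "20230419 08:44:53 ccfls-koha-worker-long_tasks: "
        "client (pid 12169) killed by signal 13, respawning"
    )
    print(decode_worker_error(sample))
```

Grouping the decoded entries by worker name would show whether one queue
is hit more often than the other.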
>
> I cancelled these three jobs and the zombies remained.
>
> worker-output.log has a number of entries like these, but unfortunately there are no timestamps so I can't link them to anything, although the timestamp on the file itself is from yesterday at 13:08, which I think corresponded to a successful staging and import of records.
>
> Use of uninitialized value $subfield_value in pattern match (m//) at /usr/share/koha/lib/Koha/SimpleMARC.pm line 435.
> Use of uninitialized value $subfield_value in string eq at /usr/share/koha/lib/Koha/SimpleMARC.pm line 435.
>
> I did try something else.  The parent process for the long_tasks queue had apparently already respawned, but the one for the default queue hadn't, so I killed it with -9.  The two zombies that had been there went away and the default queue restarted.  Before I did that I tried a MARC upload, and it was stuck at 0%.  I cancelled the job and retried it after killing the default queue, and it worked, but it spawned a new zombie which was a child of the long_tasks queue.  Yesterday it seemed to work if there was only one zombie, but not two.  No new entries appeared in either of the worker-*.log files.
>
> Thanks for your help.
>
> c.
> -----------------------------------------------------------
> Cindy Murdock Ames
> IT Services Director
> Meadville Public Library | CCFLS
> https://meadvillelibrary.org | https://ccfls.org
>
> Please report tech support issues in Mantis:  https://mantis.ccfls.org
>
>
> On Wed, Apr 19, 2023 at 2:44 AM Jonathan Druart <jonathan.druart at bugs.koha-community.org> wrote:
>>
>> Did you have a look at worker-*.log? Nothing useful there?
>>
>> You can try sending SIGCHLD to the parent so that it reaps the zombie.
>>
>> On Tue, Apr 18, 2023 at 22:09, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
>> >
>> > A few other things I've noticed:
>> >
>> > - Sometimes the zombie processes will go away on their own, sometimes it seems when you retry the MARC import or whatever it was that failed.  This one is really weird to me as in all my years as a sysadmin I thought it was not possible for zombie processes to go away without a reboot.  But maybe that's changed and now zombies can rise from the dead.  Lol.
>> >
>> > - In looking at the jobs list in Koha, it seems that Holds queue updates are especially prone to getting stuck at a progress of null/1.
>> >
>> > - If you reattempt a job that is stuck (i.e., reattempting a MARC file upload or whatnot) it will often succeed.  The original failed job remains with a progress of null.
>> >
>> > c.
>> >
>> > On Tue, Apr 18, 2023 at 3:55 PM Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
>> >>
>> >> Yes, it's 22.11.04, package version.
>> >>
>> >> On Tue, Apr 18, 2023 at 2:59 PM Jonathan Druart <jonathan.druart at bugs.koha-community.org> wrote:
>> >>>
>> >>> Hi Cindy,
>> >>> Which exact version of Koha 22.11.xx? It should be the latest one.
>> >>> Regards,
>> >>> Jonathan
>> >>>
>> >>> On Tue, Apr 18, 2023 at 19:13, Cindy Murdock Ames <cmurdock at ccfls.org> wrote:
>> >>> >
>> >>> > Hi all,
>> >>> >
>> >>> > A couple weekends ago I upgraded our Koha instance from 22.05 to 22.11, and
>> >>> > I'm having trouble with the background_jobs processes becoming zombies
>> >>> > after a very short amount of time, necessitating a reboot.  I suspect it's
>> >>> > a misconfiguration on my part, so if someone can shed some light I'd really
>> >>> > appreciate it!
>> >>> >
>> >>> > The first symptom was our MARC imports getting stuck at "import queued",
>> >>> > and after some digging (and thanks to the thread in this list with the
>> >>> > subject of "Background job / Staging MARC import stuck at 0%") I found I was
>> >>> > entirely missing the <message_broker> section in our config, so I added
>> >>> > this:
>> >>> >
>> >>> >  <message_broker>
>> >>> >    <hostname>localhost</hostname>
>> >>> >    <port>61613</port>
>> >>> >    <username>guest</username>
>> >>> >    <password>guest</password>
>> >>> >    <vhost></vhost>
>> >>> >  </message_broker>
>> >>> >
>> >>> > Which seemed to resolve it, but now I find that the background_jobs
>> >>> > processes are going zombie after processing only a few jobs.  Here's some
>> >>> > info from the rabbitmq log after restarting the server:
>> >>> >
>> >>> > =INFO REPORT==== 18-Apr-2023::12:23:46 ===
>> >>> > node           : rabbit@ccflskoha
>> >>> > home dir       : /var/lib/rabbitmq
>> >>> > config file(s) : /etc/rabbitmq/rabbitmq.config (not found)
>> >>> > cookie hash    : ojvkUE6eUtku7kHlx3uiFg==
>> >>> > log            : /var/log/rabbitmq/rabbit@ccflskoha.log
>> >>> > sasl log       : /var/log/rabbitmq/rabbit@ccflskoha-sasl.log
>> >>> > database dir   : /var/lib/rabbitmq/mnesia/rabbit@ccflskoha
>> >>> >
>> >>> > Is it problematic that /etc/rabbitmq/rabbitmq.config is missing?  Anything
>> >>> > else I should be looking at?  We're running on Ubuntu SE 18.04 if that is
>> >>> > helpful.
>> >>> >
>> >>> > Thanks much!
>> >>> > Cindy