Elasticsearch reindex failing for authorities
Hi all,

We have a site with around 600,000 authority records and have enabled Elasticsearch. Reindexing and searching have worked easily for biblios (there are just over 10,000 biblios) but fail for authorities.

$ sudo koha-elasticsearch --rebuild -d -a -v <instance>
[19024] Checking state of authorities index
[19024] Dropping and recreating authorities index
[19024] Indexing authorities

And then it just hangs there for ages until the server runs out of memory and dies.

We're running ES 6.x on Koha 20.05.x.

I tried running the script directly and specifying an authid, and that worked as expected.

$ perl /usr/share/koha/bin/search_tools/rebuild_elasticsearch.pl -a -d -v -ai 2135575
[17684] Checking state of authorities index
[17684] Dropping and recreating authorities index
[17684] Indexing authorities
[17684] Committing final records...
[17684] Total 1 records indexed

It feels like this is happening because we have too many authority records. Is there a way to fix this? Has anyone come across this before?

Thanks!

-- Aleisha Amohia (she/her)
Hi Aleisha

Have you tried to lower the batch commit?

See https://perldoc.koha-community.org/misc/search_tools/rebuild_elasticsearch.html

Regards,
Alvaro
Koha mailing list http://koha-community.org Koha@lists.katipo.co.nz Unsubscribe: https://lists.katipo.co.nz/mailman/listinfo/koha
Hi Alvaro

Thank you for responding! I did try lowering the batch commit but it doesn't appear to even get to the part where it starts processing records. I think it gets stuck on fetching the records.

Aleisha
-- Aleisha Amohia (she/her), Koha Developer, Catalyst IT
Hi,

I think we're just doing it wrong. The problem is that with authorities there's a single SQL query that fetches a record set that includes the marcxml records. Do that for 600 000 records and it uses quite a bit of memory. For biblios the record set only contains biblionumbers, and the actual metadata is fetched one by one. That's better but still not great.

What we should be doing is fetching n records (e.g. with n=1000) at a time, ordered by id. Then each round fetches the next set starting from id > [last fetched id]. This is basically what I did in bug 27584 to improve the OAI-PMH provider performance.

I've created bug 28268 about this. I'll post a patch soon that you could try.

Best,
Ere
-- Ere Maijala Kansalliskirjasto / The National Library of Finland
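The batching Ere describes (fetch n records ordered by id, then continue from id > last fetched id) can be sketched roughly as follows. This is only an illustration in Python against an in-memory SQLite table, not Koha's Perl code and not the patch from bug 28268. The table and column names (auth_header, authid, marcxml) and the helper names fetch_in_batches and build_demo_db are assumptions made for the example.

```python
import sqlite3
from typing import Iterator, List, Tuple


def fetch_in_batches(
    conn: sqlite3.Connection, batch_size: int = 1000
) -> Iterator[List[Tuple[int, str]]]:
    """Yield lists of (authid, marcxml) rows, at most batch_size rows each.

    Each query resumes after the highest id seen so far, so only one batch
    of records is held in memory at any time.
    """
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT authid, marcxml FROM auth_header "
            "WHERE authid > ? ORDER BY authid LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        last_id = rows[-1][0]  # remember where the next round starts
        yield rows


def build_demo_db(record_count: int) -> sqlite3.Connection:
    """Create a throwaway in-memory table with dummy records (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE auth_header (authid INTEGER PRIMARY KEY, marcxml TEXT)"
    )
    conn.executemany(
        "INSERT INTO auth_header (authid, marcxml) VALUES (?, ?)",
        ((i, f"<record>{i}</record>") for i in range(1, record_count + 1)),
    )
    return conn
```

A caller would loop over fetch_in_batches(conn, 1000) and index each batch before asking for the next one. Filtering on the id instead of using OFFSET means each query can start from the primary key position rather than skipping over rows already processed.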
Hi

Other options for a manually run Elasticsearch reindex that might be implemented would be:

- authid accepts a range of authids instead of just one authid, e.g. ai=1234-999
- a new variable for the number of authid records to reindex, with authid used as the start of the range

Regards,
Alvaro
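To make the two suggestions above concrete, here is a small Python sketch of how such an argument might be interpreted. It is purely hypothetical: the rebuild script is written in Perl, these options were only proposed in this thread, and the names IdWindow and parse_id_window are invented for the example.

```python
from typing import NamedTuple, Optional


class IdWindow(NamedTuple):
    start: int            # first id to reindex
    end: Optional[int]    # inclusive last id, or None if open-ended
    limit: Optional[int]  # maximum number of records, or None if unlimited


def parse_id_window(spec: str, count: Optional[int] = None) -> IdWindow:
    """Interpret 'N' or 'N-M', optionally combined with a record count.

    'N-M'        -> ids N..M inclusive (first suggestion)
    'N' + count  -> up to count records starting at id N (second suggestion)
    'N' alone    -> just the single id N
    """
    if "-" in spec:
        if count is not None:
            raise ValueError("give either a range or a count, not both")
        first, last = spec.split("-", 1)
        start, end = int(first), int(last)
        if end < start:
            raise ValueError("range end must not be below range start")
        return IdWindow(start, end, None)
    start = int(spec)
    if count is None:
        return IdWindow(start, start, None)
    if count < 1:
        raise ValueError("count must be at least 1")
    return IdWindow(start, None, count)
```

Either form reduces to a bounded query over the ids, so it could be combined with fetching records in fixed-size batches.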
Thank you for turning this around so quickly Ere, I backported the fix to our 20.05.x site and it's working well!

Aleisha
Awesome! :) I'd appreciate it if you'd be able to sign off the patch in Bugzilla.

--Ere
Participants (3): Aleisha Amohia, Alvaro Cornejo, Ere Maijala