[Koha] Please test Koha MediaWiki Canasta with Koha Portainer today

Thomas Dukleth kohalist at agogme.com
Tue Oct 25 10:35:45 NZDT 2022


Please check the Koha MediaWiki Canasta with Koha Portainer test instance
for bugs, at https://wiki.test.koha-community.org .

Please do not make wiki contributions that you want to keep in the
MediaWiki Canasta with Koha Portainer test instance, as they will not be
saved when we go live with a freshly migrated, up-to-date copy of the
wiki.  Continue to make lasting contributions to the production wiki at
https://wiki.koha-community.org .

In the absence of any show-stopping bugs, we may go live with a freshly
migrated, up-to-date copy of the Koha wiki database on Wednesday 26
October.  In that case, the Koha production wiki may be marked with a
warning message of the impending change and then set to read only
overnight Tuesday or Wednesday morning while a dump is prepared.  If the
production wiki is ultimately not migrated on Wednesday despite the
absence of any blocking bug, please do not be too disappointed.  Schedules
are tight with preparing a new Koha release, and the migration may happen
the following week, in which case the read only status will be temporarily
lifted and a warning of impending change will be applied.

Sorry for the short testing notice, but I suspect someone will find any
blocking bug quickly after all the previous testing which has been done.
Last week, when this notice should have gone out, I temporarily broke all
images and editing on a different test instance of the wiki while trying
to fix a Docker container specific permission bug, which does not occur
when the test instance is running in a standard environment without the
Docker container.  Restarting the Canasta Docker container quickly fixed
the broken test wiki instance, allowing the original small bug to be
fixed, but that is an example of why even fixing small bugs should be done
on a test instance first.

Please read below for an understanding of what to expect before reporting
issues of which we are already aware, such as the test database not being
a current copy of the wiki and the mail system for resetting login
passwords on the test copy of the wiki not working.

You may report bugs to the bug "wiki needs updating to a later version",
https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=23073 .

WIKI DATABASE MIGRATION, UPGRADE, AND CONTAINER MANAGEMENT.

For the test instance, the Koha MediaWiki database has been migrated from
Postgres to MySQL and upgraded to MediaWiki 1.35.07, the current long term
stable version, using a repeatable process managed with a set of scripts
which I developed in bash, Perl, and Python as appropriate for the task
and, in the case of Python, for the previous code.  Choosing Postgres as
the database for a test instance of MediaWiki left us with a mistake
complicating compatibility and future upgrades when that test instance
suddenly became the only Koha wiki after the previous Koha wiki went down
in the midst of a community schism with LibLime.  The migration process
has been developed and progressively tested since 2019 to ensure that the
database is migrated correctly.  It builds upon an originally incomplete
and sometimes mistaken Python script by Philipp Spitzer, which was
nonetheless a fantastic, proven starting point without which the task
might have been too much.

Mason James ran a web crawl and diff test to verify that the production
and another test migration of the wiki have the same content except for
evident changes where the production wiki was updated with new content.
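
The comparison step of such a crawl and diff test can be sketched as
follows.  This is only an illustration and not the tool Mason James used:
the crawl itself is omitted, and the function name and the sample page
texts are hypothetical.  It assumes each crawl has already produced a
mapping of page title to page text.

```python
import difflib


def diff_pages(production: dict[str, str], migrated: dict[str, str]) -> dict[str, list[str]]:
    """Return a unified diff for every page whose text differs.

    A page missing from one side is treated as having empty text.
    """
    report = {}
    for title in sorted(set(production) | set(migrated)):
        old = production.get(title, "").splitlines()
        new = migrated.get(title, "").splitlines()
        if old != new:
            report[title] = list(
                difflib.unified_diff(
                    old,
                    new,
                    fromfile=f"production/{title}",
                    tofile=f"migrated/{title}",
                    lineterm="",
                )
            )
    return report


# Hypothetical sample data standing in for two crawls.
production = {"History": "Koha began in 1999.", "FAQ": "Updated answer."}
migrated = {"History": "Koha began in 1999.", "FAQ": "Old answer."}

for title, lines in diff_pages(production, migrated).items():
    print(title)
    print("\n".join(lines))
```

Pages reported here would then be checked by hand to confirm that each
difference is an expected production update rather than a migration error.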

The database was imported to MediaWiki Canasta which Tomás Cohen Arazi
identified and customised to connect to the Koha Portainer Docker
container management to provide MediaWiki in a Docker container with a
large set of important extensions to help make managing the MediaWiki
software easier.  See https://github.com/CanastaWiki for more about
MediaWiki Canasta.

See further below for a little about other modifications which I made to
support dynamic archiving, etc.

KNOWN ISSUES.

The test instance is using what may be about a two month old copy of the
database for testing purposes; therefore, lack of currency for the test
wiki content is not a bug.

The mail system for resetting wiki login passwords etc. may not be working
for the test wiki server; however, we expect the mail system to be fixed
on or about the time of going live.

There have been bugs specific to MediaWiki Canasta rearranging some
standard files for the Docker container which have been addressed. 
However, there are at least some Canasta Docker specific bugs relating to
the Docker container environment.  Please report any instances of "Error
creating thumbnail: Unable to save thumbnail to destination" which I found
in the Koha History page https://wiki.test.koha-community.org/wiki/History
.  Instances of the bug can be fixed with command line shell access by
removing the images/thumbs/$buggy_image_name subdirectory for the image,
which allows MediaWiki to recreate the subdirectory without a problem, and
the bug goes away.  We should probably remove all the subdirectories
in the images/thumbs directory proactively.  Yet why is there a special
problem for the Docker container which does not exist in other test
instances running without a Docker container?  The container environment
runs as the root user, as is standard for Docker containers, and the root
user should have all the permissions necessary to access or create a
thumbnail directory.  Changing the ownership of the images directory and
subdirectories back and forth to test the effect temporarily broke the
test instance of the wiki until the container was restarted.
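
The thumbnail fix described above could be scripted roughly as follows.
This is a minimal sketch under assumptions: the function name is
hypothetical, the images/thumbs path is taken from the description above
and not verified against the container layout, and the demonstration runs
against a temporary directory rather than a real wiki.

```python
import shutil
import tempfile
from pathlib import Path


def remove_thumbnail_dirs(thumbs_root: Path, image_name: str) -> list[Path]:
    """Delete every directory named image_name below thumbs_root.

    Returns the directories that were removed so the caller can log them.
    """
    # Collect matches first so the tree is not modified while it is walked.
    matches = [
        path
        for path in thumbs_root.rglob("*")
        if path.is_dir() and path.name == image_name
    ]
    removed = []
    for candidate in matches:
        if candidate.exists():  # may be gone if a matching parent was removed
            shutil.rmtree(candidate)
            removed.append(candidate)
    return removed


# Demonstration on a throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    thumbs = Path(tmp) / "images" / "thumbs"
    (thumbs / "Koha_logo.png").mkdir(parents=True)
    (thumbs / "Koha_logo.png" / "120px-Koha_logo.png").write_text("stale")
    (thumbs / "Other.png").mkdir()
    removed = remove_thumbnail_dirs(thumbs, "Koha_logo.png")
    print([path.name for path in removed])
```

As with any fix, this should be tried on a test instance first, since
MediaWiki is expected to regenerate the removed thumbnails on demand.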

All software has bugs and non-blocking bugs will have to be addressed
after going live.

MAJOR ENHANCEMENTS FROM UPGRADING.

The VisualEditor extension used by Wikipedia is a WYSIWYG and guided forms
aid for visually editing the underlying wikitext for a page and using
guided forms for adding some features to a page.  Users can switch back
and forth between source editing in all wikitext syntax and VisualEditor;
however, it may be best to save the current edit before switching to
avoid problems of imperfect correspondence between wikitext syntax and the
VisualEditor model of wikitext.

The AdvancedSearch extension used by Wikipedia provides a user friendly
interface for constructing search queries and modifying them by removing
terms, each of which appears in a bubble with an [x] to remove the term.
AdvancedSearch depends on ElasticSearch which performs remarkably well in
testing and allows the wiki to be reindexed in a couple of minutes if
necessary.  See below for modifications to the AdvancedSearch extension.

SemanticMediaWiki was reinstalled before copying the upgraded database.
The modified AdvancedSearch extension in conjunction with special
navigation links, and custom queries using carefully managed standard wiki
categories, may be more helpful than SemanticMediaWiki.  Furthermore,
anyone experimenting with SemanticMediaWiki should be aware that verbose
syntax is required to avoid breaking most wikis with SemanticMediaWiki
after forthcoming MediaWiki updates, in which a deprecated hook commonly
relied upon for SemanticMediaWiki will be removed.  Wikipedia does not use
SemanticMediaWiki and thus some MediaWiki developers may not have given
sufficient consideration to managing the issue.  The workaround may
involve a potential performance deficit when using SemanticMediaWiki
search queries.

The MassEditRegex extension has the power its name suggests for using
regular expressions to modify a list of pages.  However, given its power,
it remains commented out in LocalSettings.php.  Use is intended to be for
some special group of users such as wiki administrators; however, even
they should first test their process on a test version of the wiki.
Furthermore, use should be with a bot account set up by the user so that
the mass changes may be identified as the work of a bot process and may
avoid adversely affecting page modification priorities in search result
sets.  The creation of user bot accounts should be documented.  In
testing, MassEditRegex works fantastically well for adding categories to
the bottom of pages and templates to the top of pages, which can be done
without risk of an inadequately debugged regular expression breaking page
content.
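
To illustrate why adding to the top or bottom of a page is low risk, here
is a small sketch in plain Python.  It is not MassEditRegex syntax or
configuration; the function names and the sample page text are
hypothetical.  The existing page content is never rewritten, only
prepended or appended to, and each step is skipped if the page already
has the category or template.

```python
import re


def add_category(wikitext: str, category: str) -> str:
    """Append [[Category:...]] at the bottom unless the page already has it."""
    pattern = re.compile(
        r"\[\[\s*Category\s*:\s*" + re.escape(category) + r"\s*(\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    if pattern.search(wikitext):
        return wikitext
    return wikitext.rstrip("\n") + f"\n[[Category:{category}]]\n"


def add_template(wikitext: str, template: str) -> str:
    """Prepend {{template}} at the top unless the page already uses it."""
    pattern = re.compile(
        r"\{\{\s*" + re.escape(template) + r"\s*(\||\}\})",
        re.IGNORECASE,
    )
    if pattern.search(wikitext):
        return wikitext
    return f"{{{{{template}}}}}\n" + wikitext


# Hypothetical page text.
page = "Install Koha on Debian 6.\n"
updated = add_template(add_category(page, "Obsolete"), "Obsolete")
print(updated)
```

Running the same two steps on the updated text leaves it unchanged, which
is the property one would want to confirm on a test wiki before any mass
edit.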

MODIFIED FEATURES.

I modified the following to support dynamic archiving in which obsolete
content does not appear by default for search results unless the user goes
directly to the advanced search page without following provided navigation
links or changes the default VectorMod skin affecting the basic search
box.

ADVANCEDSEARCH EXTENSION WITH MODIFICATIONS.

The AdvancedSearch extension has been modified to include two additional
form elements: one for excluding particular categories and another for
excluding particular templates.  These additional elements appear in the
user friendly AdvancedSearch term bubbles which can be individually
removed from a query by clicking on the [x] for the particular bubble.

Editing the non-English localisation files is still pending.  For
languages for which a non-English localisation file has not been edited,
the custom fields for category and template exclusion display a
description in English.

DeepCategory searching of the subcategories of a category is disabled
because it requires a SPARQL database and is only updated on a weekly
basis for Wikipedia.  Searching subcategories of a category should be less
of an issue with faceted use of categories which we should be carefully
moving towards.

Excluding particular categories supports dynamic archiving by supporting
search queries excluding obsolete pages with -incategory:"Obsolete", which
is automatically invoked from the navigation link "Advanced Search
current" or from the simple search box when using the modified Vector
skin, VectorMod.  Obsolete pages are also noted with a prominent notice
using the Obsolete template.  Such pages should be updated if they can be,
but are otherwise available to consult, most importantly for valuable
information they often contain which is not yet present in current pages.
Archived obsolete pages can be found exclusively by following the
navigation link "Advanced search obsolete archive" which includes
incategory:"Obsolete" automatically.
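
The query strings behind those two navigation links can be sketched as
follows.  This is an illustration only: the helper names are hypothetical,
the /w/index.php path on the test server is an assumption based on the
production wiki URL, and the real navigation links may carry additional
AdvancedSearch parameters not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base URL; not confirmed for the test instance.
BASE = "https://wiki.test.koha-community.org/w/index.php"


def build_query(terms: str, include=(), exclude=()) -> str:
    """Combine free text with incategory: and -incategory: keywords."""
    parts = [terms.strip()] if terms.strip() else []
    parts += [f'incategory:"{name}"' for name in include]
    parts += [f'-incategory:"{name}"' for name in exclude]
    return " ".join(parts)


def search_url(base: str, query: str) -> str:
    """Build a Special:Search URL carrying the query string."""
    return base + "?" + urlencode({"title": "Special:Search", "search": query})


current = build_query("hold rules", exclude=["Obsolete"])
archive = build_query("hold rules", include=["Obsolete"])
print(current)
print(archive)
print(search_url(BASE, current))

# Round trip: decoding the URL gives back the same query string.
decoded = parse_qs(urlsplit(search_url(BASE, current)).query)["search"]
```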

The result set for search queries with incategory:"Obsolete" can be used
to identify the type of pages which should have the Obsolete category and
Obsolete template but do not yet, such as installation information for
some particular old Debian versions.  Various combinations of including
and excluding categories and templates can be easily used in the modified
AdvancedSearch to find pages which only have one of either the Obsolete
category or Obsolete template which should be used together or both
removed if the page has been updated to be current.
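
As a sketch of that consistency check, the two mismatch cases can be
expressed as a pair of queries and a comparison of their result sets.
The function names and page titles are hypothetical.  The hastemplate:
keyword is the standard CirrusSearch keyword for template filtering; that
the modified AdvancedSearch form generates exactly these keywords is an
assumption.

```python
def mismatch_queries(name: str = "Obsolete") -> dict[str, str]:
    """Queries for pages carrying only one of the category and the template."""
    return {
        "missing_template": f'incategory:"{name}" -hastemplate:"{name}"',
        "missing_category": f'hastemplate:"{name}" -incategory:"{name}"',
    }


def find_mismatches(in_category: set[str], has_template: set[str]) -> dict[str, list[str]]:
    """Compare two result sets of page titles gathered separately."""
    return {
        "missing_template": sorted(in_category - has_template),
        "missing_category": sorted(has_template - in_category),
    }


# Hypothetical result sets.
in_category = {"Koha on Debian Lenny", "Koha 3.0 install notes"}
has_template = {"Koha on Debian Lenny", "Old LiveCD instructions"}

print(mismatch_queries())
print(find_mismatches(in_category, has_template))
```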

All wiki pages should have some category even if it may be
[[Category:Empty]] for people uncertain of what may be appropriate in the
moment.  Pages missing categories may not be disappearing from query
results by category when using ElasticSearch indexing as they had been
when using database based search indexing.  We can also query for pages
missing categories using
https://wiki.koha-community.org/w/index.php?title=Special:UncategorizedPages
and correct the issue, which has been neglected because migrating and
upgrading the wiki has been the priority, with much less time available
otherwise, especially since the pandemic.

We should take some care when thinking about faceted category use as no
wiki software uses fielded categories.  Thus there may be no concise way
to query for pages which address a topic in a general way or supplement
other documentation on a topic containing a lone category such as
[[Category:Circulation]], if we then have many other pages with
[[Category:RFCs]] and [[Category:Circulation]] but no longer
[[Category:Circulation RFCs]] as a possible change for faceting.  In such
an example, the search results of a query for incategory:"Circulation"
might have a result set in which pages for RFCs relating to circulation
issues containing both [[Category:RFCs]] and [[Category:Circulation]]
might crowd out more generally helpful pages with [[Category:Circulation]]
alone.  The problem may indicate a need for a navigation link to exclude
RFCs from a search query; designating old RFCs as obsolete; or both. 
Alternatively or additionally, we may be able to adjust the weighting of
the ElasticSearch indexing options such that pages containing
[[Category:RFCs]] have a lower weight and appear further down the result
set or pages with a single category such as [[Category:Circulation]] alone
or some particular additional categories such as
[[Category:Documentation]] have higher weight and appear further up the
result set.
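
The intended effect of such weighting can be illustrated with a toy
re-ranking in plain Python.  This is not ElasticSearch or CirrusSearch
configuration; the function name, scores, weights, and page titles are all
hypothetical and only show how the ordering of a result set would change.

```python
def rerank(results, category_weights, lone_category_bonus=1.5):
    """Reorder (title, score, categories) tuples by a weighted score.

    Each category multiplies the score by its weight (default 1.0), and a
    page with exactly one category receives an extra bonus.
    """

    def adjusted(result):
        _title, score, categories = result
        weight = 1.0
        for category in categories:
            weight *= category_weights.get(category, 1.0)
        if len(categories) == 1:
            weight *= lone_category_bonus
        return score * weight

    return sorted(results, key=adjusted, reverse=True)


# Hypothetical raw scores for a query with incategory:"Circulation".
results = [
    ("RFC: hold rules", 10.0, {"RFCs", "Circulation"}),
    ("Circulation overview", 8.0, {"Circulation"}),
    ("Circulation manual notes", 7.0, {"Circulation", "Documentation"}),
]
weights = {"RFCs": 0.5, "Documentation": 1.4}

for title, _score, _categories in rerank(results, weights):
    print(title)
```

With these made-up numbers the general overview page moves to the top and
the RFC moves to the bottom, which is the outcome the weighting is meant
to achieve.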

VECTORMOD SKIN.

Users are free to choose their own preferred MediaWiki skin and we can add
others.  VectorMod is merely set as the default to help people avoid
obsolete pages when submitting search queries from the simple search box
which appears on every page.

VectorMod is a custom version of the Vector skin which includes a modified
version of Vector/includes/templates/SearchBox.mustache supporting dynamic
archiving of obsolete content by excluding pages which have been
designated obsolete by automatically adding -incategory:"Obsolete" to
basic search queries.  The syntax incategory requires using
ElasticSearch.  Previously, I replaced the SearchBox.mustache file in the
Vector skin directly, which certainly worked without the extra effort of
creating a custom skin.

Automatically inserting -incategory:"Obsolete" in the basic search box is
now somewhat elegant in conjunction with the modified AdvancedSearch
extension as it uses explanatory language labels with a bubble which has a
removal [x] and allows autocompletion of query terms.

Significant renaming of references to Vector as VectorMod and vector as
vectormod has been scripted, which allows both Vector and VectorMod to be
loaded and available to users.
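
The core of such a renaming can be sketched as follows.  This is not the
actual script, which is not shown here; the function name and the sample
line are hypothetical.  The lookahead keeps the substitution from being
applied twice, so running it again over already renamed files is
harmless.  All-uppercase occurrences are deliberately left alone in this
sketch.

```python
import re


def rename_skin(source: str) -> str:
    """Rename Vector to VectorMod and vector to vectormod, idempotently."""
    # Skip occurrences that already carry the Mod / mod suffix.
    source = re.sub(r"Vector(?!Mod)", "VectorMod", source)
    return re.sub(r"vector(?!mod)", "vectormod", source)


# Hypothetical sample line.
sample = 'wfLoadSkin( "Vector" ); $wgDefaultSkin = "vector";'
renamed = rename_skin(sample)
print(renamed)
```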


Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY  10003
USA
http://www.agogme.com
+1 212-674-3783



