[Koha] OPAC searches fail with ICU indexing enabled

Tue May 17 13:51:08 NZST 2016

Hi Andreas,

This problem looks a little familiar. I have a few questions.

You find 335 records using yaz-client. Are you able to view those records using "show" in yaz-client?

Also, where are you seeing the following error?

> Error:
> :8: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0xCF 0x3C 0x2F 0x74 Î¿Ï‚ Î¤Î¹Î¼ÏŒÎ¸ÎµÎ¿Î½ Î‘Î„-Ï€ÏÎ¿Ï‚
> Î¤Î¹Î¼ÏŒÎ¸ÎµÎ¿Î½ Î’Î„-Ï€ÏÎ¿Ï‚ Î¤Î¯Ï„Î¿Î½-Ï€ ^

Is that in a file in your /var/log/koha/imp directory? 

Also, those instructions at https://wiki.koha-community.org/wiki/Correcting_Search_of_Arabic_records look a bit suboptimal...

Are you using packages? Did you run the following?

sudo koha-restart-zebra {yourinstance}
sudo koha-rebuild-zebra -f {yourinstance}

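That parser error doesn't look super helpful... In Windows-1252 (and Latin-1), 0xCF is Ï, 0x3C is <, 0x2F is / and 0x74 is t. In UTF-8, lowercase χ is 0xCF 0x87 and ό is 0xCF 0x8C (the uppercase forms Χ and Ό are 0xCE 0xA7 and 0xCE 0x8C). So the 0xCF might be the first byte of a lowercase Greek letter that was cut off just before "</t", but I can't say that for certain from the error alone. If I had to guess, I'd say that Zebra thinks it's using ICU and UTF-8 but the data is still stored as Latin-1.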
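
If you want to check byte values like these yourself, here is a minimal Python 3 sketch. It is only an illustration, not anything Koha or Zebra runs: `hex_bytes` is a made-up helper name, and I'm assuming the search characters are U+03C7 (χ) and U+03CC (ό). It prints the UTF-8 bytes for the lowercase and uppercase forms, then decodes the four bytes from the error under a few single-byte code pages and under strict UTF-8.

```python
# Bytes quoted in the parser error: 0xCF 0x3C 0x2F 0x74
error_bytes = bytes([0xCF, 0x3C, 0x2F, 0x74])


def hex_bytes(text, encoding="utf-8"):
    """Return the encoded bytes of `text` as a string like '0xCF 0x87'."""
    return " ".join(f"0x{b:02X}" for b in text.encode(encoding))


# UTF-8 bytes for the lowercase and uppercase forms of the search characters
# (assumed code points: U+03C7, U+03CC, U+03A7, U+038C).
for ch in ("\u03c7", "\u03cc", "\u03a7", "\u038c"):
    print(f"U+{ord(ch):04X} {ch}: {hex_bytes(ch)}")

# The same four error bytes read as different single-byte code pages.
for codec in ("cp1252", "latin-1", "cp1251"):
    print(codec, "->", error_bytes.decode(codec))

# Strict UTF-8 decoding rejects them: 0xCF announces a two-byte sequence,
# but the next byte (0x3C, '<') is not a continuation byte.
try:
    error_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The last step is the same kind of complaint the XML parser is making: a lead byte with nothing valid after it.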

Failing that... I have some other more in-depth troubleshooting ideas. 

David Cook
Systems Librarian

Prosentient Systems
72/330 Wattle St
Ultimo, NSW 2007

Office: 02 9212 0899
Direct: 02 8005 0595

> -----Original Message-----
> Message: 7
> Date: Wed, 11 May 2016 18:12:51 +0300
> From: Andreas Roussos <arouss1980 at gmail.com>
> To: koha at lists.katipo.co.nz
> Subject: [Koha] OPAC searches fail with ICU indexing enabled
> Message-ID: <CAK0RUrtVcZZ0jOqgmvPxrcXWw4g_qqQ3_MD5OqHYHkz_sfdcGQ@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> Dear list,
> 
> We're running Koha 3.20.04 on Ubuntu 14.04, and recently enabled
> ICU indexing as per the instructions on the wiki
> (https://wiki.koha-community.org/wiki/Correcting_Search_of_Arabic_records)
> 
> Most searches work fine, but queries for certain Greek characters in OPAC
> (for example "[χ.ό.]"), return the following message:
> 
> Error:
> :8: parser error : Input is not proper UTF-8, indicate encoding !
> Bytes: 0xCF 0x3C 0x2F 0x74 Î¿Ï‚ Î¤Î¹Î¼ÏŒÎ¸ÎµÎ¿Î½ Î‘Î„-Ï€ÏÎ¿Ï‚
> Î¤Î¹Î¼ÏŒÎ¸ÎµÎ¿Î½ Î’Î„-Ï€ÏÎ¿Ï‚ Î¤Î¯Ï„Î¿Î½-Ï€ ^
> 
> If I use the command-line zebra client to perform the same search,
> I get 335 hits:
> 
> $ yaz-client -c /etc/koha/zebradb/ccl.properties unix:/var/run/koha/imp/bibliosocket
> Connecting...OK.
> Sent initrequest.
> Connection accepted by v3 target.
> ID     : 81
> Name   : Zebra Information Server/GFS/YAZ
> Version: 4.2.30 98864b44c654645bc16b2c54f822dc2e45a93031
> Options: search present delSet triggerResourceCtrl scan sort
> extendedServices namedResultSets
> Elapsed: 0.000743
> Z> base biblios
> Z> f [χ.ό.]
> Sent searchRequest.
> Received SearchResponse.
> Search was a success.
> Number of hits: 335, setno 1
> SearchResult-1: term=χο cnt=335
> records returned: 0
> Elapsed: 0.014453
> 
> So, it looks as if Zebra can actually perform the search but somehow
> the results cannot be displayed in OPAC.
> 
> Does anyone have any clues as to why this is happening?
> 
> Kind regards,
> Andreas