(Koha 3.0 alpha, Gentoo Linux 2.6.24, MySQL 5.0.54)

In tracking down some problems I was having, I've realized that Koha doesn't seem to do any HTML encoding with regard to MARC entry or biblio display. For example, in the "Add a MARC Record" section, I can enter a title (tag 245c) of the following:

My Book is <font size="+5">Great</font>

Sure enough, when the completed MARC record is submitted, the additem.pl page will show the title with the word "Great" really big. Once added to the catalog, it will show up in the search engines with that word really big as well.

Surely everything entered by users and librarians in the OPAC and Intranet sites should be HTML-encoded if it's going to be redisplayed, right? Did I miss some setting in the Administration menus that would disallow HTML from being entered in a form, or is this a fairly big bug?

Thanks.
George Adams wrote:
> For example, in the "Add a MARC Record" section, I can enter in a title (tag 245c) of the following:
>
> My Book is <font size="+5">Great</font>
>
> Sure enough, when the completed MARC record is submitted, the additem.pl page will show the title with the word "Great" really big. Once added to the catalog, it will show up in the search engines with that word really big as well.
>
> Surely everything entered by users and librarians in the OPAC and Intranet sites should be HTML-encoded if it's going to be redisplayed, right? Did I miss some setting in the Administration menus that would disallow HTML from being entered in a form, or is this a fairly big bug?

This is why Koha is susceptible to cross-site scripting attacks, as already raised by someone else on this list a few months back.

Example:

My book is <script>alert("Gotcha!")</script>

cheers
rickw

--
Rick Welykochy || Praxis Services || Internet Driving Instructor
A terrorist is someone who has a bomb but can't afford an air force. -- William Blum
George, Rick and all --

In short, no, MARC record subfields should not be HTML encoded. MARC is not a subset of HTML, and you can't just substitute &entities or suppress <tags> and expect everything to be OK.

If you are worried about a library's professional catalogers dropping JavaScript exploits into MARC fields, you have much worse problems than any ILS can solve for you. Don't give staff access, let alone catalog access, to such people. On the plus side, congratulations, you have catalogers that can code!

For user-submitted data, yes, Koha should attend to sanitizing it. But that's not the question here.

--joe atzberger
Joe Atzberger wrote:
> For user submitted data, yes, Koha should attend to sanitizing it. But that's not the question here.

Yes it should. An example is the "make a suggestion" page, at /cgi-bin/koha/opac-suggestions.pl in Koha 2.2.9. A rogue user can enter HTML into a suggestion, and that input is not filtered. A librarian reading the suggestion could then become a victim of XSS.

Google for "cross-site scripting" for more info. It is a relatively misunderstood problem that is difficult to deal with in a consistent and reliable manner.

cheers
rickw
Joe, it's not just malicious activity I'm worried about (though that is a fundamental security concern). Unencoded HTML can break a page with frightening ease. Take this simple field:

<input type="text" name="booktitle" value="$title">

Now if $title has the value

How to Say "I Love You" in 50 Languages

your HTML code will be rendered like this:

<input type="text" name="booktitle" value="How to Say "I Love You" in 50 Languages">

and is now hopelessly broken. The CGI param $booktitle will contain "How to Say ", and the rest of the book title (in addition to breaking the HTML tag) will be lost.

I can hardly expect all the library staff to remember not to use double quotes in any Koha text form (or any other unsafe characters like <, > or &). Indeed, should they really be forced to give up such common characters just to work around the problem?

I think I'll try mocking up something with HTML::Entities, at least in the most critical parts of the "Add Marc Item" form. Meanwhile, if no one objects, I'll put in a bug report for it too.

Date: Wed, 5 Mar 2008 21:46:33 -0500
From: ohiocore@gmail.com
To: rick@praxis.com.au
Subject: Re: [Koha] HTML not being encoded for display?
CC: g_adams27@hotmail.com; koha@lists.katipo.co.nz

> In short, no, MARC record subfields should not be HTML encoded. [...]
> For user submitted data, yes, Koha should attend to sanitizing it. But that's not the question here.
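The broken input field George describes can be reproduced outside Koha. The following is a minimal sketch in Python (Koha itself is written in Perl, where HTML::Entities plays a similar role); the helper name render_input is hypothetical and exists only for this illustration. It shows that escaping the value at the moment it is placed inside a double-quoted attribute keeps the tag intact.

```python
import html


def render_input(name: str, value: str) -> str:
    """Build a text <input>, escaping name and value for a double-quoted attribute.

    Hypothetical helper for illustration; not part of Koha.
    """
    # quote=True also converts " and ', so the value cannot end the attribute early.
    return '<input type="text" name="{}" value="{}">'.format(
        html.escape(name, quote=True),
        html.escape(value, quote=True),
    )


title = 'How to Say "I Love You" in 50 Languages'

# Naive interpolation: the first quote inside the title closes the attribute.
broken = '<input type="text" name="booktitle" value="' + title + '">'

# Escaped interpolation: the browser decodes &quot; back to " when reading the value.
fixed = render_input("booktitle", title)

print(broken)
print(fixed)
# second line printed:
# <input type="text" name="booktitle" value="How to Say &quot;I Love You&quot; in 50 Languages">
```

The stored title is never modified here; only the copy written into the page is escaped, so the form submits the original text back unchanged.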
On 3/6/08, George Adams <g_adams27@hotmail.com> wrote:
> Joe, it's not just malicious activity I'm worried about (though that is a fundamental security concern). Unencoded HTML can break a page with frightening ease. Take this simple field:
>
> <input type="text" name="booktitle" value="$title">
>
> Now if $title has the value: How to Say "I Love You" in 50 Languages, your HTML code will be rendered like this:
>
> <input type="text" name="booktitle" value="How to Say "I Love You" in 50 Languages">
>
> and is now hopelessly broken. The CGI param $booktitle will contain "How to Say ", and the rest of the book title (in addition to breaking the HTML tag) will be lost.

Yep, if you find any instances of this happening, bug report it (this of course isn't what I would call unencoded HTML).

> I can hardly expect all the library staff to remember not to use double-quotes in any Koha text form (or any other unsafe characters like <, > or &). Indeed, should they really be forced to give up such common characters just to work around the problem?

No, and in fact they don't:
http://203.97.214.51:8080/cgi-bin/koha/opac-detail.pl?biblionumber=2

> I think I'll try mocking up something with HTML::Entities, at least in the most critical parts of the "Add Marc Item" form. Meanwhile, if no one objects, I'll put in a bug report for it too.

If you put in a bug report for specific areas where unescaped HTML is causing a problem, then we can simply edit the templates to add ESCAPE="HTML" to the TMPL_VAR that needs it.

Please don't convert things to entities to store in the database. This data is used by more than just web browsers.

So by all means bug report away, but if you give URLs of pages where unescaped characters are causing problems, it will be much more useful.

Thanks
Chris
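Chris's point about keeping raw data in the database and escaping only in the template can be illustrated with a short sketch. It is written in Python rather than Koha's Perl, and the function names to_html and to_plain_text, as well as the sample title, are assumptions made up for this example. The idea is that each output format applies its own encoding to the same raw value, which is roughly what ESCAPE="HTML" does for a single template variable.

```python
import html

# The value as a cataloger typed it, stored without any HTML encoding.
raw_title = 'Fish & Chips: A "Brief" History <2nd ed.>'


def to_html(title: str) -> str:
    """Render for a web page: escape at output time (hypothetical helper)."""
    return "<li>" + html.escape(title) + "</li>"


def to_plain_text(title: str) -> str:
    """Render for a non-browser consumer such as a text export (hypothetical helper)."""
    return "Title: " + title


print(to_html(raw_title))
# <li>Fish &amp; Chips: A &quot;Brief&quot; History &lt;2nd ed.&gt;</li>
print(to_plain_text(raw_title))
# Title: Fish & Chips: A "Brief" History <2nd ed.>

# If entities had been stored instead, every non-HTML consumer would see them,
# and an escaping template would encode them a second time.
stored_as_entities = html.escape(raw_title)
print(to_plain_text(stored_as_entities))
# Title: Fish &amp; Chips: A &quot;Brief&quot; History &lt;2nd ed.&gt;
print("&amp;amp;" in to_html(stored_as_entities))
# True
```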
Thanks for the tip about adding ESCAPE="HTML" to the template tags - that's a nice feature. I've been able to change additem.tmpl, opac-results.tmpl, opac-detail.tmpl and opac-MARCdetail.tmpl to make our entries display correctly.

(That only scratches the surface, of course; I'm guessing that the Right Thing would be to change pretty much every single template that displays any user-generated content so that it's escaped. But I'm also guessing that's a big undertaking.)
Are you working with Koha 3, George?

Chris
Yes, I am - 3.0alpha. And I may have been too optimistic with the changes I made. For example, here's my experiment with opac-results.tmpl.

1) I had a record entered with the somewhat silly title (245a) of "One Two <Three> Four", with a general note (500a) of "Here is a note". When I did a search for "note" in the OPAC, this record came up in the search results with the title displayed as "One Two Four" (i.e. <Three> was unescaped and treated as a tag, and not displayed as literal text).

2) Next, I went into opac-results.tmpl, found the appropriate <!-- TMPL_VAR NAME="title" --> line and changed it to <!-- TMPL_VAR NAME="title" ESCAPE="HTML" -->. When I reloaded the search results page, the title was correctly displayed as "One Two <Three> Four" as I had hoped. So far, so good.

3) But next, I did a search for "Two" - i.e. a word that appears in the title of the book. This time, the search results displayed the following:

One <span class="term">Two</span> <Three> Four

With that, I discovered that HTML is sometimes injected into template variables like "title" (in order to highlight the search term or whatever), and that simply escaping that variable wasn't going to work. Ideally, the value would be retrieved from the DB, HTML-escaped, then wrapped in whatever other HTML tags need to be included, and rendered. Practically... I have no idea how to do that. I guess we're just going to have to do our best to work around it.
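The "escape first, then wrap" order George describes for highlighted search terms might look like the sketch below. It is Python, not Koha's Perl, and it is not how Koha 3 builds its results; the function name highlight is hypothetical, and only the span class "term" is taken from the output George observed. The raw title is split around the matches, each piece is escaped on its own, and only then is the trusted markup added.

```python
import html
import re


def highlight(title: str, term: str) -> str:
    """Return HTML for a raw title with matches of term wrapped in a span.

    Hypothetical helper: every piece of the raw title is escaped individually,
    so the only markup in the result is the span added here.
    """
    if not term:
        return html.escape(title)
    # With one capture group, re.split keeps the matched text at the odd indexes.
    pieces = re.split("(" + re.escape(term) + ")", title, flags=re.IGNORECASE)
    out = []
    for index, piece in enumerate(pieces):
        escaped = html.escape(piece)
        if index % 2:
            out.append('<span class="term">' + escaped + "</span>")
        else:
            out.append(escaped)
    return "".join(out)


print(highlight("One Two <Three> Four", "Two"))
# One <span class="term">Two</span> &lt;Three&gt; Four
print(highlight("One Two <Three> Four", "note"))
# One Two &lt;Three&gt; Four
```

A template variable filled this way would have to be emitted without ESCAPE="HTML", since the escaping has already happened before the span was added.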
This could be a serious problem. Is this addressed in Koha 3? Are there any checks for dangerous user input in Koha 2 or 3?

-cht

Chris Hammond-Thrasher MLIS CISSP
Library Systems Manager
University of the South Pacific
Suva, Fiji
+679 3232233
hammondthrasher_c@usp.ac.fj
Shifting this over to the devel lists, where it can be discussed more fully.

Chris
_______________________________________________ Koha mailing list Koha@lists.katipo.co.nz http://lists.katipo.co.nz/mailman/listinfo/koha
participants (5)
- Chris Cormack
- Chris Hammond-Thrasher
- George Adams
- Joe Atzberger
- Rick Welykochy