<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 10pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'>
Joe, it's not just malicious activity I'm worried about (though that is a fundamental security concern). Unencoded HTML can break a page with frightening ease. Take this simple field:<br><br>&lt;input type="text" name="booktitle" value="$title"&gt;<br><br>Now if $title has the value: How to Say "I Love You" in 50 Languages, your HTML code will be rendered like this:<br><br>&lt;input type="text" name="booktitle" value="How to Say "I Love You" in 50 Languages"&gt;<br><br>and is now hopelessly broken. The attribute value ends at the second double-quote, so the CGI param $booktitle will contain "How to Say ", and the rest of the book title (in addition to breaking the HTML tag) will be lost.<br><br>I can hardly expect all the library staff to remember not to use double-quotes in any Koha text form (or any other unsafe characters like &lt; , &gt; or &amp; ). Indeed, should they really be forced to give up such common characters just to work around the problem?<br><br>I think I'll try mocking up something with HTML::Entities, at least in the most critical parts of the "Add Marc Item" form. Meanwhile, if no one objects, I'll put in a bug report for it too.<br><br><br><br><blockquote><hr>Date: Wed, 5 Mar 2008 21:46:33 -0500<br>From: ohiocore@gmail.com<br>To: rick@praxis.com.au<br>Subject: Re: [Koha] HTML not being encoded for display?<br>CC: g_adams27@hotmail.com; koha@lists.katipo.co.nz<br><br>George, Rick and all --<br><br>In short, no, MARC record subfields should not be HTML encoded. MARC is not a subset of HTML, and you can't just substitute &amp;entities or suppress &lt;tags&gt; and expect everything to be OK. If you are worried about a library's professional catalogers dropping javascript exploits into MARC fields, you have much worse problems than any ILS can solve for you. Don't give staff access, let alone catalog access, to such people. On the plus side, congratulations, you have catalogers that can code!<br>
<br>For user-submitted data, yes, Koha should attend to sanitizing it. But that's not the question here.<br><br>--joe atzberger<br><br><br><div class="EC_gmail_quote">On Wed, Mar 5, 2008 at 7:39 PM, Rick Welykochy &lt;<a href="mailto:rick@praxis.com.au">rick@praxis.com.au</a>&gt; wrote:<br>
<blockquote class="EC_gmail_quote" style="padding-left: 1ex;"><div class="EC_Ih2E3d">George Adams wrote:<br>
<br>
> For example, in the "Add a MARC Record" section, I can enter in a title (tag 245c) of the following:<br>
><br>
> My Book is &lt;font size="+5"&gt;Great&lt;/font&gt;<br>
><br>
> Sure enough, when the completed MARC record is submitted, the additem.pl page will show the title with the word "Great" really big. Once added to the catalog, it will show up in the search engines with that word really big as well.<br>
><br>
> Surely everything entered by users and librarians in the OPAC and Intranet sites should be HTML-encoded if it's going to be redisplayed, right? Did I miss some setting in the Administration menus that would disallow HTML from being entered in a form, or is this a fairly big bug?<br>
<br>
<br>
</div>This is why Koha is susceptible to cross-site scripting attacks, as already<br>
raised by someone else on this list a few months back.<br>
<br>
Example:<br>
<br>
My book is &lt;script&gt;alert("Gotcha!")&lt;/script&gt;<br>
<br>
cheers<br>
rickw<font color="#888888"><br></font></blockquote></div><br>
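<br>For reference, the double-quote breakage described at the top of this message, and the entity-encoding fix, can be sketched roughly as follows. This is only an illustration in Python, not Koha code: Koha is written in Perl, where HTML::Entities would play the role that Python's html.escape plays here, and the helper name value_attribute is hypothetical.<br>
<pre>
```python
from html import escape


def value_attribute(title, encode=True):
    """Hypothetical helper (not part of Koha): build the value="..."
    attribute of a text field from an arbitrary title string."""
    # With quote=True, escape() replaces the ampersand, both angle
    # brackets and both kinds of quote character with HTML entities.
    text = escape(title, quote=True) if encode else title
    return 'value="' + text + '"'


title = 'How to Say "I Love You" in 50 Languages'

# Unencoded: the quotes inside the title collide with the quotes that
# delimit the attribute, so the value is cut short.
print(value_attribute(title, encode=False))

# Encoded: the inner quotes become entities and the value stays intact.
print(value_attribute(title))
```
</pre>
The first line printed contains four double-quotes, so a browser ends the value after the second one; the second line contains only the two that delimit the attribute.<br>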
</blockquote><br /></body>
</html>