- Deep Shift Labs Development Blog - http://www.deepshiftlabs.com/dev_blog -

Mantis and Answers integration

Posted By On 4 August 2019 @ 11:44 | Comments Disabled

We have been working on Starty.co Answers [1] for quite a while, and we cannot stop adding new features and honing existing ones.
In the end we decided to release the first version of Answers without a convenience like text markup, such as BBCode [2]. However, we did some preparatory work in this area and decided that in the near future Answers will include the TinyMCE [3] WYSIWYG editor and the Highlight.js [4] module for highlighting code snippets.

Even though we decided not to include full markup in version one, the basics still had to be implemented: replacing line breaks with <br>, and finding http(s):// links in the text and wrapping them in <a href=""></a>.

After adding the newline-to-<br> translation, I started looking at how smart people implement link parsing and replacement with <a href=""></a>. In the end I settled on this approach [5]:

PHP, using preg_replace() [7]:
// Wrap every http(s):// URL found in $string in an <a> tag.
$string = preg_replace('@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@',
    '<a href="$1">$1</a>',
    $string);
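
As a minimal sketch of how the two basic steps (links and line breaks) might fit together, the example below chains them into one function. The function name render_plain_text() is hypothetical, and escaping the text with htmlspecialchars() before linkifying is an assumption added here, not something the original snippet does.

```php
<?php
// Hypothetical helper, not from the original post: escape, linkify, add <br>.
function render_plain_text(string $text): string
{
    // 1. Escape first, so any HTML typed by the user is displayed as text.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 2. Wrap http(s):// URLs in anchors, using the pattern quoted in the post.
    $linked = preg_replace(
        '@(https?://([-\w\.]+)+(/([\w/_\.]*(\?\S+)?(#\S+)?)?)?)@',
        '<a href="$1">$1</a>',
        $safe
    );

    // 3. Insert a <br /> before every line break.
    return nl2br($linked);
}

$html = render_plain_text("See http://example.com/docs\n<b>not bold</b>");
echo $html, PHP_EOL;
```

The order matters in this sketch: if the escaping ran last, it would also escape the freshly generated <a> and <br /> tags.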

But first, as usual, I checked whether there was anything suitable on the php.net [8] site. I came across this [9] comment, which describes simple preg_replace() rules for processing BBCode. I thought of adding support for at least a few basic BBCode tags, but we were planning to add TinyMCE with HTML markup! If I implemented simple BBCode in the first version, we would have to keep supporting it in the next one, because someone would be using BBCode and it would suddenly stop working. Otherwise, an upgrade from version one would have to find and convert the BBCode stored in the database. This is not good, I thought, and decided that I could just as easily implement several popular HTML tags.

At this point I started reading about how to store and display text with HTML markup. Storing it is actually no problem; everything is clear. The database does not care what is inside the text.
The problem, of course, is with the output. On the one hand, we need to keep the HTML tags the user entered; on the other hand, we cannot let XSS [10] in, or even let those tags break the entire page layout.
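
To make the trade-off concrete, here is a small sketch with an illustrative input string (not taken from the post). Escaping everything on output is safe, but it neutralises the harmless tag together with the dangerous one.

```php
<?php
// Illustrative input, not from the original post.
$user_msg = '<b>bold</b><script>alert(1)</script>';

// Escaping every special character makes the output safe to display ...
$escaped = htmlspecialchars($user_msg, ENT_QUOTES, 'UTF-8');

// ... but <b> is turned into visible text just like <script>.
echo $escaped, PHP_EOL;
```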

You may ask: why don't the developers of TinyMCE, an advanced WYSIWYG editor, take care of eliminating XSS from the HTML it generates? After all, if we disable HTML source editing in TinyMCE, the user will not be able to push XSS through it.
Unfortunately, we must remember that we cannot trust any data coming from the client. A disabled HTML mode can easily be re-enabled on the client side, since it is just JavaScript. And in any case, nothing prevents an attacker from POSTing data directly: the server cannot tell whether the data in a request was prepared by TinyMCE or by something else.
Therefore, although the TinyMCE developers have done some work to prevent attacks, complete protection from XSS is not the job of an editor running on the client side.

It would seem that PHP has tools for XSS prevention, such as strip_tags() and htmlentities(). However, they do not help with user-entered text that contains HTML markup.
Suppose we want to support even the simplest bold tag and process the user's text user_msg this way:

PHP, using strip_tags() [11]:
$user_msg_safe = strip_tags($user_msg, '<strong>');

Nothing bad can happen, right? Wrong. There is a great discussion [12] of this on Stack Overflow. The following can occur:

HTML:
<strong style="width: expression(alert(document.location));"> XSS </strong>
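
The reason this gets through is that strip_tags() filters by tag name only and leaves the attributes of allowed tags untouched. A short sketch with an illustrative input (the attribute values are made up for the example):

```php
<?php
// Illustrative input: an allowed tag carrying attributes, plus a disallowed tag.
$user_msg = '<strong style="color:red" onclick="alert(1)">XSS</strong><script>alert(2)</script>';

// Only <strong> is allowed, but its attributes are not inspected at all.
$user_msg_safe = strip_tags($user_msg, '<strong>');

// The <script> tags are gone (their text content stays),
// while style and onclick on <strong> survive unchanged.
echo $user_msg_safe, PHP_EOL;
```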

Yes, in all modern and popular browsers such an attack will probably lead to nothing, because CSS expression() was only ever supported by old versions of Internet Explorer. But this is not a reason to relax. And this is just a single bold tag! We would like to support the usual set: <b>, <i>, <s>, <pre>, <img>, <a>!
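With <img>, for example, simply terrible things can happen through its “src” attribute, which an attacker can fill with a javascript: URL instead of an image address.

And even replacing or cutting the substring “javascript:” from user_msg does not help, as an obfuscated spelling of the same scheme has the same bad effect.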
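
A minimal sketch of one way cutting the substring can fail, assuming the filter is a single str_ireplace() call. Both function names are hypothetical, and the allowlist check is only an illustration of a stricter alternative, not a complete URL validator.

```php
<?php
// Hypothetical naive filter: delete every occurrence of "javascript:".
function strip_js_scheme(string $value): string
{
    return str_ireplace('javascript:', '', $value);
}

// Hypothetical stricter check: accept only URLs with an allowlisted scheme.
function has_allowed_scheme(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

// The removal is a single pass, so the pieces left behind join up again.
$filtered = strip_js_scheme('javajavascript:script:alert(1)');
echo $filtered, PHP_EOL;

var_dump(has_allowed_scheme('https://example.com/cat.png'));
var_dump(has_allowed_scheme($filtered));
```

Removing known-bad strings tends to lose to inputs like this one; accepting only known-good values is usually the easier rule to reason about.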

There are tons of examples like these [13]. Yes, most of these vulnerabilities are closed in modern browsers, but their sheer diversity shows that processing user-entered text with HTML markup is not a trivial task.

What to do?
One option is to use another markup language, such as BBCode, Markdown, or wiki markup. With these it is much easier to produce safe HTML. However, they can still be vulnerable to XSS.
The second option is to keep HTML markup but pass user_msg through a so-called “HTML sanitizer”. In particular, many people on Stack Overflow recommend HTML Purifier.
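
The first option can be sketched roughly as follows, assuming a tiny BBCode subset of [b], [i] and [s]. The function name render_bbcode() and the tag mapping are hypothetical choices made for this example.

```php
<?php
// Hypothetical renderer for a tiny BBCode subset: [b], [i] and [s] only.
function render_bbcode(string $text): string
{
    // Escape everything first: no HTML typed by the user survives as markup.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Then translate only the known tag pairs into fixed, attribute-free HTML.
    $map = ['b' => 'strong', 'i' => 'em', 's' => 'del'];
    foreach ($map as $bb => $tag) {
        $html = preg_replace(
            '@\[' . $bb . '\](.*?)\[/' . $bb . '\]@is',
            '<' . $tag . '>$1</' . $tag . '>',
            $html
        );
    }
    return $html;
}

$out = render_bbcode('[b]bold[/b] <script>alert(1)</script>');
echo $out, PHP_EOL;
```

The output HTML here is built only from fixed strings, which is what makes this subset easy to keep safe. Tags that carry user-supplied values, such as a URL in a link or image tag, would need their own validation, and that is where such markup can become vulnerable again.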
It seems that the first pilot version of Answers will not support a markup language, but we will return to this question very soon, because we understand how important this feature is.

References

- vulnerable tags and examples of XSS attacks [14]
- an interesting scientific view [15] of XSS (in Russian); the end of the article has suggestions about HTTP headers that help to avoid XSS
- related questions (first [16], second [17], third [18]) on Stack Overflow
- HTML sanitizer [19] HTML Purifier
- HTML sanitizer [20] PHP Input Filter
- more on sanitizers [21] on Stack Overflow
- XSS Cheat Sheet [22] (used as a reference inside this post)
- vulnerable tags [12] and examples of XSS attacks (used as a reference inside this post)

Dmitry

P.S. While I was preparing this post, with its many HTML tags, for publication in WordPress, I ran out of steam trying to make it display correctly. We should probably think seriously about using an alternative markup designed specifically for text entered by users themselves.


URL to article: http://www.deepshiftlabs.com/dev_blog/?p=2084&lang=en-us

URLs in this post:

[1] Starty.co Answers: http://answers.starty.co

[2] BBCode: http://ru.wikipedia.org/wiki/Bbcode

[3] TinyMCE: http://www.tinymce.com/

[4] Highlight.js: http://softwaremaniacs.org/soft/highlight/

[5] this approach: http://saturnboy.com/2010/02/parsing-twitter-with-regexp/

[6] View Code: http://www.deepshiftlabs.com/dev_blogjavascript:;

[7] preg_replace: http://www.php.net/preg_replace

[8] php.net: http://php.net

[9] this: http://www.php.net/manual/en/function.preg-replace.php#83974

[10] XSS: http://en.wikipedia.org/wiki/Cross-site_scripting

[11] strip_tags: http://www.php.net/strip_tags

[12] That’s: http://stackoverflow.com/questions/6976053/xss-which-html-tags-and-attributes-can-trigger-javascript-events

[13] these: http://ha.ckers.org/xss.html

[14] vulnerable tags and examples of XSS attacks: http://www.technicalinfo.net/papers/CSS.html/

[15] view: http://habrahabr.ru/post/149152/

[16] first: http://stackoverflow.com/questions/6219003/tinymce-protection-against-cross-site-scripting

[17] second: http://stackoverflow.com/questions/1414986/using-safe-filter-in-django-for-rich-text-fields

[18] third: http://stackoverflow.com/questions/9826970/prevent-xss-but-allow-all-html-tags

[19] HTML sanitizer: http://htmlpurifier.org/

[20] HTML sanitizer: http://freecode.com/projects/inputfilter

[21] more on sanitizers: http://stackoverflow.com/questions/1947021/libs-for-html-sanitizing

[22] XSS Cheat Sheet: http://ha.ckers.org/xss.html

Content copyright © 2010 Deep Shift Labs.