Further whitespace issues #14

Closed
alvinlindstam opened this issue Dec 19, 2016 · 3 comments


@alvinlindstam

As noted in #4, mochiweb_html ignores whitespace between tags. The fix (regex-replacing spaces in the input string before parsing) doesn't really solve the problem, though.

In this example, the whitespace between the tags contains a newline as well as spaces, and it is dropped entirely. The space is not the only whitespace character in HTML.

iex(39)> HtmlSanitizeEx.basic_html("<a href=\"almost\">on my mind</a>  <a href=\"almost\">all day long</a>")
"<a href=\"almost\">on my mind</a> <a href=\"almost\">all day long</a>"
iex(40)> HtmlSanitizeEx.basic_html("<a href=\"almost\">on my mind</a>  \n<a href=\"almost\">all day long</a>")
"<a href=\"almost\">on my mind</a><a href=\"almost\">all day long</a>"

In this example, mochiweb correctly parses the textarea contents, which are later escaped on output. But since we regex-replaced the space with &#32; beforehand, that character reference gets escaped too:

iex(50)> HtmlSanitizeEx.html5("<textarea> <script></script></textarea>") 
"<textarea>&amp;#32;&lt;script&gt;&lt;/script&gt;</textarea>"

The first issue could probably be solved with an extended regexp that matches all HTML whitespace characters, while the second could only be solved by making the parser keep all text nodes, including whitespace-only ones.
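A minimal sketch of the extended-regexp idea, assuming the goal is to preserve every HTML whitespace character between tags rather than just the space (the WHATWG Infra standard defines ASCII whitespace as tab, LF, FF, CR and space). `WhitespaceSketch` and `protect_inter_tag_whitespace/1` are hypothetical names, not part of html_sanitize_ex, and this is not necessarily how the library ended up fixing the issue. Each whitespace character in a run between `>` and `<` is turned into a numeric character reference so the parser no longer drops it:

```elixir
defmodule WhitespaceSketch do
  # Hypothetical helper for illustration; not part of html_sanitize_ex.

  # Encodes every run of HTML whitespace (tab, LF, FF, CR, space) that sits
  # directly between a ">" and the next "<" as numeric character references,
  # so a parser that drops whitespace-only text nodes keeps it as text.
  def protect_inter_tag_whitespace(html) do
    Regex.replace(~r/>([\t\n\f\r ]+)</, html, fn _match, ws ->
      # Map each codepoint to "&#N;", e.g. space -> "&#32;", newline -> "&#10;"
      encoded =
        ws
        |> String.to_charlist()
        |> Enum.map_join(fn char -> "&##{char};" end)

      ">" <> encoded <> "<"
    end)
  end
end
```

For example, `WhitespaceSketch.protect_inter_tag_whitespace("<a>x</a> \n<b>y</b>")` returns `"<a>x</a>&#32;&#10;<b>y</b>"`. This only covers the first problem: because it still rewrites the input before parsing, `<textarea> <script></script></textarea>` would presumably still end up with an escaped `&amp;#32;`, which is why the second problem needs the parser itself to keep whitespace-only text nodes.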

@alvinlindstam
Author

It seems I was a little quick on the keys; #12 appears to be about the same issue.

@rrrene
Owner

rrrene commented Dec 19, 2016

Mmh, that's not good. I did not think of <textarea> when contemplating fixes to this problem. <pre> might very well have the same problems.

Thanks for reporting! 👍

rrrene added a commit that referenced this issue Apr 30, 2017
rrrene closed this as completed in fdfe0a9 Apr 30, 2017
@rrrene
Owner

rrrene commented Apr 30, 2017

@alvinlindstam I know it has been a while, but I just published v1.3.0-rc1, which should fix this issue. If you are still using html_sanitize_ex, please try it out! 👍
