Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite to use html5lib >= 0.99999999 #250

Merged
merged 12 commits into from
Feb 24, 2017
Merged

Rewrite to use html5lib >= 0.99999999 #250

merged 12 commits into from
Feb 24, 2017

Conversation

willkg
Copy link
Member

@willkg willkg commented Feb 22, 2017

This needs another going over. Plus I bumped into a few areas that don't have behavior tests, so it's possible there are things that are broken or not finished, yet. I'll keep working on those.

Fixes #229

This is a bleach rewrite to use the new sanitizer API in html5lib 0.99999999.
The new API happens as a filter when emitting the tree rather than in
the tokenizer. Because of that, the output of .clean() and .linkify() are
different than in previous versions of bleach.
I also moved test_nasty to the regression tests because it's just like those.
* address FIXMEs
* minor cleanup to code and comments
@willkg
Copy link
Member Author

willkg commented Feb 22, 2017

What's next:

  1. implement missing tests
  2. review the changes one more time and make sure CHANGES is correct
  3. review docs
  4. put this in a .tar.gz and host somewhere for others to test
  5. fix any issues that come out of that
  6. land this

* this adds some missing tests to add more coverage
* html5lib 0.99999999 and 0.999999999 have an alphabeticalattributes filter that
  doesn't work when the attributes set has some items with a namespace and some
  without in Python 3; this rolls alphabetizing into the Bleach sanitizer
* remove some dead code and clean some other code up
clean('<script/xss src="http://xx.com/xss.js"></script>') in
[
'&lt;script src="http://xx.com/xss.js" xss=""&gt;&lt;/script&gt;',
'&lt;script xss="" src="http://xx.com/xss.js"&gt;&lt;/script&gt;'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to fix this we need to sort the attributes and then escape it. That happens in .disallowed_token in the sanitizer. I'll think about that.

Making this change means the output is stable since attributes will always
happen in the same order. Seems like maybe it's not a great idea, but stable
seems good. If it turns out this is terrible, someone will complain with a
compelling use case and we can undo it.

I also went through and removed a bunch of the "the output is either this or
that" in the tests.
@willkg
Copy link
Member Author

willkg commented Feb 24, 2017

Ok. I think that's good enough for now. Merging.

@willkg willkg changed the title [WIP] Html5lib update Rewrite to use html5lib >= 0.99999999 Feb 24, 2017
@willkg willkg merged commit 543984c into mozilla:master Feb 24, 2017
@willkg willkg deleted the html5lib-update branch March 6, 2017 21:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant