NPE when you have an entity inside of attribute wrapped by text #6

Open
pilifs opened this issue Sep 27, 2017 · 0 comments

pilifs commented Sep 27, 2017

htmlparser2 seems to handle entities differently in its `ontext` and `onattribute` events. It calls `ontext` once for each chunk of text split by entities, but `onattribute` is always called once with the full, already-decoded attribute value. For example:

Working: `<span>before &amp; after</span>`
`parser.ontext` is called with three strings: `"before "`, the decoded entity (`"&"`), which we ignore, and then `" after"`. `before` and `after` are matched as separate errors.

Not working: `<img alt="before &amp; after" />`
`parser.onattribute` is called only once, with the string `"before & after"`. This throws an NPE when we try to get context from the evidence regex, because the decoded `&` does not match the `&amp;` in the raw source.
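A minimal reproduction of why the context lookup fails, assuming (hypothetically) that the evidence regex is built from the value passed to `onattribute` and then run against the raw source. `source`, `decodedAlt` and `escapeRegExp` are illustrative names, not the project's code:

```javascript
// Hypothetical reproduction; the project's actual regex code is not shown in this issue.
const source = '<img alt="before &amp; after" />';
const decodedAlt = "before & after"; // the value onattribute receives

// Escape regex metacharacters so the attribute value is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const match = new RegExp(escapeRegExp(decodedAlt)).exec(source);
console.log(match); // null: the source contains "&amp;", not a bare "&"

// Reading match.index at this point would throw a TypeError on null,
// the JavaScript counterpart of the NPE described above.
```

The same regex built from the raw `&amp;` form does match, which is why re-parsing with `decodeEntities` set to `false` avoids the crash.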

I'm happy to submit a fix for this, but I can't come up with a quick, non-invasive way to do it. Handling multiple entities, a mix of encoded and unencoded characters, and multiple lines seems like a can of worms. My current hack is to catch the error and re-parse attributes with a second parser that has `decodeEntities` set to `false`. :)
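One possible direction, sketched under assumptions: if the raw (undecoded) attribute text is available, for example from the `decodeEntities: false` pass, it can be decoded once while recording which raw offsets each decoded character came from. A match found in the decoded value can then be mapped back to the exact span in the source, which handles multiple entities, mixed encoded/unencoded characters and newlines in a single pass. `decodeWithOffsets`, `toRawRange` and the tiny `NAMED_ENTITIES` table are hypothetical; a real version would need the full HTML entity table (htmlparser2 itself relies on the `entities` package) and HTML's legacy entities that omit the trailing semicolon.

```javascript
// Hypothetical helper, not part of htmlparser2 or this project.
// Only a handful of named entities are listed; numeric references
// (&#60; and &#x3C;) are decoded generically.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00a0" };

// Sticky (y) regex: exec() only matches starting exactly at lastIndex.
const ENTITY_RE = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));/y;

// Returns the decoded text for a matched entity, or null if unknown/invalid.
function entityText(m) {
  if (m[3] !== undefined) {
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, m[3]) ? NAMED_ENTITIES[m[3]] : null;
  }
  const codePoint = m[1] !== undefined ? parseInt(m[1], 10) : parseInt(m[2], 16);
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : null;
}

// Decodes `raw` and records, for every decoded UTF-16 unit i,
// the raw span [starts[i], ends[i]) it was produced from.
function decodeWithOffsets(raw) {
  let decoded = "";
  const starts = [];
  const ends = [];
  let i = 0;
  while (i < raw.length) {
    if (raw[i] === "&") {
      ENTITY_RE.lastIndex = i;
      const m = ENTITY_RE.exec(raw);
      const text = m ? entityText(m) : null;
      if (text !== null) {
        // Every unit of one entity maps back to the whole "&...;" span.
        for (let k = 0; k < text.length; k++) {
          decoded += text[k];
          starts.push(i);
          ends.push(i + m[0].length);
        }
        i += m[0].length;
        continue;
      }
    }
    // Ordinary characters (including newlines and stray "&") map 1:1.
    decoded += raw[i];
    starts.push(i);
    ends.push(i + 1);
    i += 1;
  }
  return { decoded, starts, ends };
}

// Maps a non-empty range [from, to) of the decoded string back to raw offsets.
function toRawRange(map, from, to) {
  return [map.starts[from], map.ends[to - 1]];
}

// Usage: find "& after" in the decoded value, then recover the raw text.
const rawAlt = "before &amp; after";
const altMap = decodeWithOffsets(rawAlt);
const hit = altMap.decoded.indexOf("& after");
const [rawStart, rawEnd] = toRawRange(altMap, hit, hit + "& after".length);
console.log(altMap.decoded);                 // before & after
console.log(rawAlt.slice(rawStart, rawEnd)); // &amp; after
```

The design choice here is to keep one offset entry per decoded unit rather than re-encoding the match, so the raw span is exact even when the same character appears both encoded and unencoded in one value.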

I'd appreciate any insight you may have. Thanks!
