NPE when you have an entity inside of attribute wrapped by text #6

pilifs · 2017-09-27T15:05:34Z

HTMLParser2 seems to have different behaviour for its ontext and onattribute events with entities. It calls ontext for each chunk of text split by entities, but onattribute is always called with the full attribute value. For example:

Working: <span>before &amp; after</span>
parser.ontext is called with three strings: "before ", the entity (which we ignore), and then " after". "before" and "after" are matched as separate errors.

Not working: <img alt="before &amp; after" />
parser.onattribute is only called once, with the decoded string "before & after". This throws an NPE when we try to get context from the evidence regex, because the raw source contains &amp; and &amp; != &
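To make the failure concrete, here is a minimal sketch of the mismatch. It does not use HTMLParser2 itself: `rawSource`, `decodedValue`, and `locateEvidence` are hypothetical stand-ins for the page source, the value handed to onattribute, and whatever the real evidence lookup does, which this issue does not show.

```javascript
// Raw markup as it appears in the page source.
const rawSource = '<img alt="before &amp; after" />';
// Assumed value passed to onattribute when entity decoding is on.
const decodedValue = "before & after";

// Hypothetical stand-in for the evidence lookup (not the project's real code).
// The value used here contains no regex metacharacters, so it is not escaped.
function locateEvidence(raw, value) {
  const match = raw.match(new RegExp(value));
  // match is null when the decoded value is absent from the raw source,
  // so reading .index throws a TypeError (the "NPE" described above).
  return match.index;
}

try {
  locateEvidence(rawSource, decodedValue);
} catch (err) {
  console.log(err instanceof TypeError); // the lookup failed on a null match
}
```

The same lookup succeeds against text whose raw form already contains a bare "&", which is why the problem only shows up once the source uses an entity.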

I'm happy to submit a fix for this, but I can't come up with a quick non-invasive way to do it. Handling multiple entities, a mix of encoded / unencoded characters, and multiple lines seems like a can of worms. My current hack is to catch the error and run through another parser with decodeEntities set to false for attributes. :)
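One possible alternative to the second parser pass, sketched here purely as an illustration: build the lookup pattern so that each character that may be entity-encoded matches either its literal form or a known entity spelling. `ENTITY_FORMS`, `escapeRegExp`, `buildTolerantPattern`, and `findInRaw` are hypothetical names, not part of HTMLParser2 or of this project, and the entity table is deliberately incomplete (no hex references, no other named entities).

```javascript
// Hypothetical, non-exhaustive table of entity spellings per character.
const ENTITY_FORMS = {
  "&": ["&amp;", "&#38;"],
  "<": ["&lt;", "&#60;"],
  ">": ["&gt;", "&#62;"],
  '"': ["&quot;", "&#34;"],
  "'": ["&apos;", "&#39;"],
};

// Escape regex metacharacters so text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pattern that finds the decoded value in raw source, accepting
// either an entity form or the literal character at each position.
function buildTolerantPattern(decodedValue) {
  const parts = Array.from(decodedValue, (ch) => {
    const forms = ENTITY_FORMS[ch];
    if (!forms) return escapeRegExp(ch);
    // Entity forms first, then the literal character as a fallback.
    return "(?:" + [...forms, ch].map(escapeRegExp).join("|") + ")";
  });
  return new RegExp(parts.join(""));
}

// Returns the position and raw text of the match, or null instead of throwing.
function findInRaw(rawSource, decodedValue) {
  const match = buildTolerantPattern(decodedValue).exec(rawSource);
  return match ? { index: match.index, text: match[0] } : null;
}

const hit = findInRaw('<img alt="before &amp; after" />', "before & after");
console.log(hit && hit.text);
```

This handles a mix of encoded and unencoded characters within one value, but it only covers the entities listed in the table, so it is a starting point rather than a full answer to the can of worms described above.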

I'd appreciate any insight you may have. Thanks!
