HTMLParser2 seems to have different behaviour for its ontext and onattribute events with entities. It calls ontext for each chunk of text split by entities, but onattribute is always called with the full attribute value. For example:
Working: <span>before &amp; after</span>
parser.ontext is called with three strings: "before ", the entity (which we ignore), and then " after". "before" and "after" are matched as separate errors.
Not working: <img alt="before &amp; after" />
parser.onattribute is called only once, with the string "before & after". This throws an NPE when we try to get context from the evidence regex, because & != &amp;
I'm happy to submit a fix for this, but I can't come up with a quick non-invasive way to do it. Handling multiple entities, a mix of encoded / unencoded characters, and multiple lines seems like a can of worms. My current hack is to catch the error and run through another parser with decodeEntities set to false for attributes. :)
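One possible direction, sketched here as an assumption rather than a tested fix: if the attribute value is available in its raw, still-encoded form (for example from a second parse with decodeEntities set to false, as in the hack above), it could be split at entity boundaries so that attributes are handled in chunks the same way ontext delivers text. The helper name `splitOnEntities` and the shape of the chunk objects are hypothetical and not part of HTMLParser2.

```javascript
// Hypothetical helper, not part of HTMLParser2.
// Splits a raw (still-encoded) attribute value into text and entity chunks,
// recording each chunk's offset within the raw value.
function splitOnEntities(raw) {
  // Matches named (&amp;), decimal (&#38;) and hex (&#x26;) character references.
  // A bare "&" that is not followed by a name and ";" is left as ordinary text.
  const entity = /&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);/g;
  const chunks = [];
  let last = 0;
  let match;
  while ((match = entity.exec(raw)) !== null) {
    if (match.index > last) {
      chunks.push({ type: "text", value: raw.slice(last, match.index), offset: last });
    }
    chunks.push({ type: "entity", value: match[0], offset: match.index });
    last = match.index + match[0].length;
  }
  if (last < raw.length) {
    chunks.push({ type: "text", value: raw.slice(last), offset: last });
  }
  return chunks;
}

// Only the "text" chunks would be checked, mirroring how the entity chunk
// from ontext is ignored.
const textChunks = splitOnEntities("before &amp; after").filter((c) => c.type === "text");
console.log(textChunks);
```

Because each chunk keeps its offset into the raw value, a match inside a chunk can be mapped back to a position in the source without comparing decoded and encoded strings. This sketch does not address attribute values that span multiple lines.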
I'd appreciate any insight you may have. Thanks!