Optimize raw HTML post-processor #1510
Using a set gives better performance when checking whether a tag belongs to the block-level elements. Issue-1507: Python-Markdown#1507
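To illustrate the list->set point above, here is a minimal sketch. The tag names are an abbreviated, hypothetical sample, not the library's actual list, and `is_block_level` is an illustrative helper rather than Python-Markdown's API.

```python
# Abbreviated, hypothetical sample of block-level tag names
# (an assumption for illustration, not Python-Markdown's full list).
BLOCK_LEVEL_LIST = ["address", "article", "aside", "blockquote", "div", "p", "pre", "table", "ul"]

# Same names as a set: `in` becomes a hash lookup (average O(1))
# instead of a linear scan over the list (O(n)).
BLOCK_LEVEL_SET = set(BLOCK_LEVEL_LIST)


def is_block_level(tag, block_level_elements):
    """Return True if `tag` (case-insensitive) is in the given collection."""
    return tag.lower() in block_level_elements


# Both collections answer the membership question identically;
# only the lookup cost differs.
print(is_block_level("DIV", BLOCK_LEVEL_LIST), is_block_level("DIV", BLOCK_LEVEL_SET))
print(is_block_level("span", BLOCK_LEVEL_LIST), is_block_level("span", BLOCK_LEVEL_SET))
```

Note that a set has no `append`, `insert` or indexing, so third-party code that mutates the collection as a list would stop working, which is presumably why the change could be seen as breaking.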
Previously, the raw HTML post-processor precomputed all possible replacements for the placeholders in a string, based on the HTML stash. It then applied a regular expression substitution using these replacements. Finally, if the text changed, it recursed and did all of that again. This was inefficient because the replacements were recomputed on each recursion, and because only a few of them would be used anyway.

This change moves the recursion into the regular expression substitution, so that:

1. the regular expression does minimal work on the text (instead of re-scanning text already scanned in previous frames);
2. more importantly, replacements are no longer computed ahead of time (let alone *several times*), and are only fetched from the HTML stash as placeholders are found in the text.

The substitution function relies on the ordering of the regular expression groups: we make sure to match `<p>PLACEHOLDER</p>` first, before `PLACEHOLDER`. The presence of a wrapping `p` tag indicates whether to wrap the substitution result again or not (also depending on whether the substituted HTML is a block-level tag).

Issue-1507: Python-Markdown#1507
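The approach described in this commit message can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the PR's code: the `{{STASH:n}}` placeholder format, the `restore` function, the plain-list stash and the tiny `BLOCK_TAGS` set are all hypothetical stand-ins for the real placeholder, post-processor and HTML stash.

```python
import re

# Hypothetical placeholder format; the real one in Python-Markdown differs.
BASE = re.escape("{{STASH:") + r"([0-9]+)" + re.escape("}}")
# Group 1 matches a <p>-wrapped placeholder and is tried first;
# group 2 matches a bare placeholder.
PATTERN = re.compile(f"<p>{BASE}</p>|{BASE}")

# Abbreviated, hypothetical set of block-level tags.
BLOCK_TAGS = {"div", "p", "pre", "table"}


def is_block_level(html):
    """Return True if `html` starts with a tag from BLOCK_TAGS."""
    m = re.match(r"<([a-zA-Z0-9]+)", html)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS


def restore(text, stash):
    """Replace placeholders in `text` with entries from `stash` (a list)."""

    def substitute_match(m):
        if m.group(1) is not None:
            key, wrapped = int(m.group(1)), True
        else:
            key, wrapped = int(m.group(2)), False
        if key >= len(stash):
            return m.group(0)  # unknown placeholder: leave it untouched
        html = stash[key]  # fetched only when the placeholder is found
        if not wrapped or is_block_level(html):
            # Recurse only into the replacement, not the whole text.
            return PATTERN.sub(substitute_match, html)
        # Inline HTML that was wrapped in <p> gets wrapped again.
        return PATTERN.sub(substitute_match, f"<p>{html}</p>")

    if not stash:
        return text
    return PATTERN.sub(substitute_match, text)


stash = ["<div>{{STASH:1}}</div>", "<em>hi</em>"]
print(restore("<p>{{STASH:0}}</p>", stash))
print(restore("<p>{{STASH:1}}</p>", stash))
```

Because the recursion happens inside the substitution callback, each nested call scans only the stashed fragment it has just fetched, and entries that are never referenced are never looked up.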
6113aad to fc9acc0
Hmm, the list->set change could be seen as breaking. We can instead create a new |
    else:
        key = m.group(2)
        wrapped = False
    if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
Could use `html_counter` instead:

    - if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
    + if (key := int(key)) >= self.md.htmlStash.html_counter:
            return pattern.sub(substitute_match, html)
        return pattern.sub(substitute_match, f"<p>{html}</p>")

    if self.md.htmlStash.html_counter:
Or we could not use `html_counter` and only rely on the actual list, `rawHtmlBlocks`:

    - if self.md.htmlStash.html_counter:
    + if self.md.htmlStash.rawHtmlBlocks:
    if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
        return m.group(0)
    html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[key])
    if self.isblocklevel(html) or not wrapped:
Micro-optimization (given the list->set change is applied): make the `isblocklevel` check lazy by testing `wrapped` first.

    - if self.isblocklevel(html) or not wrapped:
    + if not wrapped or self.isblocklevel(html):
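A small sketch of why the operand order matters: Python's `or` short-circuits, so putting the cheap boolean first skips the costlier check whenever `wrapped` is false. The function names and the call-recording list below are hypothetical and exist only for this illustration.

```python
# Records every invocation of the (hypothetical) expensive check.
calls = []


def is_block_level(html):
    """Stand-in for a costlier check; logs each call for inspection."""
    calls.append(html)
    return html.startswith("<div")


def no_rewrap_eager(html, wrapped):
    # Original order: is_block_level() always runs.
    return is_block_level(html) or not wrapped


def no_rewrap_lazy(html, wrapped):
    # Suggested order: is_block_level() runs only when wrapped is True.
    return not wrapped or is_block_level(html)


no_rewrap_eager("<em>x</em>", wrapped=False)
eager_calls = len(calls)

calls.clear()
no_rewrap_lazy("<em>x</em>", wrapped=False)
lazy_calls = len(calls)

print(eager_calls, lazy_calls)
```

Both orderings return the same boolean for every input; only the number of `is_block_level` calls differs.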
Closes #1507