Optimize raw HTML post-processor #1510

pawamoy · 2025-02-21T15:34:10Z

Using a set makes membership checks faster when testing whether a tag is one of the block-level elements. Issue-1507: Python-Markdown#1507
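To illustrate the difference, here is a minimal sketch. The tag names are a small hypothetical subset chosen for the example, not the library's actual list, and the function names are made up here.

```python
# Hypothetical subset of block-level tag names, for illustration only.
BLOCK_LEVEL_LIST = ["address", "article", "blockquote", "div", "p", "pre", "table", "ul"]
BLOCK_LEVEL_SET = set(BLOCK_LEVEL_LIST)


def is_block_level_list(tag: str) -> bool:
    # Linear scan: compares the tag against each element until one matches.
    return tag in BLOCK_LEVEL_LIST


def is_block_level_set(tag: str) -> bool:
    # Hash lookup: average constant time, independent of the collection size.
    return tag in BLOCK_LEVEL_SET
```

Both functions return the same answers; only the lookup cost differs, and the gap grows with the number of elements.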

Previously, the raw HTML post-processor precomputed every possible placeholder replacement for a string, based on the HTML stash. It then applied a regular expression substitution using these replacements. Finally, if the text had changed, it recursed and did all of that again. This was inefficient because the replacements were recomputed on each recursion, and because only a few of them were ever used.

This change moves the recursion into the regular expression substitution, so that:

1. the regular expression does minimal work on the text (instead of re-scanning text already scanned in previous frames);
2. more importantly, replacements are no longer computed ahead of time (let alone *several times*), and are only fetched from the HTML stash as placeholders are found in the text.

The substitution function relies on the ordering of the regular expression groups: we make sure to match `<p>PLACEHOLDER</p>` first, before `PLACEHOLDER`. The presence of a wrapping `p` tag indicates whether the substitution result should be wrapped again (which also depends on whether the substituted HTML is a block-level tag).

Issue-1507: Python-Markdown#1507
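The approach described above can be sketched as follows. This is a simplified, standalone illustration, not the code from this PR: the `@@STASH:n@@` placeholder format, the `BLOCK_TAGS` set, and the `expand` and `is_block_level` helpers are all assumptions made for the example, and the stash is a plain list of strings.

```python
import re

# Hypothetical placeholder syntax; the wrapped form is listed first so it wins.
PLACEHOLDER_RE = re.compile(r"<p>@@STASH:(\d+)@@</p>|@@STASH:(\d+)@@")
# Hypothetical subset of block-level tags.
BLOCK_TAGS = {"div", "p", "pre", "table"}


def is_block_level(html: str) -> bool:
    # Look at the first tag name only; enough for this sketch.
    m = re.match(r"<([a-zA-Z][a-zA-Z0-9]*)", html)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS


def expand(text: str, stash: list[str]) -> str:
    def substitute(m: re.Match) -> str:
        # Group 1 is set when the placeholder was wrapped in <p>...</p>.
        if m.group(1) is not None:
            key, wrapped = int(m.group(1)), True
        else:
            key, wrapped = int(m.group(2)), False
        if key >= len(stash):
            return m.group(0)  # Unknown placeholder: leave the text untouched.
        html = stash[key]  # Fetched lazily, only for placeholders actually found.
        if not wrapped or is_block_level(html):
            return PLACEHOLDER_RE.sub(substitute, html)
        # Inline HTML that was wrapped keeps its paragraph.
        return PLACEHOLDER_RE.sub(substitute, f"<p>{html}</p>")

    return PLACEHOLDER_RE.sub(substitute, text)


stash = ["<div>@@STASH:1@@</div>", "<b>bold</b>"]
result = expand("<p>@@STASH:0@@</p> and <p>@@STASH:1@@</p> and @@STASH:9@@", stash)
```

Here the recursion only happens on the replacement text of each match, so the outer text is scanned once. The sketch assumes the stash contains no self-referencing placeholders; a cycle would recurse without end.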

pawamoy · 2025-02-21T15:53:35Z

Hmm, the list->set change could be seen as breaking. We can instead create a new Markdown._block_level_elements attribute, and use that in isblocklevel(). Let me know if you think that's best.
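A rough sketch of what that alternative could look like, using a hypothetical `MarkdownLike` stand-in rather than the real `Markdown` class (the tag names and the method name are assumptions too):

```python
class MarkdownLike:
    """Hypothetical stand-in for the `Markdown` class, for illustration only."""

    def __init__(self) -> None:
        # The public attribute stays a list, so code that indexes it or
        # appends to it keeps working.
        self.block_level_elements = ["address", "blockquote", "div", "p", "pre", "table"]
        # Private set, used only for fast membership checks.
        self._block_level_elements = set(self.block_level_elements)

    def is_block_level(self, tag: str) -> bool:
        return tag.lower().rstrip("/") in self._block_level_elements
```

In this naive form, tags appended to the public list after construction would not show up in the private set, so keeping the two in sync would need to be addressed.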

pawamoy · 2025-02-22T22:54:14Z

markdown/postprocessors.py

+            else:
+                key = m.group(2)
+                wrapped = False
+            if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):


Could use html_counter instead:

Suggested change

-            if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
+            if (key := int(key)) >= self.md.htmlStash.html_counter:

pawamoy · 2025-02-22T22:55:32Z

markdown/postprocessors.py

+                return pattern.sub(substitute_match, html)
+            return pattern.sub(substitute_match, f"<p>{html}</p>")
+
+        if self.md.htmlStash.html_counter:


Or we could not use html_counter and only rely on the actual list, rawHtmlBlocks:

Suggested change

-        if self.md.htmlStash.html_counter:
+        if self.md.htmlStash.rawHtmlBlocks:

pawamoy · 2025-02-23T15:19:13Z

markdown/postprocessors.py

+            if (key := int(key)) >= len(self.md.htmlStash.rawHtmlBlocks):
+                return m.group(0)
+            html = self.stash_to_string(self.md.htmlStash.rawHtmlBlocks[key])
+            if self.isblocklevel(html) or not wrapped:


Micro-optimization (given the list->set change is applied): make isblocklevel check lazy.

Suggested change

-            if self.isblocklevel(html) or not wrapped:
+            if not wrapped or self.isblocklevel(html):
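A small sketch of why the operand order matters. The `isblocklevel` function below is a stand-in that counts its calls, and the `skip_wrapping_*` names are hypothetical; none of this is the real method.

```python
call_count = {"isblocklevel": 0}


def isblocklevel(html: str) -> bool:
    # Stand-in for the real check; counts calls so laziness is observable.
    call_count["isblocklevel"] += 1
    return html.lstrip().startswith("<div")


def skip_wrapping_eager(html: str, wrapped: bool) -> bool:
    # isblocklevel() always runs, even when `wrapped` is False.
    return isblocklevel(html) or not wrapped


def skip_wrapping_lazy(html: str, wrapped: bool) -> bool:
    # `or` short-circuits: isblocklevel() runs only when `wrapped` is True.
    return not wrapped or isblocklevel(html)
```

Both orderings give the same boolean result; the lazy form simply avoids the block-level check for placeholders that were not wrapped in a `p` tag.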

pawamoy added 2 commits February 21, 2025 16:25

Use set instead of list for block level elements

85a9160

Using a set makes membership checks faster when testing whether a tag is one of the block-level elements. Issue-1507: Python-Markdown#1507

pawamoy force-pushed the optimize-rawhtml-postprocessor branch from 6113aad to fc9acc0 Compare February 21, 2025 15:38

Add changelog entry for improved raw HTML post-processor perfs

ce408be
