toc is an HTML table of contents generator. It parses HTML, generates a table of contents, and inserts anchors into the original HTML.
toc_html, body = table_of_contents(html)
toc_html, body = table_of_contents(html, url='http://somedomain.com/somepath')
toc_html, body = table_of_contents(html, anchor_type='following-marker')
- anchor_type
  - following-marker : Adds an anchor tag to the end of each heading tag. The anchor text is #.
  - stacked-number : Adds an anchor tag to the beginning of each heading tag. The anchor text looks like 1.2.3.
- toc_html : the table of contents
- body : the modified HTML
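The two anchor_type values can be illustrated with a small standard-library sketch. This is not toc's actual implementation: add_anchors, iter_headings, text_of, and the toc-... ids are hypothetical names chosen for illustration. The sketch also assumes the input is well-formed XHTML with a single wrapper element, and that heading levels start at h1 and never skip a level.

```python
from xml.dom import minidom

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


def iter_headings(node):
    """Yield heading elements below node in document order."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            if child.tagName in HEADING_TAGS:
                yield child
            else:
                yield from iter_headings(child)


def text_of(node):
    """Concatenate the text nodes found below node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_of(child))
    return "".join(parts)


def add_anchors(xhtml, anchor_type="stacked-number"):
    """Hypothetical helper: return (toc_entries, modified_markup)."""
    if anchor_type not in ("stacked-number", "following-marker"):
        raise ValueError("unknown anchor_type: %r" % anchor_type)
    doc = minidom.parseString(xhtml)
    counters = [0] * 6  # one counter per heading level, h1..h6
    toc = []
    for heading in list(iter_headings(doc.documentElement)):
        level = int(heading.tagName[1])
        counters[level - 1] += 1
        counters[level:] = [0] * (6 - level)  # reset deeper levels
        number = ".".join(str(n) for n in counters[:level])
        toc.append((number, text_of(heading)))

        anchor = doc.createElement("a")
        anchor.setAttribute("id", "toc-" + number)  # assumed id scheme
        if anchor_type == "stacked-number":
            # Anchor at the beginning of the heading, text like 1.2.3
            anchor.appendChild(doc.createTextNode(number))
            heading.insertBefore(anchor, heading.firstChild)
        else:
            # following-marker: anchor at the end of the heading, text "#"
            anchor.appendChild(doc.createTextNode("#"))
            heading.appendChild(anchor)
    return toc, doc.documentElement.toxml()


if __name__ == "__main__":
    sample = "<div><h1>Intro</h1><h2>Setup</h2><h1>Notes</h1></div>"
    entries, markup = add_anchors(sample, anchor_type="following-marker")
    print(entries)
    print(markup)
```

The list of (number, title) pairs stands in for toc_html here; rendering it as nested HTML lists is left out to keep the sketch short.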
pip install toc
- toc uses html5lib as its HTML parser. It is much slower than lxml, the popular XML library for Python, but it parses more precisely, especially for HTML5.
- I don't think ElementTree is more Pythonic than DOM, so I used minidom as the treebuilder and py-dom-xpath for XPath.
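For readers unfamiliar with the two styles mentioned in the last note, here is a minimal side-by-side sketch using only the standard library. It assumes well-formed XHTML input and does not use html5lib or py-dom-xpath; the function names are hypothetical and only show how the same heading lookup reads in each API.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

SNIPPET = "<div><h1>Intro</h1><p>text</p><h1>Notes</h1></div>"


def headings_with_dom(markup):
    """DOM style: text is a child node, and each node knows its parent."""
    doc = minidom.parseString(markup)
    return [
        (h.parentNode.tagName, h.firstChild.data)
        for h in doc.getElementsByTagName("h1")
    ]


def headings_with_elementtree(markup):
    """ElementTree style: text is an attribute of the element."""
    root = ET.fromstring(markup)
    return [h.text for h in root.iter("h1")]
```

One practical difference visible here is that minidom nodes expose parentNode, while standard ElementTree elements keep no reference to their parent.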