toc

toc is an HTML table of contents generator. It parses HTML, generates a table of contents, and inserts anchors into the original HTML.

usage

toc_html, body = table_of_contents(html)
toc_html, body = table_of_contents(html, url='http://somedomain.com/somepath')
toc_html, body = table_of_contents(html, anchor_type='following-marker')
  • anchor_type
    • following-marker: adds an anchor tag at the end of each heading tag. The anchor text is #.
    • stacked-number: adds an anchor tag at the beginning of each heading tag. The anchor text is a section number such as 1.2.3.
  • toc_html: the generated table of contents
  • body: the modified HTML, with anchors inserted
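
To illustrate what a stacked-number label such as 1.2.3. means, here is a minimal sketch that derives labels from a sequence of heading levels. It is not toc's own code: the function name stacked_numbers and the handling of skipped levels (shown as 0) are assumptions made for this example.

```python
def stacked_numbers(levels):
    """Return a label such as '1.2.' for each heading level, in document order.

    `levels` holds integers 1..6 (1 for h1, 2 for h2, ...).
    Hypothetical helper for illustration; not part of toc's API.
    """
    counters = []  # counters[i] counts headings seen at depth i + 1
    labels = []
    for level in levels:
        # A heading at this level ends every deeper subsection.
        del counters[level:]
        # Assumption: a skipped level (h1 followed directly by h3) shows as 0.
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        labels.append(".".join(str(n) for n in counters) + ".")
    return labels


if __name__ == "__main__":
    # h1, h2, h2, h3, h1
    print(stacked_numbers([1, 2, 2, 3, 1]))
```

Each label is the path of counters from the top-level heading down to the current one, so a new h1 resets every deeper counter.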

install

pip install toc

notes

  • toc uses html5lib as its HTML parser. It is much slower than lxml, the popular XML library for Python, but it parses more precisely, especially for HTML5.
  • I don't think ElementTree is more Pythonic than DOM, so I used minidom as the treebuilder and py-dom-xpath for XPath.
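
As a rough picture of the DOM style mentioned above, the sketch below appends a following-marker style anchor to each heading using only the standard library's xml.dom.minidom. It is an illustration, not toc's implementation: it parses well-formed XHTML with minidom.parseString instead of html5lib, and the function name add_following_markers and the toc-N anchor ids are assumptions.

```python
from xml.dom import minidom

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def add_following_markers(xhtml):
    """Append an <a>#</a> anchor to every heading in a well-formed XHTML string.

    Returns (entries, body): entries is a list of (level, title, anchor_id)
    tuples and body is the modified markup.
    Hypothetical helper for illustration; not part of toc's API.
    """
    doc = minidom.parseString(xhtml)
    entries = []
    # getElementsByTagName("*") yields every element in document order.
    for el in doc.getElementsByTagName("*"):
        if el.tagName not in HEADINGS:
            continue
        anchor_id = "toc-%d" % (len(entries) + 1)  # assumed id scheme
        # Only direct text children are used as the title in this sketch.
        title = "".join(
            n.data for n in el.childNodes if n.nodeType == n.TEXT_NODE
        )
        anchor = doc.createElement("a")
        anchor.setAttribute("id", anchor_id)
        anchor.setAttribute("href", "#" + anchor_id)
        anchor.appendChild(doc.createTextNode("#"))
        el.appendChild(anchor)  # the anchor goes at the end of the heading
        entries.append((int(el.tagName[1]), title, anchor_id))
    return entries, doc.documentElement.toxml()


if __name__ == "__main__":
    sample = "<div><h1>Intro</h1><p>text</p><h2>Details</h2></div>"
    entries, body = add_following_markers(sample)
    print(entries)
    print(body)
```

Real-world HTML is often not well-formed XML, which is the gap a tolerant parser such as html5lib fills before any DOM manipulation like this can run.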