Losing HTML classes in HTML to HTML conversion #9318
-
Yes, there have been relevant changes since summer 2022 (see the changelog for details; there are too many to list or recall). We can look into preserving these attributes; why don't you submit an issue? But generally pandoc will be lossy for HTML -> HTML conversions. If you're going from HTML -> HTML, wouldn't it be better to skip the conversion step?
-
dubsarGimillu, a user of the Pandoc document converter, is experiencing data loss when converting HTML fragments containing tables to HTML. Specifically, classes assigned to table and td elements are being removed in the output.

Cause of the Problem: Changes in Pandoc since summer 2022 have improved its ability to process HTML tables. This results in tables being interpreted and potentially losing attributes like classes.

Solutions Discussed: Using the --from=html+raw_html flag. While this flag preserves raw HTML blocks, it didn't work in this case, as alternate row classes (odd and even) were still assigned.

Possible Next Steps: Open an issue on Pandoc. dubsarGimillu is encouraged to submit an issue on the Pandoc GitHub repository to request preserving user-defined classes on tables.

Alternative solution: Consider skipping the Pandoc conversion for HTML -> HTML if preserving the specific classes is crucial.

Additional Information: The changelog for relevant changes since summer 2022 might offer more details about the table parsing improvements.
-
My understanding is that the pandoc-discuss mailing list is frozen after a spam attack in December and that this is the preferred place for questions. If that is incorrect, please let me know.
I run a static site generated from HTML fragments and Pandoc templates. I could keep HTML classes on table and td elements when I generated a page in 2022, but when I rebuild the same page today those attributes of the input HTML get erased in the output HTML. Is there a guide to which parts of an HTML file are lost when passing through Pandoc and its abstract model of documents? Has the HTML table > Pandoc > HTML table conversion changed since summer 2022?
E.g., today input
becomes output:
In 2022 the "aClass" and "anotherClass" were preserved in the output.
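For anyone who wants to check which classes go missing across a whole site before filing an issue, here is a rough Python sketch that compares the class attributes in an input fragment with those in the converted output. It does not call Pandoc and uses only the standard library. The two sample strings are hypothetical stand-ins built around the class names mentioned in this thread, not the actual input and output from the post, and the helper names (`ClassCollector`, `collect_classes`, `lost_classes`) are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count every (tag, class name) pair seen in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.pairs = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                for cls in value.split():
                    self.pairs[(tag, cls)] += 1


def collect_classes(html):
    """Return a Counter of (tag, class name) pairs found in `html`."""
    parser = ClassCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


def lost_classes(before, after):
    """Return the (tag, class name) pairs in `before` that `after` lacks."""
    return collect_classes(before) - collect_classes(after)


# Hypothetical sample data: invented markup, not the fragments from the post.
before = '<table class="aClass"><tr><td class="anotherClass">x</td></tr></table>'
after = '<table><tbody><tr class="odd"><td>x</td></tr></tbody></table>'

for (tag, cls), count in sorted(lost_classes(before, after).items()):
    print(f"lost: <{tag} class={cls!r}> x{count}")
```

To use it on real pages, read the source fragment and the generated page from disk and pass both strings to `lost_classes`; classes that Pandoc adds on its own, such as the alternating row classes, are ignored because only losses are reported.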