Why not use pandas.DataFrame.to_markdown() instead of converting XLS/XLSX to HTML and then to Markdown? #328
Unanswered
kirisame-wang asked this question in Q&A
I've been working on converting XLS/XLSX files into Markdown format. Currently, the process involves using the `markitdown` library, which first converts the Excel files to HTML using `pandas.DataFrame.to_html()`, and then transforms the HTML into Markdown using BeautifulSoup.

Given that pandas offers a `DataFrame.to_markdown()` method, wouldn't it be more efficient to use this method directly for the conversion? It converts a DataFrame straight into a Markdown table, which would eliminate the need to first render the DataFrame as HTML and then parse that HTML back into Markdown.

Are there specific reasons or advantages for the current method that involves HTML conversion? Would using `DataFrame.to_markdown()` be a more streamlined solution?

I appreciate any insights or explanations regarding this approach.
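To make the comparison concrete, here is a minimal sketch of the two routes on a small in-memory DataFrame. The sample data and the helper names (`via_html`, `via_markdown`, `excel_to_markdown`) are made up for illustration and are not taken from `markitdown`. One thing worth noting is that `DataFrame.to_markdown()` relies on the optional `tabulate` package, so the sketch guards against it being absent.

```python
import pandas as pd

# Hypothetical sample data standing in for one worksheet of an Excel file.
df = pd.DataFrame({"item": ["apple", "pear"], "qty": [3, 5]})


def via_html(frame: pd.DataFrame) -> str:
    """First step of the route described above: DataFrame -> HTML table.

    The HTML would then still need an HTML-to-Markdown pass.
    """
    return frame.to_html(index=False)


def via_markdown(frame: pd.DataFrame) -> str:
    """Direct route: DataFrame -> Markdown table.

    pandas delegates to the optional `tabulate` package and raises
    ImportError when it is not installed.
    """
    return frame.to_markdown(index=False)


def excel_to_markdown(path: str) -> str:
    """Hypothetical helper: render every sheet of a workbook as Markdown.

    Not called here, because it needs a real file and an Excel engine
    such as openpyxl.
    """
    sheets = pd.read_excel(path, sheet_name=None)  # dict of sheet name -> DataFrame
    parts = [f"## {name}\n\n{via_markdown(sheet)}" for name, sheet in sheets.items()]
    return "\n\n".join(parts)


html_table = via_html(df)
try:
    md_table = via_markdown(df)
except ImportError:
    md_table = None  # `tabulate` is not installed

print(html_table)
print(md_table)
```

If `tabulate` is available, the second `print` shows a pipe-delimited Markdown table produced in a single call, while the first shows the intermediate HTML that a separate converter would still have to process.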