Describe the bug
partition_html drops HTML tables during parsing.
To Reproduce
from unstructured.partition.html import partition_html

res = partition_html(filename="/app/KnowledgeBase/test_user/kb0001/1226BL troubleshooting chart/1226BL troubleshooting chart.md")
# Print every element returned by the partitioner with its type and text
for i, element in enumerate(res, 1):
    print(f"Element {i}:")
    print(f"Type: {type(element).__name__}")
    print(f"Content: {element.text}")
    print("-" * 50)
Expected behavior
The output should contain the document's content; shouldn't the table content be kept rather than dropped?
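One thing worth ruling out is whether the input file actually contains HTML table markup. The file passed to partition_html here has a .md extension, and partition_html parses its input as HTML: if the tables are written as Markdown pipe tables rather than <table> elements, there may be no table markup for it to detect. The sketch below uses only the Python standard library to count <table> and <tr> tags and to look for a Markdown pipe-table delimiter row. The helpers count_html_tables and has_pipe_table are hypothetical, and the script assumes the file is UTF-8 text.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path


class TableCounter(HTMLParser):
    """Counts <table> start tags and <tr> rows seen in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook,
        # so <TABLE> and <table> are both counted.
        if tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1


def count_html_tables(markup: str) -> tuple[int, int]:
    """Hypothetical helper: return (number of <table> tags, number of <tr> rows)."""
    parser = TableCounter()
    parser.feed(markup)
    parser.close()
    return parser.tables, parser.rows


def has_pipe_table(markup: str) -> bool:
    """Rough heuristic: True if any line looks like a Markdown
    table delimiter row, e.g. |---|---| or | :--- | ---: |."""
    for line in markup.splitlines():
        stripped = line.strip()
        if "|" in stripped and "---" in stripped and set(stripped) <= set("|-: "):
            return True
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    text = Path(sys.argv[1]).read_text(encoding="utf-8")
    tables, rows = count_html_tables(text)
    print(f"<table> elements: {tables}, <tr> rows: {rows}")
    print(f"Markdown pipe-table delimiter found: {has_pipe_table(text)}")
```

If the file reports zero <table> elements but a pipe-table delimiter, the tables are Markdown rather than HTML; in that case it may be worth comparing the result of partition_md from unstructured.partition.md, which is intended for Markdown input.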
Screenshots
Environment Info
OS version: Linux-5.4.0-70-generic-x86_64-with-glibc2.35
Python version: 3.10.14
unstructured version: 0.15.14
unstructured-inference is not installed
pytesseract is not installed
Torch version: 2.3.1
Detectron2 version: 0.6
PaddleOCR version: 3.0.0b1
Additional context
The screenshots show part of the document's content.