When I converted a document and exported it to markdown, the output differed between docling==2.20 and docling==2.21: a footer line that appears in the 2.20 export is missing from the 2.21 export.
The export labels I used were DEFAULT_EXPORT_LABELS plus DocItemLabel.FORM and DocItemLabel.KEY_VALUE_REGION.
Steps to reproduce
Install docling==2.21 or docling==2.20.
Run the following:
# Imports were not shown in the original report; the module paths below
# are assumed for docling 2.20 / 2.21.
import os
from io import BytesIO

from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.datamodel.base_models import DocumentStream, InputFormat
from docling.datamodel.pipeline_options import (
    AcceleratorDevice,
    AcceleratorOptions,
    PdfPipelineOptions,
    TesseractOcrOptions,
)
from docling.document_converter import (
    DocumentConverter,
    PdfFormatOption,
    WordFormatOption,
)
from docling.pipeline.simple_pipeline import SimplePipeline

source = "./test-document.pdf"  # document per local path or URL
print(os.path.exists(source))

IMAGE_RESOLUTION_SCALE = 2.0

# previous `PipelineOptions` is now `PdfPipelineOptions`
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
pipeline_options.images_scale = IMAGE_RESOLUTION_SCALE
pipeline_options.generate_page_images = True
pipeline_options.generate_picture_images = True

ocr_options = TesseractOcrOptions(force_full_page_ocr=False, lang=["auto"])
pipeline_options.ocr_options = ocr_options

accelerator_options = AcceleratorOptions(
    num_threads=4, device=AcceleratorDevice.CPU
)
pipeline_options.accelerator_options = accelerator_options

# ...
# Custom options are now defined per format.
doc_converter = DocumentConverter(  # all of the below is optional, has internal defaults.
    allowed_formats=[
        InputFormat.PDF,
        InputFormat.IMAGE,
        InputFormat.DOCX,
        InputFormat.HTML,
        InputFormat.PPTX,
        InputFormat.XLSX,
    ],  # whitelist formats, non-matching files are ignored.
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options,  # pipeline options go here.
            backend=DoclingParseV2DocumentBackend,  # optional: pick an alternative backend
        ),
        InputFormat.DOCX: WordFormatOption(
            pipeline_cls=SimplePipeline  # default for office formats and HTML
        ),
    },
)

# read the file into memory and wrap it as a DocumentStream
with open(source, mode="rb") as f:
    stream = DocumentStream(name=source, stream=BytesIO(f.read()))

result = doc_converter.convert(stream, raises_on_error=False)

# export to md
print(result.document.export_to_markdown())
Export results
docling==2.20
<!-- image -->
## INTERNATIONAL MEDICAL CENTER
Address: 1234 Fake Street, District 9, City Hotline: (555) 123-4567 | Email: [email protected]
## Patient Medical Record Report
## Title: Well Child Visit (Procedure)
| Aspect | Before Treatment | After Treatment |
|----------------------------|-------------------------------|------------------------------|
| Immunizations Administered | Hib (PRP-OMP) | Hib (PRP-OMP) |
| | Rotavirus, monovalent | Rotavirus, monovalent |
| | IPV | IPV |
| | DTaP | DTaP |
| | Pneumococcal conjugate PCV 13 | Pneumococcal conjugal PCV 13 |
## Notes:
- · The patient received the following immunizations during the well-child visit: Hib (PRP-OMP), rotavirus, monovalent, IPV, DTaP, and Pneumococcal conjugate PCV 13.
- · No adverse reactions were noted following the administration of these immunizations.
Confidential Medical Record | All rights reserved Generated on: 2025-02-11
docling==2.21
<!-- image -->
## INTERNATIONAL MEDICAL CENTER
Address: 1234 Fake Street, District 9, City Hotline: (555) 123-4567 | Email: [email protected]
## Patient Medical Record Report
## Title: Well Child Visit (Procedure)
| Aspect | Before Treatment | After Treatment |
|----------------------------|-------------------------------|------------------------------|
| Immunizations Administered | Hib (PRP-OMP) | Hib (PRP-OMP) |
| | Rotavirus, monovalent | Rotavirus, monovalent |
| | IPV | IPV |
| | DTaP | DTaP |
| | Pneumococcal conjugate PCV 13 | Pneumococcal conjugal PCV 13 |
## Notes:
- · The patient received the following immunizations during the well-child visit: Hib (PRP-OMP), rotavirus, monovalent, IPV, DTaP, and Pneumococcal conjugate PCV 13.
- · No adverse reactions were noted following the administration of these immunizations.
Missing from the 2.21 output: Confidential Medical Record | All rights reserved Generated on: 2025-02-11
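To pin down exactly which lines differ between the two exports, a small standard-library comparison may help. This is only a sketch and not part of docling: `missing_lines` is a helper name introduced here for illustration, and `md_220` / `md_221` are hypothetical variables holding the markdown strings exported under each version (shortened here to the tail of each export shown above).

```python
import difflib


def missing_lines(old_md: str, new_md: str) -> list[str]:
    """Return non-blank lines that appear in old_md but not in new_md."""
    diff = difflib.ndiff(old_md.splitlines(), new_md.splitlines())
    # ndiff prefixes lines that exist only in the first input with "- ".
    return [line[2:] for line in diff if line.startswith("- ") and line[2:].strip()]


# Hypothetical stand-ins for the two exports (tail of each output only).
md_220 = (
    "## Notes:\n\n"
    "- · No adverse reactions were noted following the administration of these immunizations.\n\n"
    "Confidential Medical Record | All rights reserved Generated on: 2025-02-11"
)
md_221 = (
    "## Notes:\n\n"
    "- · No adverse reactions were noted following the administration of these immunizations."
)

for line in missing_lines(md_220, md_221):
    print("missing in 2.21:", line)
```

With the two snippets above, the only line reported is the "Confidential Medical Record" footer, which matches what I see when comparing the full exports by eye.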
Docling version
docling==2.21 or docling==2.20
Python version
python=3.12
Attachment
test-doc.pdf