Disabling char normalization does not work #928

tenebrius · 2025-02-10T07:43:50Z

When DocumentConverter converts a standard text-based PDF to Markdown (no OCR), the Chinese character "一" is replaced with "-". The behavior can be reproduced directly with the NLP model:

model = nlp_model(loglevel="debug", text_ordering=True)
model.apply_on_text("一些")
>> -些

I tried disabling normalization when constructing the model:

self.model = nlp_model(loglevel="debug", text_ordering=True, normalise_chars=False, normalise_text=False)

But this had no effect; the character is still replaced.
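For reference, a minimal sketch using only Python's standard library (it does not call docling or the NLP model, and the helper name `describe` is purely illustrative). It checks that "一" and "-", as they appear in this report, are distinct code points, and that none of the standard Unicode normalization forms rewrites "一". If that holds, the substitution presumably comes from a model-specific character mapping rather than from standard Unicode normalization.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


original = "一"  # character in the source text
replaced = "-"   # character seen in the output, as pasted in this report

print(describe(original))
print(describe(replaced))

# Check whether any standard normalization form alters the original character.
unchanged = {
    form: unicodedata.normalize(form, original) == original
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(unchanged)
```

Running the same `describe` helper on the characters of the converted Markdown would show exactly which code point the pipeline emits in place of "一".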

Docling version: 2.20.0

Python version: 3.11


tenebrius added the "bug" label (Something isn't working) on Feb 10, 2025

dolfim-ibm added the "pdf" label (PDF issue, except docling-parse) on Feb 10, 2025

dolfim-ibm assigned cau-git Feb 10, 2025
