This repository has been archived by the owner on Nov 9, 2024. It is now read-only.
OCR of PDFs in Tika can take a long time. This is unnecessary if the PDF has already been OCRed.
I would like to see an option to define the OCR strategy used by Tika in the lodestone front end.
Ideally, this would be multi-pass: a first pass with no_ocr and, if the size of the returned text is below a threshold (perhaps 500 bytes), a second pass with text_and_ocr to recognize the document.
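A minimal sketch of the two-pass idea, in Python, to make the proposal concrete. This is not Lodestone code: `extract_with_fallback` and `tika_extract` are hypothetical helpers, the Tika server URL is assumed to be a local `tika-server` on port 9998, and the strategy is assumed to be passed through the `X-Tika-PDFOcrStrategy` request header. The second pass defaults to `ocr_and_text_extraction`, which I believe is Tika's name for what is called text_and_ocr above; check it against the Tika version in use.

```python
import urllib.request
from typing import Callable, Tuple

MIN_TEXT_BYTES = 500  # threshold suggested in this issue; tune as needed


def extract_with_fallback(
    pdf_bytes: bytes,
    extract: Callable[[bytes, str], str],
    min_text_bytes: int = MIN_TEXT_BYTES,
    first_pass: str = "no_ocr",
    second_pass: str = "ocr_and_text_extraction",  # assumed Tika strategy name
) -> Tuple[str, str]:
    """Return (text, strategy_used), running OCR only when pass one is too short."""
    text = extract(pdf_bytes, first_pass)
    # Measure the stripped UTF-8 size so whitespace-only output does not count.
    if len(text.strip().encode("utf-8")) >= min_text_bytes:
        return text, first_pass
    # Too little text came back: assume a scanned PDF and re-process with OCR.
    return extract(pdf_bytes, second_pass), second_pass


def tika_extract(
    pdf_bytes: bytes,
    strategy: str,
    tika_url: str = "http://localhost:9998/tika",  # assumed local tika-server
) -> str:
    """Send the PDF to a Tika server and return the extracted plain text."""
    request = urllib.request.Request(
        tika_url,
        data=pdf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/pdf",
            "Accept": "text/plain",
            "X-Tika-PDFOcrStrategy": strategy,  # assumed header name
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would look like `text, used = extract_with_fallback(data, tika_extract)`. Passing the extractor in as a parameter keeps the threshold logic testable without a running Tika server, and the threshold and both strategy names could be exposed as the configurable option requested here.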