import os
from unstructured.partition.pdf import partition_pdf

pdf_file = "example.pdf"  # hypothetical path; point this at your own PDF

elements = partition_pdf(
    filename=pdf_file,
    # Unstructured Helpers
    strategy="hi_res",
    infer_table_structure=True,
    model_name="yolox",
    languages=["eng"],  # deleting this line still produces the same error
)
@AlexSkrn I can't open the Colab notebook you linked without access permissions. Judging from what you pasted, though, I'd suggest installing the latest release: run pip install "unstructured[all-docs]" without the ==0.12.5 suffix. The current version is 0.16.11.
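Before and after changing the pin, it can help to confirm which release the runtime actually has installed. Here is a minimal sketch using only the standard library's importlib.metadata. The helper name installed_version is hypothetical. It reads the installed package metadata, which is not necessarily the same code that a running kernel has already imported.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report what the current environment has installed (None means "not installed").
print("unstructured", installed_version("unstructured"))
```

If this still reports 0.12.5 after reinstalling without the pin, the upgrade did not take effect in that environment.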
I am trying to run unstructured in Google Colab, following the instructions at https://colab.research.google.com/drive/177-Tb6CJ0eFf9bZOEjbqb8xzR1IETpd-#scrollTo=huxQF-koB_8t
However, I get the error TypeError: OCRAgentTesseract() takes no arguments. The code I used is below.
!apt-get -qq install poppler-utils tesseract-ocr
%pip install -q --user --upgrade pillow
%pip install -q unstructured["all-docs"]==0.12.5
import os
from unstructured.partition.pdf import partition_pdf

pdf_file = "example.pdf"  # hypothetical path; point this at your own PDF

elements = partition_pdf(
    filename=pdf_file,
)
/usr/local/lib/python3.10/dist-packages/unstructured/partition/utils/ocr_models/ocr_interface.py in get_instance(ocr_agent_module, language)
47 module_name, class_name = ocr_agent_module.rsplit(".", 1)
48 if module_name in OCR_AGENT_MODULES_WHITELIST:
---> 49 module = importlib.import_module(module_name)
50 loaded_class = getattr(module, class_name)
51 return loaded_class()
TypeError: OCRAgentTesseract() takes no arguments
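The traceback lines follow a common dynamic-loading pattern: split a dotted path into module and class names, check the module against a whitelist, import it, look up the class, and instantiate it. The sketch below mirrors that pattern with the standard library only. The whitelist contents and the AgentWithoutInit class are hypothetical stand-ins, not unstructured's actual code. It also shows where the message itself comes from: Python raises TypeError: X() takes no arguments when a class that relies on object's default constructor (no __init__ or __new__ of its own) is called with arguments.

```python
import importlib

# Hypothetical whitelist for illustration only; the real
# OCR_AGENT_MODULES_WHITELIST in unstructured lists its own OCR agent modules.
OCR_AGENT_MODULES_WHITELIST = {"collections"}


def get_instance(ocr_agent_module: str):
    """Import the class named by a dotted path and instantiate it with no arguments."""
    module_name, class_name = ocr_agent_module.rsplit(".", 1)
    if module_name not in OCR_AGENT_MODULES_WHITELIST:
        raise ValueError(f"module {module_name!r} is not whitelisted")
    module = importlib.import_module(module_name)
    loaded_class = getattr(module, class_name)
    return loaded_class()


class AgentWithoutInit:
    """Hypothetical agent class that defines no __init__ of its own."""

    def get_text_from_image(self, image):
        return ""


print(get_instance("collections.OrderedDict"))  # prints: OrderedDict()

# Passing an argument to a class with no custom constructor reproduces the message.
try:
    AgentWithoutInit("eng")
except TypeError as exc:
    print(exc)  # prints: AgentWithoutInit() takes no arguments
```

In other words, the message indicates that whatever code instantiated OCRAgentTesseract passed it at least one argument, while the class being called accepts none.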