Error ingesting .docx files due to ocrHighResolution when using Content Understanding. #2242

Closed
@mrisahoo1

Description

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

scripts/prepdocs.ps1

Any log messages given by the failure

INFO Extracting text from 'C:\Users\USER\Desktop\mco-vision/data\Document\Sample.docx' using Azure Document Intelligence    pdfparser.py:66
Traceback (most recent call last):
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocs.py", line 439, in <module>
    loop.run_until_complete(main(ingestion_strategy, setup_index=not args.remove and not args.removeall))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 650, in run_until_complete
    return future.result()
    ^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocs.py", line 244, in main
    await strategy.run()
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocslib\filestrategy.py", line 101, in run
    sections = await parse_file(file, self.file_processors, self.category, self.image_embeddings)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocslib\filestrategy.py", line 29, in parse_file
    pages = [page async for page in processor.parser.parse(content=file.content)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocslib\filestrategy.py", line 29, in <listcomp>
    pages = [page async for page in processor.parser.parse(content=file.content)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision\app\backend\prepdocslib\pdfparser.py", line 80, in parse
    poller = await document_intelligence_client.begin_analyze_document(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision.venv\Lib\site-packages\azure\core\tracing\decorator_async.py", line 94, in wrapper_use_tracer
    return await func(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_patch.py", line 529, in begin_analyze_document
    raw_result = await self._analyze_document_initial(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\Desktop\mco-vision.venv\Lib\site-packages\azure\ai\documentintelligence\aio_operations_operations.py", line 158, in _analyze_document_initial
    raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: (InvalidArgument) Invalid argument.
Code: InvalidArgument
Message: Invalid argument.
Inner error: {
    "code": "InvalidParameter",
    "message": "The parameter ocrHighResolution for file type Docx is invalid: The feature is invalid or not supported."
}

Expected/desired behavior

OS and Version?

Windows 11

azd version?

azd version 1.11.0 (commit 5b92e0687e1fa96dfc8292f4b900c0c58610b6a5)

Versions

Mention any other details that might be useful

Please let me know how this should be resolved. In previous releases, .docx files were ingested without any problems, but with Content Understanding enabled, ingestion fails with the ocrHighResolution error shown above.
This is a critical issue for us; your help is appreciated.
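
One possible workaround, sketched under assumptions rather than taken from the repository: request the ocrHighResolution feature only for file types that accept it. The helper name `analyze_features` and the set `FORMATS_WITHOUT_HIGH_RES_OCR` are hypothetical; the only format the error above confirms as unsupported is Docx, so other formats would have to be added after checking the service documentation.

```python
from pathlib import PurePath

# Assumption: only .docx is listed because it is the one format the
# service error in this issue confirms as rejecting ocrHighResolution.
FORMATS_WITHOUT_HIGH_RES_OCR = {".docx"}


def analyze_features(filename: str, want_high_res: bool = True) -> list[str]:
    """Return the feature list to request for this file (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    if want_high_res and ext not in FORMATS_WITHOUT_HIGH_RES_OCR:
        return ["ocrHighResolution"]
    # Unsupported format, or high-resolution OCR not requested.
    return []
```

The returned list would then be passed as the `features` argument where pdfparser.py calls `begin_analyze_document`, so that .docx files are analyzed without the add-on while other files keep it.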


Thanks! We'll be in touch soon.

Labels

bug (Something isn't working)