The current handler returns plain text. Tika can also produce more structured output as XML using ToXMLContentHandler.
I propose introducing an optional parameter that enables XML output when more structured data is needed.
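For context, here is a minimal Java sketch of the difference using the Apache Tika API directly: BodyContentHandler yields plain text, while ToXMLContentHandler yields the XHTML that Tika's parsers emit. The class name, the extract method and its xmlOutput flag are hypothetical illustrations, not the tika-ocr library's API. It assumes tika-core and tika-parsers-standard-package are on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.ToXMLContentHandler;
import org.xml.sax.ContentHandler;

public class TikaXmlExample {

    // Hypothetical helper: xmlOutput mirrors the proposed optional parameter.
    // false -> plain text (BodyContentHandler), true -> XHTML (ToXMLContentHandler).
    public static String extract(InputStream in, boolean xmlOutput) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        ContentHandler handler = xmlOutput
                ? new ToXMLContentHandler()      // keeps structure: <html>, <body>, <p>, ...
                : new BodyContentHandler(-1);    // -1 disables the default write limit
        parser.parse(in, handler, new Metadata(), new ParseContext());
        return handler.toString();               // both handlers expose their output via toString()
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(extract(new ByteArrayInputStream(doc), false)); // plain text
        System.out.println(extract(new ByteArrayInputStream(doc), true));  // XHTML markup
    }
}
```

Keeping plain text as the default means existing callers are unaffected; only callers that opt in receive markup they then need to parse.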
Relevant identifiers: enableXMLOutput, TikaExtractor.extract.
Feel free to give this a shot: https://github.com/TJC-LP/tika-ocr/tree/TJC-LP/enable-xml-output
I'm going to test it in our Databricks workspace in the next few days, but locally it seems to work as expected.
Thank you very much. I'm going to test the proposed changes now.