Replies: 1 comment
Hello @mophilly! Happy holidays! What you have is basically several variants of similar types of documents, correct? You can use ExtractThinker Classification, which is perfect for this. Example of classification:
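As a rough, library-agnostic sketch of the idea (this is not ExtractThinker's actual API; `DocumentType`, `build_classification_prompt`, `parse_label`, and the two layout names are all hypothetical), each document type gets a name and a description, and both are injected into the classification prompt:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentType:
    """Hypothetical container for one document layout or type."""
    name: str
    description: str


def build_classification_prompt(types, document_text):
    """Inject every type's name and description into one prompt string."""
    options = "\n".join(f"- {t.name}: {t.description}" for t in types)
    return (
        "Classify the document into exactly one of these types.\n"
        f"{options}\n\n"
        "Answer with the type name only.\n\n"
        f"Document:\n{document_text}"
    )


def parse_label(response, types):
    """Map the model's free-text answer back to a known type, or None."""
    answer = response.strip().lower()
    for t in types:
        if t.name.lower() == answer:
            return t
    return None


# Assumed example layouts; real descriptions should say what sets each layout apart.
TYPES = [
    DocumentType("invoice_layout_a", "Invoice with the totals in a table at the bottom"),
    DocumentType("invoice_layout_b", "Invoice with the totals in a header box"),
]

prompt = build_classification_prompt(TYPES, "ACME Corp. Total due: 120.00")
# `prompt` would be sent to whichever LLM client you use; the reply is then
# mapped back to a known type, which decides the extraction schema to apply.
chosen = parse_label(" Invoice_Layout_B \n", TYPES)
```

Returning `None` for an unrecognized answer keeps an unexpected layout from being silently forced into an existing type, which matters when new variants keep appearing.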
The type is injected into the prompt, so the model picks the best classification and you can extract right away. Imagine a type of document like a form, with a predictable look and not only fields: you can inject an example image:
I advise you to look around, because going by clients and use cases, it fits 99% of them.
You can see the group of tests here for better classification: https://github.com/enoch3712/ExtractThinker/blob/main/tests/test_classify.py
As for articles about classification with LLMs, this is the root of the work: https://python.useinstructor.com/examples/classification/
Then you have my article, which is an evolution of this for Document Intelligence: https://medium.com/gitconnected/advanced-document-classification-with-llms-8801eaee3c58
Don't forget you have the docs about this:
Hope it helps :)
-
I have a basic document flow working... on just one example document. :-)
The first set of documents contains the same data elements as the initial example but in different layouts. I have identified five distinct layouts, and I suspect the variants will number over a dozen in short order.
I would like to learn more about classification and a strategy to achieve a reliable result. A web search on the topic returns dozens of links. I am reading a few, but I would like to spend my time in a well-focused manner.
I would love some links to articles and similar resources that are aligned with this project.