directing order data extraction #125

Answered by enoch3712
mophilly asked this question in Q&A
Hello @mophilly!

First, given the complexity of the work, which model are you using? The more complex the use case, the more capable the model should be. Also, very important: please use vision for this use case. Vision passes the image of the PDF page directly to the model, which removes 95% of the flakiness.

Also, which DocumentLoader are you using? I advise you to use something like PyPDF if it is a pure PDF.
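To make the two suggestions above concrete, here is a minimal sketch that combines the PyPDF loader with vision. The contract fields, the function name `extract_order`, and the model name are assumptions for illustration only; adjust them to your documents and check the calls against the ExtractThinker version you have installed.

```python
def extract_order(pdf_path: str):
    """Sketch: extract order fields from a PDF with vision enabled."""
    # Imported inside the function so this snippet loads even where
    # extract_thinker is not installed; calling it requires the package
    # and credentials for the chosen model.
    from extract_thinker import Contract, DocumentLoaderPyPdf, Extractor

    # Hypothetical contract: these field names are invented for the example.
    class OrderContract(Contract):
        order_number: str
        customer: str

    extractor = Extractor()
    # PyPDF loader, suited to pure (text-based) PDFs.
    extractor.load_document_loader(DocumentLoaderPyPdf())
    # Assumed model name; pick a vision-capable model for complex layouts.
    extractor.load_llm("gpt-4o")
    # vision=True sends the page image to the model alongside the text.
    return extractor.extract(pdf_path, OrderContract, vision=True)
```

Calling `extract_order("order.pdf")` would return an instance of the contract, assuming the file exists and the model is reachable.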

> The output will be a spreadsheet so I can verify with input PDF and a json file for importing into a dbms.

A spreadsheet? ExtractThinker only returns Pydantic models, which you can later convert to JSON.
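If you still want a spreadsheet for checking against the PDF, one option is to convert the result yourself after extraction. The sketch below uses only the standard library and assumes the extracted model has already been turned into a plain dict (for a Pydantic v2 model, `model_dump()` does that). The `order` data and the helper names are hypothetical.

```python
import csv
import io
import json

# Hypothetical extraction result, shaped like the dict a Pydantic model
# would produce via model_dump(); all field names here are invented.
order = {
    "order_number": "SO-1001",
    "customer": "Acme Ltd",
    "lines": [
        {"sku": "A-100", "description": "Widget", "quantity": 2, "unit_price": 9.5},
        {"sku": "B-200", "description": "Gasket", "quantity": 10, "unit_price": 1.25},
    ],
}


def order_to_json(order: dict) -> str:
    """Serialize the order for import into a DBMS."""
    return json.dumps(order, indent=2)


def order_to_csv(order: dict) -> str:
    """Flatten to one CSV row per order line, repeating the header fields."""
    header_fields = {k: v for k, v in order.items() if k != "lines"}
    buffer = io.StringIO()
    writer = None
    for line in order["lines"]:
        row = {**header_fields, **line}
        if writer is None:
            # Column order follows the first row's keys.
            writer = csv.DictWriter(buffer, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    print(order_to_csv(order))
    print(order_to_json(order))
```

The CSV opens directly in any spreadsheet program, and the JSON string can go to the database import unchanged.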

The report with the financial lines will be no problem. Take a look at the extractor tests.

Answer selected by mophilly