Replies: 1 comment
-
Fixed by this pull request: |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
I use this table-transformer code to extract the tables and table structures of invoices. Without adding the --words_dir argument, the result is very satisfactory. From my understanding, the words_dir is needed to add the contents of the found structures to the result, so I tried adding it. After adding one, however, the result is strange. The detected table gets shrunk to a small corner of the image and the table-structures all overlap each other. At first, this seemed like a scaling problem, but after fixing this, the problem persists.
Aside from the visual result, the 'tables_structure' output is also strange when a --words_dir is added. Without --words_dir the amount of rows and columns seems to be constant. When adding the --words_dir, however, the amount of rows and columns varies. Sometimes there are more, sometimes less. The tokens are formatted as described in the docs/INFERENCE.MD document.
I cannot show any actual data or images, as the data is sensitive, but this is what I found during debugging:
Without --words_dir, i.e. tokens=[]:
With a --words_dir, i.e. tokens=[...data...]:
I feel like the problem lies in a misunderstanding I have about the functions of the --words_dir data. I have read the papers, but I feel like I am missing something about that aspect.
Could someone give some further explanation about the use and function of --words_dir? Are the results I am seeing expected? Why, or why not? And if not, how do I go about fixing them?
Beta Was this translation helpful? Give feedback.
All reactions