V2 is a huge improvement #121
Comments
Hi! In my examples, however, it deletes essentially all footnotes (which is better than mixing them into the text, of course) and does not yet capture table notes/legends correctly. I have attached an example paper and the Markdown result. Please continue to develop it, great work! Earnings_Prediction_Using_Recurrent_Neural_Networks.md
@relsas
Hi,
v2 is a huge improvement, well done!
When I updated, on the first run I got the error
No module named surya
I tried pip install surya, with no luck, but the surya repo says to use pip install surya-ocr, and that worked. The same thing happened with pdftext. Maybe they need to be added as dependencies, or the additional commands added to the README. Here is an example paper I was trying to convert from PDF to Markdown:
33.3+Smith.pdf
Here is what the previous version generated:
33.3+Smith.md
And here is what I got with v2:
33.3+Smith.md
I used the command: marker_single /Downloads/33.3+smith.md /Downloads --batch_multiplier 2 --langs English
It really is a huge improvement. The section heading font seems to cause an issue in both versions. I am still hitting an issue with footnotes, but the output is a lot better and takes a lot less cleanup. There is also something strange where certain words have a space in them; in v1 they had a strange symbol instead. Take for example the word scientific (to find it with Ctrl-F you have to search for scienti). Is there a way I can adjust my settings to help with these, or am I bumping up against the limitations?
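For the broken-word problem, a small post-processing step on the Markdown might help in the meantime. This is only a sketch, and it assumes the strange symbol from v1 is a Unicode ligature character such as U+FB01 ("fi"), which I have not confirmed. The function name `expand_ligatures` is made up for illustration and is not part of marker.

```python
# Map the Latin ligature code points (U+FB00 to U+FB06) to plain letters.
# Assumption: the "strange symbol" in the output is one of these characters.
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",
    "\ufb06": "st",
})


def expand_ligatures(text: str) -> str:
    """Return text with ligature characters replaced by their plain letters."""
    return text.translate(LIGATURES)


if __name__ == "__main__":
    # The ligature form of the word becomes searchable again.
    print(expand_ligatures("scienti\ufb01c"))
```

This would not repair the v2 case, where the word is split by a space, because there is no reliable way to tell a stray space from a real word boundary without a dictionary.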
Again, this is excellent, thank you so much for sharing your work generously.
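For anyone else who hits the missing-module error after updating, a quick check like the one below could show which packages still need installing. It is a rough sketch: the import-name to pip-name pairs are my reading of the error described above (the OCR module imports as `surya` but installs as `surya-ocr`), not an official dependency list, and `missing_packages` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import name -> pip package name. These pairs are assumptions based on this
# report, not an official dependency list.
REQUIRED = {"surya": "surya-ocr", "pdftext": "pdftext"}


def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        # Print the install command rather than running it.
        print("pip install " + " ".join(missing))
    else:
        print("all modules found")
```

The script only prints the suggested command, so nothing is installed without you seeing it first.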