Issue with Partial Detection of Pages by Nougat OCR #244

Open
SaimonDahal-02 opened this issue Sep 17, 2024 · 0 comments

Some pages are not being fully detected by the Nougat OCR model. In many cases, only half of the content on a page is detected, while the rest is skipped. However, for other pages, the detection works perfectly fine.

Steps to Reproduce:

  • Convert the PDF into images (one image per page).
  • Process each image using the Nougat OCR model individually.
  • Observe that some pages are partially detected, while others are processed correctly.

(This is the notebook I'm following for inference.)
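For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: it assumes the Hugging Face `transformers` port of Nougat, the `facebook/nougat-base` checkpoint, and `pdf2image` (which needs poppler installed) for rendering pages. The helper names, the DPI value, and the generation length limit are illustrative assumptions.

```python
def pdf_to_images(pdf_path, dpi=200):
    """Render each PDF page to a PIL image. The dpi value is an assumption."""
    # Imported lazily so the rest of the sketch loads without pdf2image/poppler.
    from pdf2image import convert_from_path
    return convert_from_path(pdf_path, dpi=dpi)


def load_nougat(checkpoint="facebook/nougat-base"):
    """Load processor and model. The checkpoint name is an assumption."""
    from transformers import NougatProcessor, VisionEncoderDecoderModel
    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model


def make_ocr_fn(processor, model):
    """Return a function that maps one page image to Nougat markup."""
    def ocr_page(image):
        pixel_values = processor(image.convert("RGB"), return_tensors="pt").pixel_values
        outputs = model.generate(
            pixel_values,
            min_length=1,
            # Allow generation up to the decoder's position limit, so a small
            # default length cap is not what cuts the page short.
            max_length=model.decoder.config.max_position_embeddings,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
        )
        text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
        # post_process_generation may need the nltk and python-Levenshtein packages.
        return processor.post_process_generation(text, fix_markdown=False)
    return ocr_page


def run_per_page(images, ocr_page):
    """Apply ocr_page to every image; return (page_number, text) pairs, 1-based."""
    return [(number, ocr_page(image)) for number, image in enumerate(images, start=1)]
```

Usage would look like `processor, model = load_nougat()` followed by `run_per_page(pdf_to_images("answers.pdf"), make_ocr_fn(processor, model))`, where `answers.pdf` is a placeholder file name. Keeping the per-page function separate makes it easy to re-run a single failing page with different settings.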

Example Results:

  • First Example:
    For this page:

Answers Snippets to Papers_page-0008

 ```
 ## Answers (LC2020 HL, P2):
 1. \(0\); \(A\), \(B\) and \(C\) are collinear [0, 4, 7, 11, 15]
 2. \(33\cdot 435^{\circ}\)[0, 4, 7, 11, 15]
 3. \(9\)[0, 4, 7, 11, 15]
 4. \(x^{2}+y^{2}+4x-21=0\), \(x^{2}+y^{2}-8x-9=0\)[0, 4, 7, 11, 15]
 5. \(6\cdot 44\) m [0, 4, 7, 11, 15]
 6. \(k=9\)[0, 4, 7, 11, 15]
 7. \(\frac{5\pi}{3}\), \
 ```
  • Second Example:
    For this page:
    Answers Snippets to Papers_page-0010
    ## Answers (LC 2019 HL, P2):
    1. (i) \(\frac{48}{95}\) [**0, 4, 7, 10**], (ii) \(\frac{88}{969}\) [**0, 4, 5, 8, 10**]
    2. 1400 [**0, 4, 7, 10**]
    3. Show [**0, 4, 7, 10**]
    4. (i) \(mx-y-6m=0\) [**0, 2, 5**], (ii) \(P\bigg{(}\frac{18m+25}{3m+4}\), \(\frac{m}{3m+4}\bigg{)}\) [**0, 4, 7, 11, 15**]
    

Expected Behavior: The OCR model should consistently detect all parts of each page, rather than only detecting part of the content.
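To find affected pages without reading every output by hand, a rough check like the one below could flag results that stop mid-expression, as in the first example. It is only a heuristic under my own assumptions (inline math is delimited by `\(` and `\)`, and a cut-off page tends to end with a dangling backslash or an unclosed delimiter). It would not catch a page that stops cleanly after a complete item, as the second example does. The function name `looks_truncated` is hypothetical.

```python
def looks_truncated(markup: str) -> bool:
    """Heuristically flag Nougat output that appears to be cut off mid-expression."""
    text = markup.rstrip()
    if not text:
        # An empty result means nothing on the page was recognised.
        return True
    # Inline math should open and close the same number of times.
    if text.count(r"\(") != text.count(r"\)"):
        return True
    # A trailing backslash suggests generation stopped in the middle of a command.
    return text.endswith("\\")
```

Pages flagged this way could then be re-run individually or inspected alongside the input image.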

Question: Is there any preprocessing that needs to be done to ensure complete page detection? Or are there specific parameters that should be adjusted in Nougat OCR to improve the results?
