Issue with Partial Detection of Pages by Nougat OCR #244

SaimonDahal-02 · 2024-09-17T07:00:02Z

Some pages are not fully detected by the Nougat OCR model. In many cases only about half of the content on a page is transcribed and the rest is skipped, while other pages are processed perfectly fine.

Steps to Reproduce:

1. Convert the PDF into images (one image per page).
2. Process each image individually with the Nougat OCR model.
3. Observe that some pages are only partially detected, while others are processed correctly.

(This is the notebook I'm following for inference.)
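For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is a minimal sketch and not the notebook's exact code: it assumes the Hugging Face `transformers` port of Nougat with the `facebook/nougat-base` checkpoint, and `pdf2image` (which needs poppler) for rendering pages. The function names, the `dpi=200` setting and the `max_new_tokens=3584` budget are illustrative choices, not values taken from the notebook. Note that a small `max_new_tokens` caps the output length by itself, so it is worth checking what value is in use.

```python
def pdf_to_images(pdf_path, dpi=200):
    """Step 1: render each PDF page to a PIL image.

    Assumes the third-party pdf2image package (and poppler) is installed.
    """
    from pdf2image import convert_from_path
    return convert_from_path(str(pdf_path), dpi=dpi)


def ocr_pages(images, checkpoint="facebook/nougat-base", max_new_tokens=3584):
    """Step 2: run Nougat on each page image separately.

    Assumes torch and transformers are installed; the post-processing
    step additionally needs nltk and python-Levenshtein.
    """
    import torch
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device).eval()

    outputs = []
    for image in images:
        pixel_values = processor(
            images=image.convert("RGB"), return_tensors="pt"
        ).pixel_values
        with torch.no_grad():
            generated = model.generate(
                pixel_values.to(device),
                min_length=1,
                max_new_tokens=max_new_tokens,  # assumed budget, see note above
                bad_words_ids=[[processor.tokenizer.unk_token_id]],
            )
        text = processor.batch_decode(generated, skip_special_tokens=True)[0]
        outputs.append(processor.post_process_generation(text, fix_markdown=False))
    return outputs


def page_report(outputs):
    """Step 3: list (page number, character count) to spot suspiciously short pages."""
    return [(number, len(text)) for number, text in enumerate(outputs, start=1)]
```

Calling `page_report(ocr_pages(pdf_to_images("answers.pdf")))` (the file name is a placeholder) gives a quick per-page overview, so pages with unusually short output can be compared against the originals.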

Example Results:

First Example:
For this page:

 ```
 ## Answers (LC2020 HL, P2):
 1. \(0\); \(A\), \(B\) and \(C\) are collinear [0, 4, 7, 11, 15]
 2. \(33\cdot 435^{\circ}\)[0, 4, 7, 11, 15]
 3. \(9\)[0, 4, 7, 11, 15]
 4. \(x^{2}+y^{2}+4x-21=0\), \(x^{2}+y^{2}-8x-9=0\)[0, 4, 7, 11, 15]
 5. \(6\cdot 44\) m [0, 4, 7, 11, 15]
 6. \(k=9\)[0, 4, 7, 11, 15]
 7. \(\frac{5\pi}{3}\), \
 ```
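The output above stops in the middle of item 7 with a dangling backslash. A rough way to flag such pages automatically is sketched below. It is only a heuristic, and `looks_truncated` is a hypothetical helper name, not part of Nougat. It assumes inline math is written as `\( ... \)` (as in the output above) and display math as `\[ ... \]`.

```python
def looks_truncated(markdown: str) -> bool:
    """Heuristically flag OCR output that stops mid-expression."""
    text = markdown.rstrip()
    # A trailing backslash means a LaTeX command or delimiter was cut off.
    if text.endswith("\\"):
        return True
    # Every opened inline-math delimiter should have been closed.
    if text.count("\\(") != text.count("\\)"):
        return True
    # Same check for display math (assumed \[ ... \] delimiters).
    if text.count("\\[") != text.count("\\]"):
        return True
    return False


cut_off = "7. \\(\\frac{5\\pi}{3}\\), \\"   # last line of the first example
complete = "3. \\(9\\)[0, 4, 7, 11, 15]"    # a complete line from the same page
print(looks_truncated(cut_off), looks_truncated(complete))  # prints: True False
```

A page that is cut off exactly at the end of a complete line would not be caught by this check, so it complements rather than replaces a visual comparison with the page image.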

Second Example:
For this page:

## Answers (LC 2019 HL, P2):
1. (i) \(\frac{48}{95}\) [**0, 4, 7, 10**], (ii) \(\frac{88}{969}\) [**0, 4, 5, 8, 10**]
2. 1400 [**0, 4, 7, 10**]
3. Show [**0, 4, 7, 10**]
4. (i) \(mx-y-6m=0\) [**0, 2, 5**], (ii) \(P\bigg{(}\frac{18m+25}{3m+4}\), \(\frac{m}{3m+4}\bigg{)}\) [**0, 4, 7, 11, 15**]

Expected Behavior: The OCR model should consistently detect all parts of each page, rather than only detecting part of the content.

Question: Is there any preprocessing that needs to be done to ensure complete page detection? Or are there specific parameters that should be adjusted in Nougat OCR to improve the results?
