Deployed PaddleOCR api suddenly results in incorrect OCR text results #13884

Closed
3 tasks done
DipanshuJuneja opened this issue Sep 18, 2024 · 5 comments

Comments

@DipanshuJuneja

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

I deployed a PaddleOCR API with FastAPI on Google Cloud Run. API calls returned correct results until yesterday, but today the service suddenly returns gibberish output. I have not deployed a new version, and I'm not sure what changed. When I run the app locally, it always works correctly.

🏃‍♂️ Environment (Runtime Environment)

paddleocr==2.7.3
paddlepaddle==2.6.1
fastapi==0.111.0

🌰 Minimal Reproducible Example (Minimal Demo That Reproduces the Problem)

Below is my initialization code. Note that the models are already downloaded into my Docker image.

import os

from paddleocr import PaddleOCR

# Load the detection, recognition, and angle-classification models from
# local directories inside the Docker image.
ocr = PaddleOCR(use_angle_cls=True, lang='en', enable_mkldnn=True, recovery=True,
                det_model_dir=os.path.abspath('fast_api_server/ocr_models/det'),
                rec_model_dir=os.path.abspath('fast_api_server/ocr_models/rec'),
                cls_model_dir=os.path.abspath('fast_api_server/ocr_models/cls'))
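
Since the models are loaded from directories inside the image, one thing that can be ruled out is a difference between the model files in the deployed container and the ones used locally. The following is a minimal sketch, not part of the original report: the helper `fingerprint_dir` is a hypothetical name, and the directory layout is assumed from the initialization code above. Running it in both environments and comparing the printed digests would show whether the files match.

```python
import hashlib
from pathlib import Path


def fingerprint_dir(model_dir):
    """Map each file under model_dir (relative path) to its SHA-256 hex digest."""
    root = Path(model_dir)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


if __name__ == "__main__":
    # Directory names are assumed from the initialization code in the report.
    base = Path("fast_api_server/ocr_models")
    for name in ("det", "rec", "cls"):
        model_dir = base / name
        if not model_dir.is_dir():
            print(f"{name}: directory not found at {model_dir}")
            continue
        for rel, digest in fingerprint_dir(model_dir).items():
            print(f"{name}/{rel} {digest}")
```

Identical digests in both environments would point away from the model files and toward the runtime (dependencies, CPU, or service configuration).
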
@jingsongliujing
Collaborator

Can you provide screenshots of any error messages or output logs?

@DipanshuJuneja
Author

DipanshuJuneja commented Sep 19, 2024

Hi @jingsongliujing, there are no error messages in my output logs, and the API returns a 200 status code. The only problem is that, as of today, the OCR text output no longer makes sense. I'm attaching the file and the text returned for your reference. Note that the same file gave correct results until yesterday.
Linkedin_1.pdf
Output_OCR

@jingsongliujing
Collaborator

I need to see the difference between the normal and abnormal output to identify the issue. Based on the information provided so far, we are unable to pinpoint the exact cause of the anomaly.
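
One way to produce such a comparison is to diff a known-good OCR result (for example, from the local run) against the text returned by the deployed service. This is a minimal sketch using only the Python standard library; the function names and the sample strings are hypothetical and not taken from the thread.

```python
import difflib


def diff_ocr_text(expected, actual):
    """Return unified-diff lines between a known-good OCR text and the current output."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="known_good",
            tofile="current",
            lineterm="",
        )
    )


def similarity(expected, actual):
    """Return a ratio in [0, 1]; 1.0 means the two texts are identical."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


if __name__ == "__main__":
    # Placeholder strings; replace them with the local and deployed OCR outputs.
    known_good = "Senior Engineer\nExample Corp"
    current = "Sen1or Eng1neer\nExample Corp"
    print("\n".join(diff_ocr_text(known_good, current)))
    print(f"similarity: {similarity(known_good, current):.2f}")
```

Attaching the diff and the similarity score for the same input file gives maintainers a concrete view of how the output degraded.
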

@DipanshuJuneja
Author

DipanshuJuneja commented Sep 19, 2024

I understand. Can you please let me know what I can share to debug this, since there is nothing in the logs? If it helps, I tried changing the FastAPI version, and it gave correct OCR text results for a while, but now I'm seeing the issue again. The versions I currently use are:

fastapi==0.111.0
fastapi-cli==0.0.4
uvicorn==0.30.1

I thought it had to do with FastAPI version pinning, since I noticed that I wasn't pinning versions in my prod environment but was doing so in my local environment, which has been running without any issue. So now I'm using the same FastAPI versions in both. Could it also have something to do with the Google Cloud Run configuration? Thanks.
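
To check whether the two environments really resolve to the same dependency versions, the installed versions can be printed at service startup and compared with the local output. The sketch below is an illustration, not something from the thread: `installed_versions` is a hypothetical helper, and the package list is taken from the versions quoted in this issue.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names are the ones mentioned in this issue.
    names = ["paddleocr", "paddlepaddle", "fastapi", "fastapi-cli", "uvicorn"]
    print("python", sys.version.split()[0])
    for name, version in installed_versions(names).items():
        print(name, version or "not installed")
```

Logging this once when the container starts makes it possible to confirm from the Cloud Run logs that the pinned versions are the ones actually in use.
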

@jingsongliujing
Collaborator

If you are using the CPU version, it is recommended to convert the models to ONNX and use ONNX Runtime for inference:

https://paddlepaddle.github.io/PaddleOCR/en/ppocr/infer_deploy/paddle2onnx.html#paddle-model-download
https://github.com/jingsongliujing/OnnxOCR

If you are using GPU for inference, it is suggested to upgrade PaddlePaddle to version 3.0:

python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/

@GreatV GreatV closed this as completed Sep 25, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2024