[Need Help] Issues Encountered with gpt4o When Using the Method of Marking Images #200

Open
tears743 opened this issue Jul 3, 2024 · 0 comments

tears743 commented Jul 3, 2024

Hi, thank you for your open-source efforts. This repository is fantastic!

I am currently marking screenshots using OCR and Segment Anything plus a few simple algorithms. Here is a marked image (label_sam_ocr_20240703-110228); the marking quality looks quite good.
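For context, here is a minimal sketch of the kind of "simple algorithm" meant here. It is not my exact code: the function names, the `(x1, y1, x2, y2)` box format, and the 0.7 IoU threshold are all assumptions for illustration. It merges the OCR and Segment Anything boxes, drops near-duplicates, and assigns numeric marks in reading order.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(
    ocr_boxes: List[Box],
    sam_boxes: List[Box],
    iou_threshold: float = 0.7,  # hypothetical value, tune per UI
) -> Dict[int, Box]:
    """Merge both box sources, skip near-duplicates, number the rest."""
    kept: List[Box] = []
    # OCR boxes come first, so they win when a SAM box overlaps heavily.
    for box in list(ocr_boxes) + list(sam_boxes):
        if all(iou(box, k) < iou_threshold for k in kept):
            kept.append(box)
    # Number marks top-to-bottom, then left-to-right.
    kept.sort(key=lambda b: (b[1], b[0]))
    return {i + 1: box for i, box in enumerate(kept)}
```

The returned mapping from mark number to box is what lets a mark chosen by the model be turned back into click coordinates.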

I run the whole pipeline with a langchain agent and gpt4o, and I am hitting some issues that you may have seen before. I would appreciate any known ways to handle them, or pointers to likely causes.

  1. gpt4o is very inaccurate at locating the correct mark (I am not sure whether this is caused by the prompt or by how I invoke gpt4o). With the same image and the same prompt, it usually picks the wrong mark and is only occasionally correct. I have tried many versions of the prompt, but none of them improves this noticeably.

  2. Building on the issue above: in multi-turn dialogue, the prompt explicitly says to "check whether the last operation was successful based on the screenshot," and the image I pass is always the latest screenshot. Even so, gpt4o does not seem to actually check the image and often hallucinates that the operation succeeded.
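One workaround I am considering for both issues is to add checks outside the model, sketched below. Everything here is an assumption for illustration: the helper names are hypothetical, the screenshots are assumed to be equal-sized raw grayscale byte buffers, and the 1% threshold is a guess. The first helper rejects a reply whose mark number is not on the image; the second only detects whether the screen changed at all, which is weaker than confirming the operation did what was intended.

```python
import re
from typing import Iterable, Optional


def parse_label(reply: str, valid_labels: Iterable[int]) -> Optional[int]:
    """Return the first integer in the model reply if it is a known mark."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    label = int(match.group())
    return label if label in set(valid_labels) else None


def changed_fraction(before: bytes, after: bytes) -> float:
    """Fraction of pixels that differ between two same-sized buffers."""
    if len(before) != len(after):
        raise ValueError("screenshots must have the same size")
    if not before:
        return 0.0
    differing = sum(1 for x, y in zip(before, after) if x != y)
    return differing / len(before)


def operation_had_effect(
    before: bytes,
    after: bytes,
    min_fraction: float = 0.01,  # hypothetical threshold
) -> bool:
    """True if enough of the screen changed after the last operation."""
    return changed_fraction(before, after) >= min_fraction
```

If `operation_had_effect` returns False, the agent could be told directly that nothing changed on screen, instead of relying on gpt4o to notice it from the screenshot.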

These two issues are almost driving me crazy, and I do not know what to try next. Have you run into them, and do you have any ideas?
