Impact of Including GPT-4V in LVLM Pool? #1

Open
Etelis opened this issue Dec 19, 2023 · 1 comment
Labels: question (Further information is requested)

Comments

@Etelis

Etelis commented Dec 19, 2023

First and foremost, thank you for writing this paper; it was very intriguing and informative. I have a question that arose during my reading.

What are the conceptual benefits of including the supervisor model (GPT-4V) in the LVLM pool? Wouldn't this approach inherently bias the outcomes towards GPT-4V's decisions? If so, how does the ensemble benefit in this scenario?

@TobiasLee
Collaborator

Thank you for engaging with our paper. We appreciate your thoughtful question and the opportunity to clarify the inclusion of the GPT-4V model in our study.

GPT-4V is included in our LVLM pool because it is a representative, readily accessible commercial LVLM. As highlighted in the preliminary study on GPT-4V (refer to link), it is one of the most powerful LVLMs currently available, and this strong performance is precisely what qualifies it to serve as the annotator in our ensemble.

Regarding potential bias towards GPT-4V's outcomes, particularly in the annotated ratings, we acknowledge that GPT-4V annotations can be unreliable or biased. To address this, we conducted an agreement analysis (refer to Paragraph 3 in Sec 2.4) comparing human annotators with GPT-4V. This analysis showed an average agreement rate of 83.1%, indicating substantial alignment between human and GPT-4V annotations.
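As a concrete illustration of what such an agreement rate measures, here is a minimal sketch assuming a simple pairwise-comparison format; the field names and data layout below are hypothetical and not the paper's actual schema:

```python
# Hypothetical sketch: pairwise agreement between human and GPT-4V preference labels.
# The list-of-dicts format and field names are illustrative assumptions.

def agreement_rate(annotations):
    """Fraction of comparison pairs where the human and GPT-4V pick the same winner."""
    matches = sum(1 for a in annotations if a["human_choice"] == a["gpt4v_choice"])
    return matches / len(annotations)

annotations = [
    {"human_choice": "response_a", "gpt4v_choice": "response_a"},
    {"human_choice": "response_b", "gpt4v_choice": "response_a"},
    {"human_choice": "response_b", "gpt4v_choice": "response_b"},
]
print(f"agreement: {agreement_rate(annotations):.1%}")  # 66.7% on this toy sample
```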

Moreover, in the DPO experiments, we evaluated a "GPT-4V always as the best" strategy, in which the GPT-4V response is always chosen as the preferred response in each DPO pair. Notably, this simple heuristic significantly outperformed the original backbone model. Even so, simply biasing decisions towards GPT-4V is not a one-size-fits-all recipe for performance improvement, which underscores the nuanced dynamics of model ensembling.
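To make the pair construction concrete, below is a minimal sketch of this heuristic; the function, field names, and model names are hypothetical and not taken from our codebase:

```python
# Hypothetical sketch of the "GPT-4V always as the best" heuristic for building DPO pairs:
# the GPT-4V response is always treated as the chosen response, and every other model's
# response for the same prompt becomes a rejected response.

def build_dpo_pairs(prompt, responses_by_model):
    """responses_by_model maps a model name to its response for the given prompt."""
    chosen = responses_by_model["gpt-4v"]
    pairs = []
    for model, response in responses_by_model.items():
        if model == "gpt-4v":
            continue  # never pair GPT-4V against itself
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": response})
    return pairs

example = build_dpo_pairs(
    "Describe the image.",
    {"gpt-4v": "A cat sits on a red sofa.", "llava": "A dog on a couch.", "qwen-vl": "A cat."},
)
print(example[0]["chosen"])  # always the GPT-4V response under this heuristic
```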

We hope this provides clarity on the conceptual benefits of incorporating GPT-4V into our LVLM pool and how potential biases are addressed and validated in our study. If you have any further questions or require additional information, please feel free to ask.

TobiasLee added the question label on Jun 5, 2024