
Port VLM functionalities from main branch to the refactor branch #769

Merged — 7 commits merged on Jul 30, 2024

Conversation

Contributor
@nv-hwoo nv-hwoo commented Jul 30, 2024

This PR ports VLM support from #756 and #759 into this dev branch.

Contributor
@dyastremsky dyastremsky left a comment

Great work, Hyunjae! This is critical work for the refactor.

I've started the review and am leaving a few initial comments. I'm heading to a meeting and will finish the review later this afternoon.

-def from_file(file_path: Path) -> List[Dict[str, str]]:
+def from_file(file_path: Path, output_format: OutputFormat) -> List[Dict[str, str]]:
     with open(file_path, "r") as file:
         data = [load_json_str(line) for line in file]
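For context, the complete ported method might look like the following sketch; the body beyond the quoted diff context (including the return) is an assumption, and `OutputFormat` and `load_json_str` are stand-ins for the repository's own definitions:

```python
import json
from enum import Enum, auto
from pathlib import Path
from typing import Dict, List


class OutputFormat(Enum):
    # Stand-in for the repository's OutputFormat enum.
    OPENAI_CHAT_COMPLETIONS = auto()
    OPENAI_VISION = auto()


def load_json_str(line: str) -> Dict[str, str]:
    # Stand-in for the repository's JSON-line parser.
    return json.loads(line)


def from_file(file_path: Path, output_format: OutputFormat) -> List[Dict[str, str]]:
    # Read one JSON object per line (JSONL) from the input file.
    # The output_format argument is what the review below objects to:
    # it threads formatting concerns into the retriever.
    with open(file_path, "r") as file:
        data = [load_json_str(line) for line in file]
    return data
```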

I think the Dataset Retriever should have no concept of output format. Otherwise, we're moving back towards where we currently are with everything being interdependent. We might want to discuss the best way to proceed here. Perhaps we can add a separate from_image_file method that can get an image from a file and then use it in the LLM Inputs file to add that image to each input. Or follow another approach.
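A minimal sketch of the suggested decoupling — the `from_image_file` name, signature, and data-URI encoding are assumptions for illustration, not the repository's actual API:

```python
import base64
from pathlib import Path


def from_image_file(file_path: Path) -> str:
    """Load an image from disk and return it as a base64 data URI.

    Keeps the Dataset Retriever free of any output-format logic:
    the caller (e.g. the LLM Inputs layer) decides how to attach
    the image to each input.
    """
    # Infer a media subtype from the file extension; default to png.
    suffix = file_path.suffix.lstrip(".").lower() or "png"
    encoded = base64.b64encode(file_path.read_bytes()).decode("utf-8")
    return f"data:image/{suffix};base64,{encoded}"
```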

Contributor Author

That's a valid point, and I agree. But I would rather focus on porting the VLM functionality in this PR, since there are already too many changes happening here. I can file a ticket for decoupling output format from this method. How does that sound?

Contributor

I need to think on it. I know that this port was already difficult. At the same time, deferring some parts is going to make other ports harder and make it more likely this gets missed. Introducing clutter to this class seems like a mistake unless we're going to immediately submit another PR to fix this. So then that becomes one more ticket/PR we need to do.

Contributor
@dyastremsky dyastremsky left a comment

Thanks for doing this work! I left some comments, but since this PR is large, we can file follow-on tickets to refactor these areas immediately afterward. It'd be better to get the port done first; then we can refactor out these repeated patterns and find a design that works.


@@ -101,14 +117,20 @@ def create_llm_inputs(
     prompt_tokens_mean,
     prompt_tokens_stddev,
     num_of_output_prompts,
+    image_width_mean,
Contributor

We can leave this for now... but same thing here. Maybe we need an image class. Adding all of these args replicates the issues we're trying to get rid of with this refactor. So this will need another refactor after this port.
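One possible shape for such an image class — a hypothetical dataclass grouping the image arguments into a single parameter; only `image_width_mean` appears in the quoted hunk, so the remaining fields are assumed by symmetry with the `prompt_tokens_*` pattern:

```python
from dataclasses import dataclass


@dataclass
class ImageConfig:
    """Hypothetical bundle of image-generation parameters.

    Passing one ImageConfig to create_llm_inputs would replace the
    growing list of image_* arguments that the refactor is trying
    to eliminate.
    """
    width_mean: int = 0
    width_stddev: int = 0   # assumed field
    height_mean: int = 0    # assumed field
    height_stddev: int = 0  # assumed field
```

The call site would then shrink to something like `create_llm_inputs(..., image_config=ImageConfig(width_mean=512))`, keeping the signature stable as image options grow.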

@nv-hwoo nv-hwoo merged commit 93b2f5a into feat-inputs-refactor Jul 30, 2024
5 checks passed
@nv-hwoo nv-hwoo deleted the hwoo-port-vlm branch July 30, 2024 23:26