
Adding Example Annotation with Showcase Notebook #550

Merged
merged 3 commits into instructlab:main on Feb 23, 2025

Conversation

eshwarprasadS
Contributor

@eshwarprasadS eshwarprasadS commented Feb 11, 2025

Fixes #527 #525
This PR:

  • Adds an example notebook showing how to leverage SDG to enable annotation use cases.

  • The example explicitly sets vLLM's guided_decoding_backend (xgrammar) to make sure that the annotation options (guided_choices) are respected during generation.

  • Adds an example notebook showcasing an end-to-end custom use case: using SDG to annotate a classification dataset with the composable components available in the SDG library (namely Pipeline).
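To illustrate the guided-decoding idea in the bullets above, here is a minimal sketch of building a constrained annotation request. This is an assumption-laden illustration, not code from the PR: the `guided_choice` field follows vLLM's OpenAI-compatible server extension, and the model name and label set are placeholders.

```python
# Sketch: an OpenAI-style chat request body for a vLLM server that
# constrains the generated annotation to a fixed label set.
# Assumptions: a vLLM server configured with the xgrammar
# guided-decoding backend; model name and labels are hypothetical.

LABELS = ["positive", "negative", "neutral"]  # hypothetical label set

def build_annotation_request(text: str) -> dict:
    """Build a request body whose completion is restricted to LABELS."""
    return {
        "model": "my-annotation-model",  # placeholder model name
        "messages": [
            {"role": "user", "content": f"Classify the sentiment of: {text}"}
        ],
        # vLLM-specific extension: decoding may only emit one of these.
        "guided_choice": LABELS,
    }

request = build_annotation_request("The release notes were very helpful.")
print(request["guided_choice"])
```

Because the output space is restricted server-side, the annotation labels in the dataset stay clean without any post-hoc parsing of free-form model text.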

@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 11, 2025
@bbrowning
Contributor

This looks like a useful example to have in the repository. I rarely use Jupyter notebooks and am not really set up with a way to validate or run this to ensure it works as expected, but perhaps that's something @aakankshaduggal or @khaledsulayman can help with to give this an approval?

Also, we should think of how we keep this updated to ensure it doesn't get stale. The other examples all run as part of our unit test suite in test_examples.py. That may not be possible with a notebook in the same way, but something to think about so we know how we'll keep it working and up-to-date if we can't automatically test it.

@eshwarprasadS
Contributor Author

> Also, we should think of how we keep this updated to ensure it doesn't get stale. The other examples all run as part of our unit test suite in test_examples.py. That may not be possible with a notebook in the same way, but something to think about so we know how we'll keep it working and up-to-date if we can't automatically test it.

@bbrowning Thanks for the comment. You are right that this would be a new type of artifact needing its own testing suite. There are a few different ways to test notebooks in AI/ML libraries such as ours, but something I came across that might fit our pattern is nbmake. It plugs into pytest, can be part of our CI, and can be activated like so:

- name: Test Jupyter Notebooks
  run: pytest --nbmake path/to/notebooks/

I think it would make sense to open a new issue for this, if we have a strong desire to keep our example Jupyter notebooks tested and updated regularly.
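For completeness, a hedged sketch of what such a CI job might look like. The workflow name, action versions, Python version, and notebook path below are assumptions for illustration, not taken from this repository:

```yaml
# Hypothetical GitHub Actions job running notebooks via nbmake.
# Paths and versions are illustrative placeholders.
name: notebook-tests
on: [pull_request]
jobs:
  test-notebooks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install pytest nbmake jupyter
      - name: Test Jupyter Notebooks
        run: pytest --nbmake path/to/notebooks/
```

One trade-off to weigh: notebooks that call a live model server would need that server mocked or skipped in CI, so the nbmake job may only cover the notebook's non-inference cells.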

@bbrowning
Contributor

@eshwarprasadS Yes, it's fine to make a new issue to track figuring out how or if we can keep the notebooks updated.

@eshwarprasadS
Contributor Author

> @eshwarprasadS Yes, it's fine to make a new issue to track figuring out how or if we can keep the notebooks updated.

created this for book-keeping #562

Member

@abhi1092 abhi1092 left a comment


We can add the prompt creation/prompt string to the YAML file at the start of the notebook, but it's not necessary; just a suggestion. Other than that, everything looks good.

Member


A quick suggestion to make the notebook more complete: we can add the creation of the annotation YAMLs/prompts at the start. Currently we assume the user has already done that before they start using the notebook.

Contributor Author


Hey Abhishek, thanks for the comment. The intent of the notebook was also to demonstrate, step by step, how to arrive at a good prompt through prompt engineering (the way to achieve this in SDG is to change the YAMLs). That was the primary reason I did not take a 'once at the top' approach.

Having said this, do you think there's any merit to the approach I have taken here?

@mergify mergify bot added the one-approval label Feb 19, 2025
Member

@shivchander shivchander left a comment


A few suggestions, but other than that, LGTM.

…s, change backend to xgrammar

Signed-off-by: eshwarprasadS <[email protected]>
@mergify mergify bot merged commit c91c439 into instructlab:main Feb 23, 2025
5 checks passed
@mergify mergify bot removed the one-approval label Feb 23, 2025
Successfully merging this pull request may close these issues.

Ensure SDG can take advantage of vLLM's guided decoding
4 participants