
[ENH] Generative benchmarking chroma cloud support #4601


Open · kylediaz wants to merge 2 commits into main from 05-22-generative_benchmarking_chroma_cloud_support

Conversation


@kylediaz commented May 22, 2025

Description of changes

Partially addressed issues in https://github.com/chroma-core/hosted-chroma/issues/2806

  • Rewrote the content so that it makes sense without having read our publication
  • The notebook will use Chroma Cloud if a Chroma API key is provided (a sketch of the switching logic follows this list)
  • Changed the structure of locally imported data -- it is no longer necessary to provide your own IDs
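For context, a minimal sketch of what the cloud/local switch might look like, assuming chromadb exposes CloudClient and PersistentClient and that the key, tenant, and database come from environment variables (the variable names here are assumptions, not necessarily the notebook's actual ones):

```python
import os

import chromadb

def get_chroma_client():
    """Return a Chroma Cloud client if an API key is set, else a local client."""
    api_key = os.getenv("CHROMA_API_KEY")  # assumed variable name
    if api_key:
        # Chroma Cloud: tenant and database are also assumed to come from env vars.
        return chromadb.CloudClient(
            api_key=api_key,
            tenant=os.environ["CHROMA_TENANT"],
            database=os.environ["CHROMA_DATABASE"],
        )
    # No API key provided: fall back to a local on-disk client.
    return chromadb.PersistentClient(path="./chroma")
```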

Test plan

  • Run the notebook using a fresh kernel
  • Test locally and with Chroma Cloud

Room for improvement

  • Right now, generating the golden dataset and using it to compare embedding models are two separate workflows. I want to combine them into one experience.
  • chroma install is currently not able to upload data to a Chroma collection. This needs to be addressed in a separate PR.


This stack of pull requests is managed by Graphite. Learn more about stacking.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have these cases been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have these cases been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?


Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

@kylediaz changed the title from "generative benchmarking chroma cloud support" to "[ENH] Generative benchmarking chroma cloud support" (May 22, 2025)
@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch 2 times, most recently from dcaae07 to 87f092d (May 22, 2025 18:20)
@kylediaz requested review from itaismith and kellyhongsn (May 22, 2025 19:38)
@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch from 87f092d to 2f99ab1 (May 22, 2025 20:26)
@kylediaz marked this pull request as ready for review (May 22, 2025 20:26)

@propel-code-bot commented May 22, 2025

Add Chroma Cloud Support to Generative Benchmarking and Improve Usability

This PR introduces significant changes to the generative benchmarking sample application by adding seamless Chroma Cloud support to the Jupyter notebook workflow. Users can now benchmark either local or Chroma Cloud collections by providing the appropriate environment variables, with automated detection of cloud/local backends. The notebook was fully refactored for clarity, splitting out local data upload/setup logic into a new load_data.py script and making the data import structure more flexible (user-provided IDs no longer required). Substantial usability improvements were made, including better prompts, reorganization of setup and dataset configuration steps, clearer output, and code cleanup. Utility functions were added/expanded for environment detection and name normalization.

Key Changes:
• Major rework of generate_benchmark.ipynb: restructured for cloud/local workflow, clarified setup and workflow, improved user guidance.
• Added Chroma Cloud API key integration; automatic switching based on provided environment variables.
• Refactored data loading: adopted a new flexible structure and removed requirement for user-provided document IDs.
• Introduced load_data.py script for uploading local user data to Chroma collections with automatic embedding and ID assignment (sketched below, after this list).
• Enhanced utility script (functions/utils.py) with env detection and name normalization utilities.
• Numerous minor corrections: improved output consistency, notebook kernel spec updates, minor bugfixes, and doc updates.
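A rough sketch of the kind of upload flow described above (illustrative only; the function name, batch size, and client setup are assumptions, not the actual contents of load_data.py):

```python
import uuid

import chromadb

def upload_documents(collection_name: str, documents: list[str], batch_size: int = 100) -> None:
    """Upload raw documents to a Chroma collection, auto-generating IDs."""
    client = chromadb.PersistentClient(path="./chroma")
    collection = client.get_or_create_collection(name=collection_name)
    # Users no longer need to supply their own IDs; generate them here.
    ids = [str(uuid.uuid4()) for _ in documents]
    # Add in batches; the collection's embedding function embeds the documents.
    for start in range(0, len(documents), batch_size):
        collection.add(
            ids=ids[start : start + batch_size],
            documents=documents[start : start + batch_size],
        )
```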

Affected Areas:
• sample_apps/generative_benchmarking/generate_benchmark.ipynb
• sample_apps/generative_benchmarking/load_data.py
• sample_apps/generative_benchmarking/functions/utils.py

Potential Impact:

Functionality: Significantly improves onboarding and experience for users benchmarking on Chroma (cloud or local), allowing easier switching between backends and supporting new user workflows.

Performance: Minimal direct performance impact outside of user data upload and embedding, which are now more automated. Large or batched uploads may pose resource considerations.

Security: Introduces handling of environment-based API keys for cloud; these should be kept confidential by users. No additional custom security checks added.

Scalability: Makes scaling to Chroma Cloud easier and can accommodate cloud collections as the dataset grows; concerns may remain for very large local corpus uploads.

Review Focus:
• Correctness of cloud/local backend switching logic in the notebook and utility functions.
• Robustness of environment variable usage (failure cases, missing variables, error messages).
• Usability: whether instructions, prompts, and variable flow make sense for new users.
• Edge cases around data upload, such as duplicate or empty entries, and collection naming normalization.

Testing Needed

• Run the notebook with both local Chroma and Chroma Cloud (with correct environment variables) to verify automatic backend switching.
• Test load_data.py with both small and large local datasets to confirm upload/embedding/collection creation works end-to-end.
• Check new user onboarding documentation and notebook markdown for clarity and completeness.
• Verify that ID auto-generation and collection creation logic work for edge cases (e.g., blank lines or duplicate content); see the filtering sketch below.
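To illustrate the last point, one way the blank-line and duplicate edge cases could be handled before upload (a sketch, not necessarily what the PR does):

```python
def clean_documents(raw_lines: list[str]) -> list[str]:
    """Drop blank lines and exact-duplicate content before uploading."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for line in raw_lines:
        text = line.strip()
        if text and text not in seen:  # skip empty and previously seen content
            seen.add(text)
            cleaned.append(text)
    return cleaned
```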

Code Quality Assessment

sample_apps/generative_benchmarking/generate_benchmark.ipynb: Significantly improved; clearer structure, stepwise logic, reduced manual intervention, modular cell grouping. Some blocks may still be quite long.

sample_apps/generative_benchmarking/load_data.py: Simple, clear logic. Follows best practices for dotenv and API key use, and auto-generates IDs. Minor: no custom error handling if the data file is malformed.

sample_apps/generative_benchmarking/functions/utils.py: Utility functions are concise and have improved type annotations and edge case handling; single-responsibility structure. Some opportunity for even deeper docstring coverage.

Best Practices

Notebook Structure:
• Modular workflow
• User guidance
• Stepwise setup

Python Scripts:
• Separation of concerns (setup vs. upload logic)
• Use of .env for secrets
• Batched embedding

Data Handling:
• Flexible ID generation
• Explicit collection naming normalization (see the sketch after this section)

Environment:
• Env detection utilities
• Assertions on key variables
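As an illustration of the naming normalization above, a hedged sketch based on Chroma's documented collection-name constraints (the actual utility in functions/utils.py may differ):

```python
import re

def normalize_collection_name(name: str) -> str:
    """Coerce an arbitrary string into a valid Chroma collection name."""
    # Assumed constraints: 3-63 characters, alphanumeric start/end,
    # only letters, digits, '.', '_', and '-' in between.
    name = re.sub(r"[^a-z0-9._-]", "-", name.strip().lower())
    name = name.strip("._-")       # enforce alphanumeric start/end
    if len(name) < 3:
        name = name.ljust(3, "0")  # pad very short names
    return name[:63]
```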

Potential Issues

• Relies on correct user environment variable setup; unclear errors are possible if misconfigured (a fail-fast sketch follows this list).
• Local/cloud mode detection may break if the Chroma CLI or environment changes in the future.
• load_data.py assumes a particular text file structure; it could fail on corrupt or unexpectedly formatted files.
• Minimal error reporting for API/network issues when uploading or embedding data.
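On the first point, a small fail-fast helper would make misconfiguration obvious (a sketch; the variable name in the usage example is an assumption):

```python
import os

def require_env(name: str) -> str:
    """Fail fast with a clear message instead of an opaque downstream error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing required environment variable {name!r}; "
            "set it in your .env file before running the notebook."
        )
    return value

# Usage (assumed variable name): api_key = require_env("CHROMA_API_KEY")
```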

This summary was automatically generated by @propel-code-bot

@kellyhongsn

@kylediaz Itai and I just reviewed the notebook and made some comments:

  • we can keep the intro of the notebook simple since we talk about generative benchmarking & link it in the readme (previous research repo for reference: https://github.com/brandonstarxel/chunking_evaluation)
  • creating two notebooks might be more intuitive here (one for chroma docs where the main focus is on generative benchmarking, another one for loading in your own data) - right now it might not be super intuitive to the user

let us know if you have any questions!

@kylediaz


Thank you for the feedback. I'll get back to you soon :)

@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch from 2f99ab1 to ced9ed2 (May 23, 2025 18:05)


def env_var_provided(val: str | None) -> bool:

[BestPractice]

Consider using a more descriptive type hint for the val parameter. The Union type syntax str | None is Python 3.10+ syntax. For better compatibility, you could use Optional[str] from typing instead, which would make the code compatible with older Python versions if needed.
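Concretely, the suggested signature would look like this (the function body is an assumed implementation, shown for illustration only):

```python
from typing import Optional

def env_var_provided(val: Optional[str]) -> bool:
    # Assumed semantics: unset or empty/whitespace-only values count as "not provided".
    return val is not None and val.strip() != ""
```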
