
[ENH] Generative benchmarking chroma cloud support #4601


Open · kylediaz wants to merge 2 commits into main from 05-22-generative_benchmarking_chroma_cloud_support

Conversation


@kylediaz commented May 22, 2025

Description of changes

Partially addressed issues in https://github.com/chroma-core/hosted-chroma/issues/2806

  • Rewrote the content so that it makes sense without having read our publication
  • The notebook will use Chroma Cloud if a Chroma API key is provided (a sketch of the switching logic follows this list)
  • Changed the structure of locally imported data -- it is no longer necessary to provide your own IDs
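For context, a minimal sketch of what the cloud/local switch might look like, assuming chromadb exposes CloudClient and PersistentClient and that the key, tenant, and database come from environment variables (the variable names here are assumptions, not necessarily the notebook's actual ones):

```python
import os

import chromadb

def get_chroma_client():
    """Return a Chroma Cloud client if an API key is set, else a local client."""
    api_key = os.getenv("CHROMA_API_KEY")  # assumed variable name
    if api_key:
        # Chroma Cloud: tenant and database are also assumed to come from env vars.
        return chromadb.CloudClient(
            api_key=api_key,
            tenant=os.environ["CHROMA_TENANT"],
            database=os.environ["CHROMA_DATABASE"],
        )
    # No API key provided: fall back to a local on-disk client.
    return chromadb.PersistentClient(path="./chroma")
```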

Test plan

  • Run the notebook using a fresh kernel
  • Test locally and with Chroma Cloud

Room for improvement

  • Right now, generating the golden dataset and using it to compare embedding models are two separate workflows. I want to combine them into one experience.
  • chroma install is currently not able to upload data to a Chroma collection. This needs to be addressed in a separate PR.


This stack of pull requests is managed by Graphite. Learn more about stacking.


Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have these cases been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have these cases been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of unexpectedly high quality (readability, modularity, intuitiveness)?


Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas

@kylediaz changed the title from "generative benchmarking chroma cloud support" to "[ENH] Generative benchmarking chroma cloud support" (May 22, 2025)
@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch 2 times, most recently from dcaae07 to 87f092d (May 22, 2025 18:20)
@kylediaz requested review from itaismith and kellyhongsn (May 22, 2025 19:38)
@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch from 87f092d to 2f99ab1 (May 22, 2025 20:26)
@kylediaz marked this pull request as ready for review (May 22, 2025 20:26)

@propel-code-bot commented May 22, 2025

Add Chroma Cloud Support to Generative Benchmarking and Improve Usability

This PR introduces significant changes to the generative benchmarking sample application by adding seamless Chroma Cloud support to the Jupyter notebook workflow. Users can now benchmark either local or Chroma Cloud collections by providing the appropriate environment variables, with automated detection of cloud/local backends. The notebook was fully refactored for clarity, splitting out local data upload/setup logic into a new load_data.py script and making the data import structure more flexible (user-provided IDs no longer required). Substantial usability improvements were made, including better prompts, reorganization of setup and dataset configuration steps, clearer output, and code cleanup. Utility functions were added/expanded for environment detection and name normalization.

Key Changes:
• Major rework of generate_benchmark.ipynb: restructured for cloud/local workflow, clarified setup and workflow, improved user guidance.
• Added Chroma Cloud API key integration; automatic switching based on provided environment variables.
• Refactored data loading: adopted a new flexible structure and removed requirement for user-provided document IDs.
• Introduced load_data.py script for uploading local user data to Chroma collections with automatic embedding and ID assignment (sketched below, after this list).
• Enhanced utility script (functions/utils.py) with env detection and name normalization utilities.
• Numerous minor corrections: improved output consistency, notebook kernel spec updates, minor bugfixes, and doc updates.
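A rough sketch of the kind of upload flow described above (illustrative only; the function name, batch size, and client setup are assumptions, not the actual contents of load_data.py):

```python
import uuid

import chromadb

def upload_documents(collection_name: str, documents: list[str], batch_size: int = 100) -> None:
    """Upload raw documents to a Chroma collection, auto-generating IDs."""
    client = chromadb.PersistentClient(path="./chroma")
    collection = client.get_or_create_collection(name=collection_name)
    # Users no longer need to supply their own IDs; generate them here.
    ids = [str(uuid.uuid4()) for _ in documents]
    # Add in batches; the collection's embedding function embeds the documents.
    for start in range(0, len(documents), batch_size):
        collection.add(
            ids=ids[start : start + batch_size],
            documents=documents[start : start + batch_size],
        )
```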

Affected Areas:
• sample_apps/generative_benchmarking/generate_benchmark.ipynb
• sample_apps/generative_benchmarking/load_data.py
• sample_apps/generative_benchmarking/functions/utils.py

Potential Impact:

Functionality: Significantly improves onboarding and experience for users benchmarking on Chroma (cloud or local), allowing easier switching between backends and supporting new user workflows.

Performance: Minimal direct performance impact outside of user data upload and embedding, which are now more automated. Large or batched uploads may pose resource considerations.

Security: Introduces handling of environment-based API keys for cloud; these should be kept confidential by users. No additional custom security checks added.

Scalability: Makes scaling to Chroma Cloud easier and can accommodate cloud collections as the dataset grows; concerns may remain for very large local corpus uploads.

Review Focus:
• Correctness of cloud/local backend switching logic in the notebook and utility functions.
• Robustness of environment variable usage (failure cases, missing variables, error messages).
• Usability: whether instructions, prompts, and variable flow make sense for new users.
• Edge cases around data upload, such as duplicate or empty entries, and collection naming normalization.

Testing Needed

• Run the notebook with both local Chroma and Chroma Cloud (with correct environment variables) to verify automatic backend switching.
• Test load_data.py with both small and large local datasets to confirm upload/embedding/collection creation works end-to-end.
• Check new user onboarding documentation and notebook markdown for clarity and completeness.
• Verify that ID auto-generation and collection creation logic work for edge cases (e.g., blank lines or duplicate content); see the filtering sketch below.
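To illustrate the last point, one way the blank-line and duplicate edge cases could be handled before upload (a sketch, not necessarily what the PR does):

```python
def clean_documents(raw_lines: list[str]) -> list[str]:
    """Drop blank lines and exact-duplicate content before uploading."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for line in raw_lines:
        text = line.strip()
        if text and text not in seen:  # skip empty and previously seen content
            seen.add(text)
            cleaned.append(text)
    return cleaned
```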

Code Quality Assessment

sample_apps/generative_benchmarking/generate_benchmark.ipynb: Significantly improved; clearer structure, stepwise logic, reduced manual intervention, modular cell grouping. Some blocks may still be quite long.

sample_apps/generative_benchmarking/load_data.py: Simple, clear logic. Follows best practices for dotenv and API key use, and auto-generates IDs. Minor: no custom error handling if the data file is malformed.

sample_apps/generative_benchmarking/functions/utils.py: Utility functions are concise and have improved type annotations and edge case handling; single-responsibility structure. Some opportunity for even deeper docstring coverage.

Best Practices

Notebook Structure:
• Modular workflow
• User guidance
• Stepwise setup

Python Scripts:
• Separation of concerns (setup vs. upload logic)
• Use of .env for secrets
• Batched embedding

Data Handling:
• Flexible ID generation
• Explicit collection naming normalization (see the sketch after this section)

Environment:
• Env detection utilities
• Assertions on key variables
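As an illustration of the naming normalization above, a hedged sketch based on Chroma's documented collection-name constraints (the actual utility in functions/utils.py may differ):

```python
import re

def normalize_collection_name(name: str) -> str:
    """Coerce an arbitrary string into a valid Chroma collection name."""
    # Assumed constraints: 3-63 characters, alphanumeric start/end,
    # only letters, digits, '.', '_', and '-' in between.
    name = re.sub(r"[^a-z0-9._-]", "-", name.strip().lower())
    name = name.strip("._-")       # enforce alphanumeric start/end
    if len(name) < 3:
        name = name.ljust(3, "0")  # pad very short names
    return name[:63]
```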

Potential Issues

• Relies on correct user environment variable setup; unclear errors are possible if misconfigured (a fail-fast sketch follows this list).
• Local/cloud mode detection may break if the Chroma CLI or environment changes in the future.
• load_data.py assumes a particular text file structure; it could fail on corrupt or unexpectedly formatted files.
• Minimal error reporting for API/network issues when uploading or embedding data.
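On the first point, a small fail-fast helper would make misconfiguration obvious (a sketch; the variable name in the usage example is an assumption):

```python
import os

def require_env(name: str) -> str:
    """Fail fast with a clear message instead of an opaque downstream error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing required environment variable {name!r}; "
            "set it in your .env file before running the notebook."
        )
    return value

# Usage (assumed variable name): api_key = require_env("CHROMA_API_KEY")
```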

This summary was automatically generated by @propel-code-bot

@kellyhongsn

@kylediaz Itai and I just reviewed the notebook and made some comments:

  • we can keep the intro of the notebook simple since we talk about generative benchmarking & link it in the readme (previous research repo for reference: https://github.com/brandonstarxel/chunking_evaluation)
  • creating two notebooks might be more intuitive here (one for chroma docs where the main focus is on generative benchmarking, another one for loading in your own data) - right now it might not be super intuitive to the user

let us know if you have any questions!

@kylediaz


Thank you for the feedback. I'll get back to you soon :)

@kylediaz force-pushed the 05-22-generative_benchmarking_chroma_cloud_support branch from 2f99ab1 to ced9ed2 (May 23, 2025 18:05)


def env_var_provided(val: str | None) -> bool:

[BestPractice]

Consider using a more descriptive type hint for the val parameter. The Union type syntax str | None is Python 3.10+ syntax. For better compatibility, you could use Optional[str] from typing instead, which would make the code compatible with older Python versions if needed.
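Concretely, the suggested signature would look like this (the function body is an assumed implementation, shown for illustration only):

```python
from typing import Optional

def env_var_provided(val: Optional[str]) -> bool:
    # Assumed semantics: unset or empty/whitespace-only values count as "not provided".
    return val is not None and val.strip() != ""
```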
