Add repository abstraction #331

Merged 6 commits on Sep 26, 2023

Conversation

danieldk
Contributor

Description

Before this change, models were loaded through a set of standalone functions for Hugging Face Hub and fsspec. These functions overlapped heavily, and adding another storage backend would have required duplicating them again and scattering the copies throughout the code base.

This change removes the standalone functions and introduces the Repository API. Implementations of this base class only need to define a few basic operations. More complex operations are implemented in terms of these basic operations and are therefore generic across repository types.
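To illustrate the idea of a base class with a few primitive operations, here is a minimal sketch. It is not the code from this PR: `SketchRepository`, `InMemoryRepository`, `read_bytes`, `read_text`, and `read_json` are hypothetical names chosen for the example, and the actual `Repository` API may look different.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class SketchRepository(ABC):
    """Hypothetical base class: subclasses supply one primitive, `read_bytes`."""

    @abstractmethod
    def read_bytes(self, path: str) -> Optional[bytes]:
        """Return the contents of `path`, or None if it does not exist."""

    # Derived operations: written once and shared by every backend.
    def read_text(self, path: str, encoding: str = "utf-8") -> Optional[str]:
        data = self.read_bytes(path)
        return None if data is None else data.decode(encoding)

    def read_json(self, path: str) -> Optional[Dict[str, Any]]:
        text = self.read_text(path)
        return None if text is None else json.loads(text)


class InMemoryRepository(SketchRepository):
    """Toy backend that serves files from a dict, for illustration only."""

    def __init__(self, files: Dict[str, bytes]):
        self._files = files

    def read_bytes(self, path: str) -> Optional[bytes]:
        return self._files.get(path)


repo = InMemoryRepository({"config.json": b'{"hidden_size": 768}'})
print(repo.read_json("config.json"))   # {'hidden_size': 768}
print(repo.read_json("missing.json"))  # None
```

In this sketch a new backend only has to implement `read_bytes`; the text and JSON helpers come for free, which is the duplication the description says the standalone functions suffered from.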

Initially there are two repository types, HfHubRepository and FsspecRepository. There are also two wrappers for Repository instances that implement model operations (ModelRepository) and tokenizer operations (TokenizerRepository).
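The wrapper pattern described above can be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: `FileSource`, `DictSource`, `DirectorySource`, `ModelView`, and the `config.json` file name are all hypothetical stand-ins, and the toy backends do not talk to Hugging Face Hub or fsspec.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Protocol


class FileSource(Protocol):
    """Hypothetical minimal interface shared by all backends."""

    def read_text(self, path: str) -> Optional[str]: ...


class DictSource:
    """Toy backend backed by a dict."""

    def __init__(self, files: Dict[str, str]):
        self._files = files

    def read_text(self, path: str) -> Optional[str]:
        return self._files.get(path)


class DirectorySource:
    """Toy backend backed by a local directory."""

    def __init__(self, root: Path):
        self._root = root

    def read_text(self, path: str) -> Optional[str]:
        file = self._root / path
        return file.read_text(encoding="utf-8") if file.is_file() else None


class ModelView:
    """Wrapper that adds a model-specific operation on top of any backend."""

    CONFIG_NAME = "config.json"  # hypothetical file name for this sketch

    def __init__(self, source: FileSource):
        self._source = source

    def model_config(self) -> Dict[str, Any]:
        text = self._source.read_text(self.CONFIG_NAME)
        if text is None:
            raise FileNotFoundError(self.CONFIG_NAME)
        return json.loads(text)


config_text = '{"n_layers": 12}'
from_dict = ModelView(DictSource({"config.json": config_text}))

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(config_text, encoding="utf-8")
    from_dir_config = ModelView(DirectorySource(Path(tmp))).model_config()

print(from_dict.model_config() == from_dir_config)  # True
```

The wrapper never inspects which backend it was given, so model-level logic stays in one place while storage backends can be added independently.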

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.

@danieldk added the labels type/feature (Type: Feature), feat/model (Feature: models), feat/tokenization (Feature: Tokenization/piecer), and type/maintenance (Type: Maintenance) on Sep 26, 2023
Collaborator

@shadeMe shadeMe left a comment

Brilliant work! 🎉

Resolved review threads:

  • curated_transformers/repository/repository.py: 6 threads (2 outdated)
  • curated_transformers/repository/hf_hub.py: 2 threads (both outdated)
  • curated_transformers/tokenizers/tokenizer.py: 1 thread (outdated)
  • docs/source/repositories.rst: 1 thread

@shadeMe shadeMe merged commit 1f023dc into explosion:main Sep 26, 2023
7 checks passed