RFC Creating a coherent user and developer experience regarding integrations #824
Comments
Thanks for the write-up! I agree that:
Are great ways to move forward and to ensure simplicity of usage for users. Excited to see this taking shape.
Very nice write-up! Thank you! 🚀 It's also nice to see all this info on libraries in a single place.
This is a good point. I think most users don't even care at all about cloning the repo, since most times they just want to use the library to load the repo. This is useful feedback and maybe we should open an issue in moon landing to further discuss this. cc @gary149 @julien-c
This is happening with huggingface/hub-docs#62 🚀 🔥 we aim to launch this in 2-3 weeks. About CLI This seems as an interesting approach! IMO though most of the time what people do in a library is use some (I'm also curious what
💯
cc @patrickvonplaten who has done lots of integrations within audio libraries and whose input would be valuable here too 😄
+1 Agree
Hmm does this fit though with the general design of
and login, etc... Personally, I think it'd be more important to first add functionality allowing users to not just create model repos but also dataset repos and space repos
From a first intuition this looks a bit weird to me because for me libraries like
I don't think we want to remove the top level commands, but certain integrations would need specialized commands. For instance, if my library needs to include certain metadata about the environment under which the model can run, the
Some of that already exists, but we need to better document them.
Those implementations are inside those libraries, not inside
Status-quo
At the moment, there are different ways to interact with the hub. This RFC aims to look at a summary of them, and try to find ways to enhance end users' and developers' experience.
Other than the website and `git`, right now we provide a set of tools available in `huggingface_hub` which users and third party developers can use to interact with the hub.
Some integration-related interfaces that exist as of writing this RFC are listed below. This is not necessarily an exhaustive list; it is only for us to see what the ecosystem looks like at the moment.
huggingface_hub:
And the CLI pattern:
transformers
The following pattern is available for all `transformers` models:
Some of the above methods may not be available for all models.
We have a documentation page here with more ways to work with the hub, such as:
In `transformers`, `TrainingArguments` accepts `push_to_hub=True`, which means the model will be uploaded to the hub as it's being trained.
`PushToHubCallback` can be used to achieve the same thing.
Note that the docs are not clear about when the upload happens and how often.
spacy
`spacy` developers provide a `spacy-huggingface-hub` library, which provides:
And the following CLI pattern:
    huggingface-cli login
    python -m spacy package ./en_ner_fashion ./output --build wheel
    cd ./output/en_ner_fashion-0.0.0/dist
    python -m spacy huggingface-hub push en_ner_fashion-0.0.0-py3-none-any.whl
allennlp
Natively available in the library, there is:
They also natively provide a CLI sub-command as:
adapterhub
The `AutoAdapterModel` class from the `transformers` library has methods to push and load Adapter models from Hugging Face Hub:
sentence-transformers
The `sentence-transformers` library has model pushing and loading functionalities inside the model class itself.
stablebaselines3
For `stablebaselines3`, thanks to @simoninithomas, we have developed `huggingface-sb3` which provides load and push:
asteroid
The `BaseModel` class of the `asteroid` library has a `from_pretrained` method for loading from Hugging Face Hub.
They also mention that users can directly use `git` as:
espnet
`espnet` allows models to be uploaded while/after being trained through CLI:
Usage, different perspective
From the users' perspective, there are a few different ways to interact with the hub, independent of what library they use.
The website (hf.co)
The website enables users to browse models, datasets, and spaces. Users can also test the output of many models directly from the website if the tasks of the model are supported or if there is a Space which by design is interactive.
Visiting a model on the website points users to these ways to work locally with the model:
Or using `git`:
Note that we do not tell users what to do once they've cloned the repo. Running the Python code after cloning will still download the model into the cache before loading; it does not use the cloned copy.
REST API
Users can directly work with the hub using the REST endpoints documented here. Users can also subscribe to certain webhooks if they wish to get notified about certain events on the hub.
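As an illustration of talking to the REST endpoints without any hub-specific dependency, here is a minimal sketch using only the Python standard library. The `/api/models` path and the `search` and `limit` query parameters are assumptions based on the public Hub API documentation, and the helper names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL of the Hub REST API.
HUB_API = "https://huggingface.co/api"


def build_model_search_request(search, limit=5):
    """Build (but do not send) a GET request that searches models by name."""
    query = urlencode({"search": search, "limit": limit})
    return Request(f"{HUB_API}/models?{query}", headers={"Accept": "application/json"})


def search_models(search, limit=5):
    """Send the request and decode the JSON response. Requires network access."""
    with urlopen(build_model_search_request(search, limit)) as response:
        return json.load(response)
```

Calling `search_models("bert")` is expected to return a list of JSON objects describing matching repositories; the exact fields depend on the API version, so they are not relied upon here.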
Git
A user could potentially work with the hub without any installed hub-related dependency other than `git` and `git-lfs`, since one can treat hub repos as git repos. If one knows which files are to be in a repo and how to write a model card as a README, all one needs is `git` to interact with the hub to create and update repos.
CLI
One way to work with the hub is using the command line interface.
`huggingface_hub` provides one as `huggingface-cli`, explained above, and other libraries such as `spacy` also expose their own CLI which users can use to interact with the hub. `huggingface-cli` does not provide any functionality specific to any integration at the moment.
If a third party library decides to provide CLI tools for interacting with the hub, it currently doesn't add to `huggingface-cli` and instead introduces its own CLI. For instance, installing `spacy-huggingface-hub` adds `huggingface-hub` to `spacy` as:
$ spacy huggingface-hub ...
huggingface_hub python library
The core of the `huggingface_hub` Python library is a combination of being a Python wrapper around these API endpoints and using `git` in the background for certain tasks.
Users can choose to use the `huggingface_hub` Python library to interact with the hub. The core of the library provides tools to interact with the hub and is the go-to place for users if they wish to use Python to list, create, move, delete, or update repositories in the hub.
The library also provides integrations for a few third party frameworks, namely `Keras` and `Pytorch`, and `fastai` in the making. Those integrations provide a few functions to push to and pull from the hub.
Other libraries, using python
Some other frameworks provide certain functionalities to work with the hub using Python, implemented either inside their main library or as a third library.
Directly using the ML library
Certain libraries, such as `transformers`, natively support certain integrations with the hub. For instance, all `transformers` models support pushing to the hub using `save_pretrained(..., push_to_hub=True)`. The library can also choose to support {down}loading from the hub using code such as:
Using a third library specific to an ML library
For certain libraries, such as `stablebaselines3`, there is a third library, `huggingface-sb3` in this example, which provides certain functionality for users to work with the hub. E.g.:
Note that there are no standard names or ways to implement those integrations. Some libraries choose to have them as class or instance methods, and some are provided as a function under a certain namespace.
Proposal - Users' Perspective
To make users' experience consistent across different frameworks, we can look at each way users interact with the hub and see what we can do to improve their experience. We should also decide which are the ways we'd like to recommend to people and nudge them in that direction. The action items concluded from this process can be implemented by developers maintaining that area.
Git
We want to keep encouraging people to use pure git to interact with the hub. However, there are things we can do to make their lives easier:
`pyproject.toml` with mentioned dependencies, etc. Since people who are working on widgets and the inference backend are always involved in accepting and setting those specs, it would make sense for there to be a place in our docs explaining the requirements for each library (somewhere under hf.co).
CLI
There are two ways we can see users interacting with the hub through CLI. One is by using `huggingface-cli`, and the other is by using whatever tool the third party developers provide.
Ideally, we would like to recommend that users use `huggingface-cli` for all interactions without having the implementation details for each library in the scope of `huggingface_hub`. For this, we can use dispatching mechanisms the same way as it's done in `spacy` and `spacy-huggingface-hub`. We would define a standard interface to be implemented for defining different commands, such as `push`, `pull`, etc., and delegate that to the corresponding library. From users' perspective, it would look like:
Under the hood we use the corresponding library, and if not installed, we ask them to install that dependency.
The desired and expected flow and supported commands and their specs shall be discussed in a separate issue.
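To make the CLI dispatching idea more concrete, here is a rough sketch of how a single entry point could delegate `push` and `pull` to an integration package and ask the user to install it when it is missing. Everything here is an assumption for illustration: the `INTEGRATIONS` table, the `<library> <command> <repo_id>` argument layout, and the idea that each integration module exposes module-level `push` and `pull` callables are hypothetical and not an existing `huggingface-cli` feature.

```python
import argparse
import importlib
import sys

# Hypothetical table: library name -> (module implementing the commands, pip package to suggest).
INTEGRATIONS = {
    "spacy": ("spacy_huggingface_hub", "spacy-huggingface-hub"),
    "sb3": ("huggingface_sb3", "huggingface-sb3"),
}


def main(argv=None, importer=importlib.import_module):
    """Parse `<library> <command> <repo_id>` and delegate to the integration module.

    `importer` is injectable so the dispatch logic can be exercised without
    the real integration packages being installed.
    """
    parser = argparse.ArgumentParser(prog="huggingface-cli")
    parser.add_argument("library", choices=sorted(INTEGRATIONS))
    parser.add_argument("command", choices=["push", "pull"])
    parser.add_argument("repo_id")
    args = parser.parse_args(argv)

    module_name, package = INTEGRATIONS[args.library]
    try:
        module = importer(module_name)
    except ImportError:
        # The integration is not installed: tell the user how to get it.
        print(f"Support for {args.library!r} is missing, try: pip install {package}", file=sys.stderr)
        return 1

    # Assumed convention: the module exposes a callable named after the command.
    handler = getattr(module, args.command, None)
    if handler is None:
        print(f"{module_name} does not implement {args.command!r}", file=sys.stderr)
        return 1
    handler(args.repo_id)
    return 0
```

With such a layout, `huggingface_hub` would only own the argument parsing and the lookup table, while the behaviour of each command stays in the third party package.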
Python
If users are interacting with the hub from within python, we can give them a coherent interface without including the implementation inside the library.
The above code would load the appropriate functions from third party libraries. What we need to do for that to happen is to define the dispatch interfaces and the API.
Since each function will be implemented in and loaded from different places, we don't have to care too much about their signature. What we need to do is to create a flow with required functions and their specs so that users get a coherent experience across the board.
Libraries can still choose to implement integrations as class/instance methods of their corresponding classes, but we don't necessarily encourage that.
There shall be another issue to discuss the details of this section.
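One possible shape for the Python dispatch interface discussed in this section is a small registry that third party libraries add their functions to. This is only a sketch under assumptions: the names `register`, `dispatch`, `push_to_hub`, and `load_from_hub`, as well as the `toy` integration, are hypothetical, and the real interface is left to the follow-up issue.

```python
from typing import Callable, Dict, Tuple

# Maps (library name, operation name) to the function implementing it.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(library: str, operation: str):
    """Decorator a third party library would use to expose an implementation."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[(library, operation)] = func
        return func
    return decorator


def dispatch(library: str, operation: str, *args, **kwargs):
    """Look up the implementation and call it, forwarding all arguments."""
    try:
        func = _REGISTRY[(library, operation)]
    except KeyError:
        raise NotImplementedError(f"{library!r} does not implement {operation!r}") from None
    return func(*args, **kwargs)


# Thin, library-agnostic entry points; extra keyword arguments are passed
# through untouched, so each integration keeps its own signature.
def push_to_hub(model, repo_id: str, *, library: str, **kwargs):
    return dispatch(library, "push_to_hub", model, repo_id, **kwargs)


def load_from_hub(repo_id: str, *, library: str, **kwargs):
    return dispatch(library, "load_from_hub", repo_id, **kwargs)


# A toy integration standing in for code that would live in a third party library.
@register("toy", "push_to_hub")
def _toy_push(model, repo_id, commit_message="Add model"):
    return f"pushed {model} to {repo_id}: {commit_message}"


@register("toy", "load_from_hub")
def _toy_load(repo_id):
    return {"repo_id": repo_id}
```

Users would then call `push_to_hub(model, "user/repo", library="toy")` regardless of which framework produced the model, which matches the goal of a coherent flow with loosely constrained signatures.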
Proposal - Developer's Perspective
We shall develop the specs discussed above and include the base implementations inside `huggingface_hub`. Third party developers can choose to implement them inside their library, or to have a separate library dedicated to hub integrations.
For the latter case, we also create a template repo which includes everything needed to create a package for this purpose, including all the setup and CI related files. Developers would then need to instantiate from that template repo and fill in the blanks. They are of course free to implement more than the minimum requirement we define in the specifications.
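As a sketch of what "base implementations" with blanks to fill in might look like, the following uses an abstract base class: the shared steps live in the base, and an integration only supplies `save` and `load`. The class and method names are hypothetical, the JSON-based integration is a toy stand-in for a real library, and the only detail taken from the Hub itself is the assumption that model cards carry YAML metadata such as `library_name` at the top of `README.md`.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class HubIntegration(ABC):
    """Hypothetical base class that a spec in `huggingface_hub` could provide."""

    # Name written to the model card metadata; set by each integration.
    library_name = "unknown"

    @abstractmethod
    def save(self, model, directory: Path) -> None:
        """Write the model files into `directory`."""

    @abstractmethod
    def load(self, directory: Path):
        """Rebuild the model from the files in `directory`."""

    def build_model_card(self, repo_id: str) -> str:
        # Shared step: a minimal README with metadata front matter.
        return f"---\nlibrary_name: {self.library_name}\n---\n\n# {repo_id}\n"

    def prepare_repo(self, model, repo_id: str, directory) -> list:
        # Shared step: lay out a local folder that could then be pushed.
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        self.save(model, directory)
        (directory / "README.md").write_text(self.build_model_card(repo_id), encoding="utf-8")
        return sorted(path.name for path in directory.iterdir())


class ToyJsonIntegration(HubIntegration):
    """The "blanks" a third party developer would fill in."""

    library_name = "toy-json"

    def save(self, model, directory):
        (Path(directory) / "model.json").write_text(json.dumps(model), encoding="utf-8")

    def load(self, directory):
        return json.loads((Path(directory) / "model.json").read_text(encoding="utf-8"))


def demo():
    """Prepare a repo folder in a temporary directory and load the model back."""
    integration = ToyJsonIntegration()
    with tempfile.TemporaryDirectory() as tmp:
        files = integration.prepare_repo({"weights": [1, 2]}, "user/toy-model", tmp)
        restored = integration.load(tmp)
    return files, restored
```

A template repo could ship a subclass skeleton like `ToyJsonIntegration` together with the packaging and CI files, so that developers only replace the two methods.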
Acknowledgements: This RFC has been developed with a ton of help from @merveenoyan, @osanseviero, and @LysandreJik