-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add load model and fetch model #762
Conversation
This is the first version of the new design concept to get models from hugging face |
Thanks for this @Mu-Magdy and @henrykironde. My concern with this design is that every time we add a new model we'll have to update the code in the core package, make a release, and have users install that new release to access the new model. Ideally we'd be able to release a new model and have users can keep using their current installs and just pass The way that hugging face solves this problem in Our models don't follow a consistent name scheme, so we would need to change the names of the models to make them the same across repos. We have one Ideally we'd be able to just use UPDATE: see next post. |
Actually, I was wrong, the from huggingface_hub import PyTorchModelHubMixin and added the mixin to the class definition: class deepforest(pl.LightningModule, PyTorchModelHubMixin): And this now works: model = main.deepforest().from_pretrained("ethanwhite/df-test") To make the huggingface repo I just did basically what @bw4sz suggested this afternoon: from deepforest import main
model = main.deepforest()
model.use_release()
model.save_pretrained('hfmodel') I then uploaded the So, I think my suggestion now is to just move to:
This lets us add models without needing to update the package, avoids adding much new custom code, let's us get rid of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @Mu-Magdy! This is a good step forward.
It still has the same problem described in my last review for adding new models, because we'd have to edit model_repo_dict
, release a new version of deepforest, and get users to install it, every time we added a new model.
To fix this we need to get rid of model_repo_dict
and just pass the values in that dict to the function directly. So, instead of repo_id
looking up a value from model_repo_dict
based on model_name
, repo_id
is what gets passed as an argument to the function (e.g., "weecology/deepforest-bird" gets passed instead of "bird").
We can still keep with special case config updating Line 154 accordingly.
Does that make sense?
HI @ethanwhite in the latest commit i used different approach to download the files just by defining the repo id, |
Hi @ethanwhite In this commit I use the approach that was discussed on slack. you can test is using: import json
from main import deepforest
from huggingface_hub import login
login('') # Hf key token
model = deepforest()
model.load_model('MuMagdy/test_deepforest')
print(model)
print(json.dumps(model.config, indent=4)) this is a repo I made to test containing alll required files. If the approach is accepted, After unittest and merging we need to upload all the models as model.safetensors (or any other deafult naming HF uses) and all configs to be (config.json). we can do this simply using: from huggingface_hub import login
from main import deepforest
login() # Hf key token
model = deepforest.load_from_checkpoint("")
model.push_to_hub(repo_id) this will make a model.safetensors , config.json and README.md {
"config_args": null,
"config_file": "deepforest_config.yml",
"existing_train_dataloader": null,
"existing_val_dataloader": null,
"label_dict": {
"Tree": 0
},
"model": null,
"num_classes": 1,
"transforms": null
}
I don't know if this is sufficient for the model or not I think this will also automate the downlaod counter, because i didn't set it and it is working fine for the repo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good @Mu-Magdy. I've made a few additional comments to clean things up with your current changes.
There are a couple of other things to do before this is merged:
- We need the https://huggingface.co/weecology/deepforest-tree & https://huggingface.co/weecology/deepforest-bird/ to be updated. You'd sent a PR to deepforest-birds, but it needs a config.json file added and then we need the same updates to deepforest-trees. Let me know if you'd to do this or you'd like someone else to take care of it.
- I believe there is a bunch of code that will no longer run once these changes are made. Specifically I think at least
utilities.use_release()
,utilities.use_bird_release()
, andutilities.fetch_model()
. These should be removed in this PR. load_model()
needs some tests. These can probably just mirrortest_use_release()
andtest_bird_release()
, but usingload_model()
If you'd like help with any of this please let me know.
deepforest/main.py
Outdated
@@ -14,12 +14,14 @@ | |||
from pytorch_lightning.callbacks import LearningRateMonitor | |||
from torch import optim | |||
from torchmetrics.detection import IntersectionOverUnion, MeanAveragePrecision | |||
|
|||
from huggingface_hub import PyTorchModelHubMixin, hf_hub_download, snapshot_download |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we aren't using either hf_hub_download
or snapshot_download
in the final version, so remove them
deepforest/main.py
Outdated
def load_model(self, model_name="weecology/deepforest-tree", revision='main'): | ||
"""Load DeepForest models from Hugging Face using from_pretrained(). | ||
|
||
Args: | ||
model_name (str): A repository ID for huggingface in the form of organization/repository | ||
version (str): The model version ('main', 'v1.0.0', etc.). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Match variable name to docstring for revision/version
deepforest/main.py
Outdated
@@ -117,6 +119,31 @@ def __init__(self, | |||
|
|||
self.save_hyperparameters() | |||
|
|||
def load_model(self, model_name="weecology/deepforest-tree", revision='main'): | |||
"""Load DeepForest models from Hugging Face using from_pretrained(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want the docstring to focus on what the function does, not how it does it, and be understandable to a novice user. Something like:
"""Load pretrained DeepForest model
Loads a model that has already been pretrained for a specific task,
like tree crown detection.
Models (technically model weights) are distributed via Hugging Face
and designated the Hugging Face repository ID (model_name), which
is in the form: 'organization/repository'. For a list of models distributed
by the DeepForest team (and the associated model names) see the
documentation:
https://deepforest.readthedocs.io/en/latest/installation_and_setup/prebuilt.html
...
"""
deepforest/main.py
Outdated
warnings.warn( | ||
"use_bird_release will be deprecated in 2.0. use load_model('bird') instead", | ||
DeprecationWarning) | ||
self.load_model('bird') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be 'weecology/deepforest-bird'
.
…est for load_model
In the 0ffc838 commit all errors are form utilities.use_release() missing so I will add a new commit with use_release again to see if the any other issues errors appears |
@Mu-Magdy Thank you so much for this work. |
Adding load_model function to load models from hugging face.