
add load model and fetch model #762

Merged
merged 22 commits into weecology:main from hf-hub-download on Sep 26, 2024

Conversation

Mu-Magdy
Contributor

Adding a load_model() function to load models from Hugging Face.

@Mu-Magdy
Contributor Author

This is the first version of the new design concept for getting models from Hugging Face.

@ethanwhite
Member

ethanwhite commented Aug 29, 2024

Thanks for this @Mu-Magdy and @henrykironde. My concern with this design is that every time we add a new model we'll have to update the code in the core package, make a release, and have users install that new release to access the new model. Ideally we'd be able to release a new model and have users keep using their current installs and just pass load_model() a new model name.

The way that Hugging Face solves this problem in huggingface_hub is to default to naming all models the same. So most model repositories have either (or both) of pytorch_model.bin or model.safetensors (these are just alternative formats to .pt for storing weights). When you run their from_pretrained() method it looks for that file in the repository, downloads it if necessary, and loads the weights.

Our models don't follow a consistent naming scheme, so we would need to change the names of the models to make them the same across repos. We have one .pl model and the others are .pt, so we'd probably have to check both and catch the 404 for the one that isn't there, at least until we update everything to a consistent format. We'd also need to make sure that having the same names doesn't cause issues with fetch_model(). Alternatively, we could give the models the same name as the repository. So the model in "weecology/deepforest-tree" would be named "deepforest-tree.pt", and then when we call `model.load_model("weecology/deepforest-tree")` we can extract the name of the model file from the repository string.
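
For illustration only, that name extraction could look something like this (a hypothetical sketch assuming repository-named .pt files, which is not how the repos are organized today):

repo_id = "weecology/deepforest-tree"
model_file = repo_id.split("/")[-1] + ".pt"  # -> "deepforest-tree.pt"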

Ideally we'd be able to just use huggingface_hub directly with the class mixin, but I confirmed @bw4sz's thought from this afternoon that the current model structure doesn't work, probably due to the lack of the model.forward() method.

UPDATE: see next post.

@ethanwhite
Member

Actually, I was wrong; the Hugging Face mixin just works. I got confused because use_release() operates in place but from_pretrained() returns a new object. I think we can just use huggingface_hub. I imported the mixin:

from huggingface_hub import PyTorchModelHubMixin

and added the mixin to the class definition:

class deepforest(pl.LightningModule, PyTorchModelHubMixin):

And this now works:

model = main.deepforest().from_pretrained("ethanwhite/df-test")

To make the Hugging Face repo I just did basically what @bw4sz suggested this afternoon:

from deepforest import main

model = main.deepforest()
model.use_release()
model.save_pretrained('hfmodel')

I then uploaded the config.json and model.safetensors files to https://huggingface.co/ethanwhite/df-test/ (we could also use model.push_to_hub() if we set up local auth correctly).

So, I think my suggestion now is to just move to:

  • using huggingface_hub
  • having the load_model() method be a wrapper around from_pretrained()
  • updating the docs to use model = main.deepforest().load_model("weecology/deepforest-tree")

This lets us add models without needing to update the package, avoids adding much new custom code, lets us get rid of fetch_model(), and gets us the version handling that @henrykironde mentioned this afternoon, because from_pretrained() has a revision argument that lets you specify tags and even commit hashes for reproducible analyses.
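
A minimal sketch of that wrapper idea (the signature matches the one in this PR's diff, but the method body here is illustrative, not the final implementation):

import pytorch_lightning as pl
from huggingface_hub import PyTorchModelHubMixin

class deepforest(pl.LightningModule, PyTorchModelHubMixin):
    # Simplified stand-in for the real DeepForest class.

    def load_model(self, model_name="weecology/deepforest-tree", revision="main"):
        # Thin wrapper: from_pretrained() (provided by the mixin) downloads and
        # caches config.json and model.safetensors from the Hugging Face repo
        # and returns a new model object, so we return it rather than modify self.
        return self.from_pretrained(model_name, revision=revision)

# usage: model = deepforest().load_model("weecology/deepforest-tree")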

@ethanwhite
Member

ethanwhite left a comment

Nice work @Mu-Magdy! This is a good step forward.

It still has the same problem described in my last review for adding new models, because we'd have to edit model_repo_dict, release a new version of deepforest, and get users to install it every time we add a new model.

To fix this we need to get rid of model_repo_dict and just pass the values in that dict to the function directly. So, instead of repo_id looking up a value from model_repo_dict based on model_name, repo_id is what gets passed as an argument to the function (e.g., "weecology/deepforest-bird" gets passed instead of "bird").

We can still keep the special-case config updating, with Line 154 adjusted accordingly.

Does that make sense?
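
A rough before/after sketch of that change (the dict contents here are assumptions based on this thread):

# Before: a short name is looked up in a dict baked into the package,
# so every new model requires editing this dict and cutting a release.
model_repo_dict = {"tree": "weecology/deepforest-tree", "bird": "weecology/deepforest-bird"}

def load_model(self, model_name="tree"):
    repo_id = model_repo_dict[model_name]
    ...

# After: the Hugging Face repo ID itself is the argument, so new models
# can be released without touching the package.
def load_model(self, model_name="weecology/deepforest-tree"):
    repo_id = model_name
    ...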

@ethanwhite mentioned this pull request Sep 4, 2024
@Mu-Magdy
Contributor Author

Hi @ethanwhite

In the latest commit I used a different approach that downloads the files just by defining the repo ID.
I haven't fully tested it and it still needs work, but I wanted to know if this is the approach you want used.
With this we can use our own from_pretrained() without the PyTorchModelHubMixin, and it can work for all the models; we just need the model and config files to share the same names across all the repos, without being tied to safetensors or any other particular extension.
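
A small sketch of that "no mixin" download path, using huggingface_hub's hf_hub_download helper (the shared file names config.json and model.safetensors are chosen here for concreteness and are assumptions):

from huggingface_hub import hf_hub_download

def fetch_model_files(repo_id, revision="main"):
    # Download (or reuse from the local cache) the two files that every
    # repo would need to share the same names for this approach to work.
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json", revision=revision)
    weights_path = hf_hub_download(repo_id=repo_id, filename="model.safetensors", revision=revision)
    return config_path, weights_path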

@Mu-Magdy
Contributor Author

Hi @ethanwhite

In this commit I use the approach that was discussed on Slack.

You can test it using:

import json
from main import deepforest
from huggingface_hub import login

login('') # Hf key token

model = deepforest()

model.load_model('MuMagdy/test_deepforest')
print(model)
print(json.dumps(model.config, indent=4))

This is a repo I made for testing, containing all required files.

If the approach is accepted, then after unit tests and merging we need to upload all the models as model.safetensors (or whatever default naming HF uses) and all configs as config.json.

We can do this simply using:

from huggingface_hub import login
from main import deepforest
login() # Hf key token
model = deepforest.load_from_checkpoint("")
model.push_to_hub(repo_id)

This will create model.safetensors, config.json, and README.md.
But the config.json file will look like this:

{
  "config_args": null,
  "config_file": "deepforest_config.yml",
  "existing_train_dataloader": null,
  "existing_val_dataloader": null,
  "label_dict": {
    "Tree": 0
  },
  "model": null,
  "num_classes": 1,
  "transforms": null
}

I don't know if this is sufficient for the model or not.

I think this will also automate the download counter, because I didn't set it up and it's working fine for the repo.

@ethanwhite
Member

ethanwhite left a comment

This is looking really good @Mu-Magdy. I've made a few additional comments to clean things up with your current changes.

There are a couple of other things to do before this is merged:

  1. We need https://huggingface.co/weecology/deepforest-tree & https://huggingface.co/weecology/deepforest-bird/ to be updated. You'd sent a PR to deepforest-birds, but it needs a config.json file added, and then we need the same updates to deepforest-trees. Let me know if you'd like to do this or would rather someone else take care of it.
  2. I believe there is a bunch of code that will no longer run once these changes are made. Specifically, I think at least utilities.use_release(), utilities.use_bird_release(), and utilities.fetch_model(). These should be removed in this PR.
  3. load_model() needs some tests. These can probably just mirror test_use_release() and test_bird_release(), but using load_model() (see the sketch below).

If you'd like help with any of this please let me know.
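
For item 3, a rough sketch of what such tests might look like (illustrative only; the exact assertions and fixtures would follow the existing release tests):

from deepforest import main

def test_load_model_tree():
    # Mirrors test_use_release(), but going through the Hugging Face path.
    model = main.deepforest().load_model("weecology/deepforest-tree")
    assert model is not None

def test_load_model_bird():
    # Mirrors test_bird_release().
    model = main.deepforest().load_model("weecology/deepforest-bird")
    assert model is not None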

@@ -14,12 +14,14 @@
from pytorch_lightning.callbacks import LearningRateMonitor
from torch import optim
from torchmetrics.detection import IntersectionOverUnion, MeanAveragePrecision

from huggingface_hub import PyTorchModelHubMixin, hf_hub_download, snapshot_download

Member

Looks like we aren't using either hf_hub_download or snapshot_download in the final version, so remove them

Comment on lines 122 to 127
def load_model(self, model_name="weecology/deepforest-tree", revision='main'):
"""Load DeepForest models from Hugging Face using from_pretrained().

Args:
model_name (str): A repository ID for huggingface in the form of organization/repository
version (str): The model version ('main', 'v1.0.0', etc.).

Member

Match the variable name to the docstring for revision/version (the parameter is revision but the docstring says version).

@@ -117,6 +119,31 @@ def __init__(self,

self.save_hyperparameters()

def load_model(self, model_name="weecology/deepforest-tree", revision='main'):
"""Load DeepForest models from Hugging Face using from_pretrained().

Member

We want the docstring to focus on what the function does, not how it does it, and be understandable to a novice user. Something like:

"""Load pretrained DeepForest model

Loads a model that has already been pretrained for a specific task,
like tree crown detection.

Models (technically model weights) are distributed via Hugging Face
and designated by the Hugging Face repository ID (model_name), which
is in the form: 'organization/repository'. For a list of models distributed
by the DeepForest team (and the associated model names) see the
documentation:
https://deepforest.readthedocs.io/en/latest/installation_and_setup/prebuilt.html
...

"""

warnings.warn(
"use_bird_release will be deprecated in 2.0. use load_model('bird') instead",
DeprecationWarning)
self.load_model('bird')

Member

Should be 'weecology/deepforest-bird'.
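
i.e., presumably something like this, with both the warning message and the call updated:

warnings.warn(
    "use_bird_release will be deprecated in 2.0. use load_model('weecology/deepforest-bird') instead",
    DeprecationWarning)
self.load_model('weecology/deepforest-bird')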

@Mu-Magdy
Contributor Author

Mu-Magdy commented Sep 20, 2024

In the 0ffc838 commit all errors are from utilities.use_release() missing, so I will add a new commit with use_release again to see if any other errors appear.

@henrykironde
Contributor

@Mu-Magdy Thank you so much for this work.

@henrykironde merged commit 86dcfe4 into weecology:main Sep 26, 2024
5 checks passed
@Mu-Magdy deleted the hf-hub-download branch September 26, 2024 18:50