cannot download model #84

Closed
1 task done
peiyaoli opened this issue Oct 30, 2023 · 20 comments
Labels
bug Something isn't working

Comments

@peiyaoli

Is there an existing issue for this?

  • I have searched the existing issues and found nothing

Bug description

I tried to search for a model with:
model_card = store.search(name="ChemBERTa-77M-MLM")[0]

but got:
AttributeError: 'ModelInfo' object has no attribute 'model_dump'

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Molfeat version (e.g., 0.1.0):
#- PyTorch Version (e.g., 1.10.0):
#- RDKit version (e.g., 2022.09.5): 
#- scikit-learn version (e.g.,  1.2.1): 
#- OS (e.g., Linux):
#- How you installed Molfeat (`conda`, `pip`, source):

Additional context

No response

@peiyaoli peiyaoli added the bug Something isn't working label Oct 30, 2023
@cwognum
Contributor

cwognum commented Oct 30, 2023

Hey @peiyaoli ! Thank you for reporting this bug! Before we investigate, it would be useful to have some additional information on your environment, specifically your Molfeat version, Python version, Pydantic version and how you installed your environment (i.e. pip, conda, source).
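For anyone gathering the details requested above, a minimal sketch using only the standard library is shown below. The distribution names passed to `importlib.metadata` are assumptions and may differ for conda installs. As background, `model_dump` is a Pydantic v2 method, so the Pydantic version is a plausible thing to check first; the thread does not confirm it as the cause.

```python
import platform
from importlib import metadata


def collect_versions(packages=("molfeat", "pydantic", "torch", "rdkit", "scikit-learn")):
    """Return a mapping of name -> installed version, or 'not installed'."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution name is an assumption; it may be registered differently.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed lines into the Environment section of the issue template covers most of what is asked for here; the install method (pip, conda, source) still has to be stated by hand.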

@azkalot1

azkalot1 commented Nov 3, 2023

Something goes wrong during download, at the hash sum comparison stage.
The error is raised here: https://github.com/datamol-io/molfeat/blob/97855c6c7df2c46acb698d64eab60b08006c8936/molfeat/store/modelstore.py#L235C2-L235C2

@maclandrol
Member

@cwognum do you want to have a look ?

@JHlozek

JHlozek commented Nov 6, 2023

I am getting a similar hash sum error when trying to download the chemGPT model.

I installed molfeat into its own conda environment and used the 'conda install -c conda-forge molfeat' for the install.
[screenshot: molfeat_chemgpt]

@cwognum
Contributor

cwognum commented Nov 7, 2023

I'm not sure what's happening here.

I could reproduce the bug locally. As the error suggests, it appears the checksum computed when we created the artifact no longer matches the checksum we compute when downloading the artifacts locally. I'm not sure what causes this. Recreating the artifact using the ETL notebooks leads to an entirely new hashsum that doesn't match any of the artifacts in the thrown exception. I will investigate further!

@maclandrol
Member

@cwognum, did you manage to find the error? Looking at the code, my first guess would be that the order of the files fed into the shasum changed for some reason...

if dm.fs.is_dir(filepath):
    files = list(dm.fs.glob(os.path.join(filepath, "**", "*")))
else:
    files = [filepath]
file_hash = hashlib.sha256()
for filepath in files:
    with fsspec.open(filepath) as f:
        file_hash.update(f.read())  # type: ignore
file_hash = file_hash.hexdigest()
return file_hash

Can you generate all permutations of the file list order and check whether you can recover the original shasum?
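The permutation check suggested here can be sketched as follows. This is an illustration only: it works on in-memory byte strings rather than on the store's artifacts, and the helper names `digest_in_order` and `find_matching_order` are hypothetical, not part of molfeat.

```python
import hashlib
from itertools import permutations


def digest_in_order(blobs):
    """SHA-256 of the given byte strings fed to one hasher in the given order."""
    hasher = hashlib.sha256()
    for blob in blobs:
        hasher.update(blob)
    return hasher.hexdigest()


def find_matching_order(named_blobs, expected):
    """Try every ordering of (name, bytes) pairs.

    Returns the list of names in the first ordering whose digest equals
    `expected`, or None if no ordering matches.
    """
    for order in permutations(named_blobs):
        if digest_in_order(blob for _, blob in order) == expected:
            return [name for name, _ in order]
    return None


if __name__ == "__main__":
    files = [("a.txt", b"alpha"), ("b.txt", b"beta"), ("c.txt", b"gamma")]
    target = digest_in_order([b"gamma", b"alpha", b"beta"])
    print(find_matching_order(files, target))
```

The search visits n! orderings, so it is only practical for artifacts with a handful of files. A result of None means the mismatch is not explained by ordering alone.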

@cwognum
Contributor

cwognum commented Nov 7, 2023

That's indeed one of the things I considered. I haven't tried all combinations though, that's a good idea!

@cwognum
Contributor

cwognum commented Nov 9, 2023

@maclandrol I tried the different permutations, but none of them produces the expected hash.

@maclandrol
Member

OK, I think for now we can just sort the file names to ensure a consistent order, then recompute the hash and update it in the metadata.
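A minimal sketch of order-independent hashing along these lines is shown below. It is not the code from the eventual PR: it assumes a local directory and uses `pathlib` instead of the `dm.fs` and `fsspec` calls in the snippet quoted earlier, and the function name `sha256sum_sorted` is hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256sum_sorted(path):
    """Hash a file, or every file under a directory in sorted relative-path order."""
    root = Path(path)
    if root.is_dir():
        # Sort on POSIX-style relative paths so the order does not depend on
        # the OS path separator or on the order the filesystem lists entries.
        files = sorted(
            (p for p in root.rglob("*") if p.is_file()),
            key=lambda p: p.relative_to(root).as_posix(),
        )
    else:
        files = [root]
    hasher = hashlib.sha256()
    for file in files:
        hasher.update(file.read_bytes())
    return hasher.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "b.txt").write_bytes(b"second")
        (Path(tmp) / "a.txt").write_bytes(b"first")
        print(sha256sum_sorted(tmp))
```

Any digest stored in metadata before such a change was computed with the old ordering, which is why the hash has to be recomputed and updated at the same time.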

@cwognum
Contributor

cwognum commented Nov 10, 2023

I looked into a couple of things, but I'm not sure what started causing the issue, which is a very unsatisfying conclusion.

Some further notes for future reference:

I can imagine that the above changes affect the order of the files, but not the number of files or their content. Since trying all permutations of the file paths does not recover the expected hash, I'm not sure what's happening here.

Anyway, the fix is luckily simple! I recreated the featurizer artifacts by simply rerunning the ETL notebooks and everything seems to work again. I made a small PR to sort the to-be-hashed files: #86

Let me know if the issue persists or pops up again! If so, we will have to investigate further.

One final note: ChemGPT-1.2B and ChemGPT-19M are still running.

@JHlozek

JHlozek commented Nov 10, 2023

Thank you @cwognum

The recent changes have fixed the issue on my side.

@GemmaTuron

Thanks @cwognum I'll check and let you know asap if it also fixes the issue on my side!

@dawndarkmusic

dawndarkmusic commented Nov 28, 2023

Bug description

Hello! @cwognum

I'm currently using molfeat in a conda env, installed with pip install molfeat==0.8.8. When I try to fetch a pretrained HF transformer such as GPT2-Zinc480, Roberta-Zinc480M-102M or MolT-5, I get the following error message:

ModelStoreError: Can't retrieve model MolT5 from the store !

It seems related to the cache and where the models are stored locally. What should I do?

How to reproduce the bug

from molfeat.trans.pretrained.hf_transformers import PretrainedHFTransformer
transformer = PretrainedHFTransformer(kind='MolT5', notation='selfies', dtype=float)
features = transformer(my_smiles_list)

Error messages and logs

ModelStoreError                           Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\molfeat\store\loader.py:100, in PretrainedStoreModel._load_or_raise(cls, name, download_path, store, **kwargs)
     99     modelcard = store.search(name=name)[0]
--> 100     artifact_dir = store.download(modelcard, download_path, **kwargs)
    101 except Exception as e:

File ~\anaconda3\lib\site-packages\molfeat\store\modelstore.py:216, in ModelStore.download(self, modelcard, output_dir, chunk_size, force)
    215     mapper.fs.delete(output_dir, recursive=True)
--> 216     raise ModelStoreError(
    217         f"""The destination artifact at {model_dest_path} has a different sha256sum ({cache_sha256sum}) """
    218         f"""than the Remote artifact sha256sum ({modelcard.sha256sum}). The destination artifact has been removed !"""
    219     )
    221 return output_dir
ModelStoreError: The destination artifact at C:\Users\dd\AppData\Local\molfeat\molfeat\Cache/MolT5/model.save has a different sha256sum (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) than the Remote artifact sha256sum (e0537549289bfffc9ba6a5fb17c5b8d031e1b04a17555fd8f6494ebe3ce79395). The destination artifact has been removed !

Environment

#- Molfeat version (e.g., 0.1.0): 0.8.8
#- Python version (e.g., 1.10.0): 3.10.9
#- RDKit version (e.g., 2022.09.5):  2023.03.1
#- scikit-learn version (e.g.,  1.2.1): 1.2.1
#- OS (e.g., Linux): Windows 10 22H2
#- How you installed Molfeat (`conda`, `pip`, source): pip
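One detail in the traceback above may help narrow this down: the local digest it reports is the SHA-256 of zero bytes. That would be consistent with the hashing step receiving no data on this machine (no files matched, or only empty files), but this is an inference from the digest alone and is not confirmed anywhere in the thread. A quick check:

```python
import hashlib

# Local digest copied from the ModelStoreError message in the traceback.
REPORTED_LOCAL_DIGEST = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Digest of a hasher that never received any bytes.
EMPTY_DIGEST = hashlib.sha256().hexdigest()


def hashed_nothing(digest):
    """Return True if `digest` is the SHA-256 of empty input."""
    return digest.lower() == EMPTY_DIGEST


if __name__ == "__main__":
    print(hashed_nothing(REPORTED_LOCAL_DIGEST))  # prints True
```

If the check is True for a freshly downloaded artifact, listing the files the hashing step actually sees in the cache directory is a reasonable next step.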

@cwognum
Contributor

cwognum commented Nov 28, 2023

Hi @dawndarkmusic, thanks for reporting! Could you try upgrading to the latest molfeat version? We might have broken backwards compatibility with #86 by ordering the to-be-hashed files.

Ping @maclandrol - If we end up needing some more robust model / data versioning, I recently came across https://github.com/iterative/dvc which seems pretty powerful!

@dawndarkmusic

Hello @cwognum
I've upgraded molfeat to version 0.9.5 but still get the same issue with the same code.
Here are some screenshots taken while running the code. It shows the download progress bar every time and then raises the error. As far as I remember, PretrainedHFTransformer didn't show a progress bar when I used it before. Should I restart the computer and test again? Any help would be greatly appreciated! Thank you

[three screenshots attached]

@maclandrol
Member

Ping @maclandrol - If we end up needing some more robust model / data versioning, I recently came across https://github.com/iterative/dvc which seems pretty powerful!

Yes, moving away from the custom model store would be a good idea. We are getting too many issues related to GCS.

@dawndarkmusic can you follow the instructions here to delete your cache and try again?

#29 (comment)

In your case, it's better to clear the whole cache directory, and then restart your python runtime.

import datamol as dm
import platformdirs

# delete the cache dir
path_dir = platformdirs.user_cache_dir("molfeat")
mapper = dm.fs.get_mapper(path_dir)
mapper.fs.delete(path_dir, recursive=True)

@dawndarkmusic

Hello @maclandrol

I've cleared the whole cache directory and restarted, but I still get the same issue.
Here is a short recording of the situation: the pretrained transformer disappears from the cache folder as soon as the error is raised. Sorry for bothering you and @cwognum.

Untitled.video.-.Made.with.Clipchamp.1.mp4

@maclandrol
Member

Thanks @dawndarkmusic, this is very strange. Even after deleting the cache and restarting the interpreter, you still have the same issue?

Some of the functions are cached with functools, so data is kept in memory. Restarting the interpreter should normally purge that data too. @cwognum can you handle this?
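To illustrate the point about in-memory caching, here is a small sketch of how a `functools.lru_cache` entry survives until it is cleared or the process exits. The function `lookup_model` is a hypothetical stand-in; the thread does not say which molfeat functions are cached.

```python
import functools

# Records every real (uncached) call, so cache hits are visible.
calls = []


@functools.lru_cache(maxsize=None)
def lookup_model(name):
    """Hypothetical stand-in for a cached store lookup."""
    calls.append(name)
    return name.lower()


if __name__ == "__main__":
    lookup_model("MolT5")
    lookup_model("MolT5")        # served from the in-memory cache
    print(len(calls))            # one real call so far
    lookup_model.cache_clear()   # same effect on this cache as a fresh process
    lookup_model("MolT5")        # computed again
    print(len(calls))
```

Because such a cache lives inside the Python process, restarting the kernel empties it; files in the on-disk cache directory are unaffected and have to be deleted separately.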

@dawndarkmusic

Hello @maclandrol

If deleting and restarting the interpreter means shutting down the whole Jupyter notebook and restarting the kernel, then yes, I still have the issue and can't figure out the reason.
However, when I installed molfeat on another PC and ran the code again, it worked without any error. I guess something is wrong on my PC; I'll try to figure it out myself. Sorry for bothering you.

@zhu0619
Contributor

zhu0619 commented Jan 3, 2024

@cwognum @maclandrol It seems this issue can be closed.
