cannot download model #84
Comments
Hey @peiyaoli! Thank you for reporting this bug! Before we investigate, it would be useful to have some additional information on your environment, specifically your Molfeat version, Python version, Pydantic version, and how you installed your environment (i.e. pip, conda, source).
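For anyone hitting the same error, here is a minimal sketch of one way to collect the versions asked for above. It only uses the standard library; the helper name `collect_versions` is made up for this example and is not part of molfeat.

```python
import platform
from importlib import metadata


def collect_versions(packages=("molfeat", "pydantic")):
    """Return the Python version plus the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment
            info[name] = "not installed"
    return info


for name, version in collect_versions().items():
    print(f"{name}: {version}")
```

Pasting that output together with the install method (pip, conda, or source) should cover the information requested.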
Something goes wrong during download, at the hash sum comparison stage.
@cwognum do you want to have a look?
I'm not sure what's happening here. I could reproduce the bug locally. As the error suggests, the checksum computed when we created the artifact no longer matches the checksum we compute when downloading the artifacts locally. I'm not sure what causes this. Recreating the artifact using the ETL notebooks leads to an entirely new hash sum that doesn't match any of the checksums in the thrown exception. I will investigate further!
@cwognum, did you manage to find the error? Looking at the code, my first guess is that the order of the files fed into the shasum changed for some reason (see molfeat/molfeat/utils/commons.py, lines 45 to 55 in 97855c6).
Can you generate all permutations of the file list order and check whether you can recover the original shasum?
That's indeed one of the things I considered. I haven't tried all combinations though, that's a good idea!
@maclandrol I tried the different permutations, but none of them produces the expected hash.
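For reference, the permutation check discussed above can be sketched roughly as follows. This is only an illustration: it assumes the checksum is a SHA-256 over the file contents fed in sequence, which may not match what `commons.py` actually does, and `digest_in_order` and `find_matching_order` are hypothetical names.

```python
import hashlib
from itertools import permutations


def digest_in_order(chunks):
    """Hash an ordered sequence of byte strings with SHA-256 (assumed scheme)."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def find_matching_order(named_chunks, expected):
    """Try every ordering of the files; return the first one matching `expected`.

    `named_chunks` maps a file name to its content as bytes.
    Returns a tuple of names, or None when no ordering reproduces the hash.
    """
    for order in permutations(sorted(named_chunks)):
        if digest_in_order(named_chunks[name] for name in order) == expected:
            return order
    return None
```

The search is factorial in the number of files, so it is only practical for artifacts with a handful of files. A `None` result means that, under the assumed scheme, file order alone cannot explain the mismatch.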
OK, I think for now we should just sort the file names to ensure a consistent order, then recompute the hash and update it in the metadata.
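A minimal sketch of what an order-independent checksum could look like, assuming SHA-256 and a plain local directory. This is not the code from molfeat; `directory_checksum` is a hypothetical helper, and mixing the relative path into the hash is a design choice of this example.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_checksum(root):
    """SHA-256 over all files under `root`, visited in sorted path order."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by relative POSIX path so the order does not depend on the
    # filesystem listing order or on the platform's path separator.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    h = hashlib.sha256()
    for path in files:
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(path.read_bytes())
    return h.hexdigest()
```

With the sort in place, two directories holding the same files give the same digest regardless of the order in which the files were written or listed.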
I looked into a couple of things, but I'm not sure what started causing the issue, which is a very unsatisfying conclusion. Some further notes for future reference:
I can imagine that the above changes affect the order of the files, but not the number of files or their content. Since trying all permutations of file paths to recover the expected hash doesn't work, I'm not sure what's happening here. Anyway, the fix is luckily simple! I recreated the featurizer artifacts by simply rerunning the ETL notebooks and all seems to work again. I made a small PR to sort the to-be-hashed files: #86. Let me know if the issue persists or pops up again! If so, we will have to investigate further. One final note: ChemGPT-1.2B
Thank you @cwognum. The recent changes have fixed the issue on my side.
Thanks @cwognum, I'll check and let you know ASAP whether it also fixes the issue on my side!
Bug description
Hello @cwognum! I'm currently using molfeat in a conda env, installed by running pip install molfeat==0.8.8. When I try to fetch a pretrained HF transformer such as GPT2-Zinc480, Roberta-Zinc480M-102M, or MolT-5, I get the following error message:
It seems related to the cache and where the models are stored locally. What should I do?
How to reproduce the bug
Error messages and logs
Environment
Hi @dawndarkmusic, thanks for reporting! Could you try upgrading to the latest version?
Ping @maclandrol: if we end up needing more robust model / data versioning, I recently came across https://github.com/iterative/dvc, which seems pretty powerful!
Hello @cwognum
Yes, moving away from the custom model store would be a good idea. We are getting too many issues related to GCS. @dawndarkmusic, can you follow the instructions here to delete your cache and try again? In your case, it's better to clear the whole cache directory and then restart your Python runtime.
import datamol as dm
import platformdirs
# Locate the molfeat cache directory for the current user
path_dir = platformdirs.user_cache_dir("molfeat")
# Delete the cache directory and everything in it
mapper = dm.fs.get_mapper(path_dir)
mapper.fs.delete(path_dir, recursive=True)
Hello @maclandrol, I cleared the whole cache directory and restarted, but I still get the same issue (attached video: Untitled.video.-.Made.with.Clipchamp.1.mp4).
Thanks @dawndarkmusic, this is very strange. Even after deleting the cache and restarting the interpreter, you are having the same issue? Some of the functions are cached with functools, so data is kept in memory. Normally, restarting the interpreter should purge that data too. @cwognum can you handle this?
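To illustrate the in-memory caching mentioned above, here is a small standalone sketch using `functools.lru_cache`. The function `load_model_info` is a made-up stand-in, not a molfeat function, and the thread does not say which functools decorator molfeat uses.

```python
import functools

CALLS = {"count": 0}


@functools.lru_cache(maxsize=None)
def load_model_info(name):
    """Pretend to fetch model metadata; counts how often the body really runs."""
    CALLS["count"] += 1
    return {"name": name, "load": CALLS["count"]}


first = load_model_info("ChemBERTa-77M-MLM")
second = load_model_info("ChemBERTa-77M-MLM")  # served from the in-memory cache
load_model_info.cache_clear()                  # a fresh interpreter has the same effect
third = load_model_info("ChemBERTa-77M-MLM")   # the body runs again
```

Deleting files on disk does not touch this kind of cache, which is why the advice is to restart the runtime as well.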
Hello @maclandrol, if deleting and restarting the interpreter means shutting down the whole Jupyter notebook and restarting the kernel, then yes, I'm still having the issue right now and I couldn't figure out the reason.
@cwognum @maclandrol It seems this issue can be closed.
Is there an existing issue for this?
Bug description
I tried to search for a model with:
model_card = store.search(name="ChemBERTa-77M-MLM")[0]
but got:
AttributeError: 'ModelInfo' object has no attribute 'model_dump'
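A note for readers: `model_dump()` is the Pydantic v2 method, while Pydantic v1 models expose `dict()` instead, so this error is consistent with a Pydantic version mismatch. The thread does not confirm that as the cause, so treat it as an assumption. A rough sketch of a version-tolerant helper, using stand-in classes rather than molfeat's `ModelInfo`:

```python
def dump_model(obj):
    """Return a dict from a pydantic-style model under either major version."""
    if hasattr(obj, "model_dump"):  # Pydantic v2 API
        return obj.model_dump()
    return obj.dict()  # Pydantic v1 API


# Stand-in classes (hypothetical) so the sketch runs without pydantic installed.
class V1Style:
    def dict(self):
        return {"name": "ChemBERTa-77M-MLM"}


class V2Style:
    def model_dump(self):
        return {"name": "ChemBERTa-77M-MLM"}


print(dump_model(V1Style()))
print(dump_model(V2Style()))
```

In practice, checking the installed Pydantic version against what the molfeat release expects is likely the quicker route.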
How to reproduce the bug
No response
Error messages and logs
Environment
Current environment
Additional context
No response