
Lack of toy data in visualize-embeddings.ipynb #324

Open
enemni opened this issue Sep 4, 2024 · 2 comments

Comments

@enemni

enemni commented Sep 4, 2024

The notebook https://github.com/Clay-foundation/model/blob/compile/docs/tutorials/visualize-embeddings.ipynb cannot be reproduced end to end: it reads from DATA_DIR = "/home/ubuntu/data", and there are no clear instructions on how to download the required toy dataset.
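
For illustration, here is a minimal sketch of how the data location could be made configurable so that a missing dataset fails with a clear message instead of a hard-coded path error. The environment variable name `CLAY_DATA_DIR` and the helper `resolve_data_dir` are hypothetical and not part of the notebook; only the default path comes from it.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; the notebook does not define it.
ENV_VAR = "CLAY_DATA_DIR"
# Default taken from the notebook's hard-coded value.
DEFAULT_DATA_DIR = "/home/ubuntu/data"


def resolve_data_dir(env=None):
    """Return the data directory as a Path, or raise if it does not exist."""
    # Allow a custom mapping to be passed in; fall back to the process environment.
    env = os.environ if env is None else env
    data_dir = Path(env.get(ENV_VAR, DEFAULT_DATA_DIR)).expanduser()
    if not data_dir.is_dir():
        raise FileNotFoundError(
            f"Data directory {data_dir} not found. "
            f"Set {ENV_VAR} to a folder containing the sample data."
        )
    return data_dir
```

In the notebook this would replace the assignment with something like `DATA_DIR = resolve_data_dir()`. It does not solve the missing download instructions, but it would at least make the requirement explicit.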

@santos-naudiyal

try this
https://gofile.io/d/yMHabA
select the desired compiler in the installer

@srmsoumya
Collaborator

Thanks for bringing this up, @enemni. We do need to provide some sample data for users to test the tutorial notebooks. I'll add that to my to-do list.

In the meantime, you can use the stacchip library, which we use to create the training dataset, to pull some sample data. Here's a helpful tutorial to guide you through the process.

cc @yellowcap

@enemni
Author

enemni commented Sep 9, 2024

On a similar note: while trying to reproduce the code from the regression notebook, I noticed that there is no mechanism in place to download the data. Cloning the dataset from HF returns errors, and there are already issues open here and here. This is probably on the nascetti-a/BioMassters side, but it currently blocks reproducing the notebook.
