ImportError: cannot import name 'MeshDataset' from 'meshgpt_pytorch' #67

Open
StephenYangjz opened this issue Mar 11, 2024 · 5 comments

Comments

@StephenYangjz

Hi, I am running the demo and it seems that MeshDataset cannot be imported from meshgpt_pytorch. Any help, @lucidrains, would be greatly appreciated!

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 4
      2 import gc
      3 import os
----> 4 from meshgpt_pytorch import MeshDataset
      6 project_name = "demo_mesh"
      8 working_dir = f'.\{project_name}'

ImportError: cannot import name 'MeshDataset' from 'meshgpt_pytorch' (/home/stephen/anaconda3/envs/meshgpt/lib/python3.9/site-packages/meshgpt_pytorch/__init__.py)
@MarcusLoppe
Contributor

Hi Stephen,

MeshDataset is a class I created for meshgpt_pytorch. I made a pull request for it, but I'm not sure why it wasn't accepted.
Whatever the reason, the only differences between my fork and meshgpt are the modified trainer class (it trains by epochs and reports progress through tqdm) and MeshDataset.

If you'd like to use my MeshDataset, you can install my fork or just copy and paste MeshDataset into your code.
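
Before choosing between the fork and copying the class, it can help to confirm what the installed package actually exports. The following is a minimal sketch, not part of either repository: `has_export` is a hypothetical helper, and the `hasattr` check only approximates what `from meshgpt_pytorch import MeshDataset` does.

```python
import importlib


def has_export(module_name, attr):
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself is missing from this environment.
        return False
    return hasattr(module, attr)


# False means the installed build (or no build at all) provides the name,
# so the fork or a local copy of the class is needed.
print(has_export("meshgpt_pytorch", "MeshDataset"))
```

If this prints False while the package is installed, the environment holds a build that does not ship MeshDataset.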

@StephenYangjz
Author

That resolves it (the package's __init__.py also needs to be updated). Thanks!

@StephenYangjz
Author

Hi @MarcusLoppe, I don't think I had this issue before, but now I'm getting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 11
      1 # autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      2 #                                              batch_size=8,
      3 #                                              grad_accum_every=2,
      4 #                                              learning_rate = 1e-2)
      5 # loss = autoencoder_trainer.train(280,stop_at_loss = 0.7, diplay_graph= True)
      7 autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      8                                              batch_size=8,
      9                                              grad_accum_every=2,
     10                                              learning_rate = 4e-3)
---> 11 loss = autoencoder_trainer.train(280,stop_at_loss = 0.28, diplay_graph= True)

TypeError: train() got an unexpected keyword argument 'stop_at_loss'

Do you by any chance have any pointers? Thank you!

@MarcusLoppe
Contributor

MarcusLoppe commented Mar 12, 2024

Oh, I'm not sure. The train function is:
def train(self, num_epochs, stop_at_loss = None, diplay_graph = False):

Python can have some odd issues, so have you tried restarting the notebook kernel?
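
One way to see which version of train() the running kernel has loaded is to inspect its signature. This is only a sketch: `accepts_keyword` is a hypothetical helper, and the two trainer classes are stand-ins written for illustration, not the real MeshAutoencoderTrainer.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    named = params.get(name)
    if named is not None and named.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-in with the signature quoted above (hypothetical class).
class EpochTrainer:
    def train(self, num_epochs, stop_at_loss=None, diplay_graph=False):
        pass


# Stand-in for a build whose train() takes no extra arguments (hypothetical class).
class PlainTrainer:
    def train(self):
        pass


print(accepts_keyword(EpochTrainer.train, "stop_at_loss"))  # True
print(accepts_keyword(PlainTrainer.train, "stop_at_loss"))  # False
```

In the notebook, the same check could be run as accepts_keyword(MeshAutoencoderTrainer.train, "stop_at_loss"); a False result would point to a stale or different build being loaded.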

I'm currently running the code below and it's working.
By the way, I should have removed one of the autoencoder_trainer definitions so there is only one.
I found it better for the model to start training at a low learning rate, since this keeps the commit loss steadier, and I don't really notice any improvement from a higher learning rate at the start.

Also, target a batch size of 64. If you have enough VRAM, set batch_size to 64 and grad_accum_every to 1; a larger batch size means faster training.
For training on a large dataset, you can set commit_loss_weight to 0.25; otherwise the commit loss will shoot up into the 100s. This way, it puts pressure on the encoder to compress the tokens better.

Otherwise, aim for a total effective batch size of 64 by changing grad_accum_every so that:
batch_size * grad_accum_every = 64

# Assumes `autoencoder` and `dataset` were created earlier in the notebook.
save_name = "16k_2_4"
batch_size = 16

# Lower commit loss weight, as suggested above for large datasets.
autoencoder.commit_loss_weight = 0.25
autoencoder_trainer = MeshAutoencoderTrainer(
    model=autoencoder,
    warmup_steps=100,
    dataset=dataset,
    num_train_steps=100,
    batch_size=batch_size,
    grad_accum_every=4,  # effective batch size: 16 * 4 = 64
    learning_rate=1e-4,
    checkpoint_every_epoch=1,
)
loss = autoencoder_trainer.train(480, stop_at_loss=0.2, diplay_graph=True)
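
The rule batch_size * grad_accum_every = 64 can be turned into a small calculation. This is a sketch only: `grad_accum_for` is a hypothetical helper, and rejecting batch sizes that do not divide the target evenly is a choice made here, not something the trainer enforces.

```python
def grad_accum_for(batch_size, target_effective_batch=64):
    """Return grad_accum_every so that batch_size * grad_accum_every == target."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if target_effective_batch % batch_size != 0:
        raise ValueError(
            f"batch_size {batch_size} does not divide {target_effective_batch} evenly"
        )
    return target_effective_batch // batch_size


print(grad_accum_for(16))  # 4, matching the settings above
print(grad_accum_for(8))   # 8
print(grad_accum_for(64))  # 1, for when VRAM allows the full batch
```

For comparison, the settings in the earlier traceback (batch_size=8, grad_accum_every=2) give an effective batch size of 16.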

@StephenYangjz
Author

Thank you! I reinstalled the package and the error went away; it just seems to have been a kernel/package issue.
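
For kernel/package mix-ups like this one, it can be useful to see which copy of a package the kernel resolves and which version is installed. A minimal sketch using only the standard library; `describe_install` is a hypothetical helper, and it assumes the distribution name matches the module name unless one is passed explicitly.

```python
import importlib.metadata
import importlib.util


def describe_install(module_name, dist_name=None):
    """Report where a module would be loaded from and its installed version."""
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    try:
        version = importlib.metadata.version(dist_name or module_name)
    except importlib.metadata.PackageNotFoundError:
        # No installed distribution is registered under this name.
        version = None
    return {"origin": origin, "version": version}


# Both values are None when the package is not installed in this environment.
print(describe_install("meshgpt_pytorch"))
```

If the reported origin points at a different environment or checkout than expected, reinstalling the package and restarting the kernel, as done here, is a reasonable next step.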
