tf_vae.json empty after running vae_train.py #31
Comments
Hi, I think the problem comes from the location of
I addressed this problem by moving it to the _build_graph function.
Thanks for your suggestion, @leekwoon. I no longer have access to the DGX station, but I tried the change on an AWS instance. For now it looks good: the vae.json file was generated. Do you have a fork or pull request I could check for other necessary changes before continuing the training? I suspect there may be more trouble ahead.
FWIW, I found that step 3 of the GPU jobs showed some problems with the patched code, so after a bit of failed troubleshooting I decided to go back to a commit from around the time the paper was published (c0cb2de) and try again. Everything worked as expected without changing any code. 👍
Thanks for the testing, @asolano. Maybe I should just roll back the code to that time... |
Greetings,
I am trying to reproduce the experiment on a DGX station I currently have access to, and the first two steps look alright, but the result of the command:
is an empty array:
According to the documentation the model should be saved on that file, so any hint about where to look for the problem is appreciated.
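A quick way to confirm the symptom independently of the training script is to load the file and check whether it decodes to an empty container. This is only a minimal sketch: the helper name saved_params_are_empty is hypothetical and not part of the repository, and the path passed to it is whatever location the training run actually wrote to.

```python
import json
from pathlib import Path


def saved_params_are_empty(path):
    """Return True if the JSON file at `path` decodes to an empty list or dict.

    Hypothetical helper for illustration; it is not part of the project.
    """
    data = json.loads(Path(path).read_text())
    return isinstance(data, (list, dict)) and len(data) == 0
```

Calling it on the saved model file returns True when the file contains only `[]`, which matches the empty array reported here, and False once any parameters have been written.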
Thanks,
Alfredo
PS: I am using the following Dockerfile to recreate the environment in the paper, in case it might be relevant: