can't run on large datasets #1759

Open
Smarter1214 opened this issue Jul 14, 2024 · 2 comments

@Smarter1214

When I follow the steps in the auto3dseg_hello_world.ipynb notebook exactly, set the corresponding paths and parameters, and run in an environment with 48 GB of GPU memory, I hit `RuntimeError: Pin memory thread exited unexpectedly` while trying to train on a dataset of 300 .nii.gz images. With a 20-image dataset, training proceeds smoothly under exactly the same conditions. While training on the 300-image dataset I monitored GPU memory usage and it stayed below 70%, yet the error keeps occurring. Could there be an issue with the get_data step?
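If it helps, I can also strip this down to a plain MONAI DataLoader loop over the same 300 files, to check whether the pin-memory thread also dies outside of Auto3DSeg. Roughly this (paths are placeholders):

```python
# Loader-only check: iterate the same files with pin_memory on, but no training.
import glob
from monai.data import Dataset, DataLoader
from monai.transforms import Compose, LoadImaged, EnsureChannelFirstd

files = [{"image": f} for f in glob.glob("/path/to/dataset/*.nii.gz")]  # placeholder path
ds = Dataset(
    data=files,
    transform=Compose([LoadImaged(keys="image"), EnsureChannelFirstd(keys="image")]),
)
# batch_size=1 so differently sized volumes don't need collation
loader = DataLoader(ds, batch_size=1, num_workers=4, pin_memory=True)
for i, batch in enumerate(loader):
    print(i, batch["image"].shape)
```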

@ericspod
Member

@mingxin-zheng @dongyang0122 @wyli would anyone have insights here? This may be related to multiprocessing issues, the number of open files, garbage collection, the PyTorch sharing strategy, or some other technical issue. Thanks!
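If it is the open-file limit or the sharing strategy, the usual first things to try are below; these are standard PyTorch/OS settings rather than anything Auto3DSeg-specific, so treat them as a sketch:

```python
# Two common mitigations for "Pin memory thread exited unexpectedly" when
# many DataLoader workers exhaust file descriptors on large datasets.
import resource
import torch.multiprocessing as mp

# 1. Share worker tensors via the filesystem instead of file descriptors.
mp.set_sharing_strategy("file_system")

# 2. Raise the soft open-file limit up to the hard limit for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

The shell equivalent of the second is raising `ulimit -n` before launching the notebook.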

@mingxin-zheng
Contributor

Thanks @Smarter1214 for reporting the issue. It would be helpful if you could share some logs/outputs so that we can further pinpoint the problem.

In general, I am wondering in which step the error occurs: the data analysis (DataAnalyzer) step or training?
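One way to tell is to run the analysis step on its own and see whether it completes on all 300 images; something along these lines, with placeholder paths (the exact DataAnalyzer arguments may differ by MONAI version):

```python
# Run only the data-analysis stage of Auto3DSeg to isolate the failing step.
from monai.apps.auto3dseg import DataAnalyzer

analyzer = DataAnalyzer(
    datalist="/path/to/datalist.json",  # placeholder
    dataroot="/path/to/dataset",        # placeholder
    output_path="./datastats.yaml",
)
analyzer.get_all_case_stats()
```

If that succeeds, the problem is more likely in the training stage's data loading.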
