Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WaNet probably broken with num_workers > 0 #19

Closed
ejnnr opened this issue Nov 17, 2023 · 4 comments · Fixed by #22
Closed

WaNet probably broken with num_workers > 0 #19

ejnnr opened this issue Nov 17, 2023 · 4 comments · Fixed by #22

Comments

@ejnnr
Copy link
Owner

ejnnr commented Nov 17, 2023

test_wanet in test_pipeline.py currently fails if we set use multiple workers: the cfg.train_data.backdoor instance doesn't have its warping field initialized. My assumption is that this is because each worker ends up using its own copy of that, and the warping field is only initialized after making those copies (on the first __call__). More importantly than failing this test case, I think that means the warping fields won't be shared across workers.

I've added a check in train_classifier.py that raises an error in this setting, so not super high priority. But we should fix this, probably by initializing the warping field earlier.

@VRehnberg
Copy link
Collaborator

I believe the control_grid is initialized in the __post_init__ which should be early enough. The warping_fields should be initialized as soon as the pixel size is known, if this is done for each worker it should be fine anyway.

One problem could be if control_grid is loaded, not sure how that would be handled across workers. What happens with validation loaders could also be a problem.

@ejnnr
Copy link
Owner Author

ejnnr commented Nov 17, 2023

Oops, that sounds right. The test still fails with multiple workers, but that's not a big deal (we can just run it with num_workers=0). For now I'll leave in the check until we're sure that validation works, but might be we don't actually need to change anything

@VRehnberg
Copy link
Collaborator

Took a look now and:

I think the best thing is just to add some tests and to see that everything works as it should. I'll (probably) do that sometime the coming days in-between other stuff.

@ejnnr
Copy link
Owner Author

ejnnr commented Nov 22, 2023

When looking for the loading I saw that TaskConfig only initializes datasets on demand https://github.com/ejnnr/cupbearer/blob/main/src/cupbearer/tasks/_config.py#L48-L53 which could be a potential issue

I think that part should be fine, we always create a single dataset and then pass that to the dataloaders. So I think the only issue would be with code that only gets called after the __init__ of the dataset itself, which is probably only the warping field creation (and as you've pointed out that's deterministic since control_grid is shared).

Seems pretty likely to me everything is actually fine and this was only specifically a problem for the test, so I'd be fine with removing the check and closing this. But if you still feel unsure, adding more tests sounds good.

@VRehnberg VRehnberg linked a pull request Nov 24, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants