WaNet probably broken with `num_workers > 0` #19

ejnnr · 2023-11-17T04:02:16Z

test_wanet in test_pipeline.py currently fails if we use multiple workers: the cfg.train_data.backdoor instance doesn't have its warping field initialized. My assumption is that this is because each worker ends up using its own copy of that instance, and the warping field is only initialized after those copies are made (on the first __call__). More important than the failing test case, I think that means the warping fields won't be shared across workers.

I've added a check in train_classifier.py that raises an error in this setting, so not super high priority. But we should fix this, probably by initializing the warping field earlier.
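To illustrate the suspected failure mode, here is a minimal sketch in plain Python. Everything in it is hypothetical: `LazyBackdoor` is a stand-in and not the actual cupbearer class, images are plain lists, and `copy.deepcopy` stands in for the copy of the dataset that each DataLoader worker process would receive. It assumes the field is drawn randomly on the first `__call__`, which is the scenario described above.

```python
import copy
import random


class LazyBackdoor:
    """Hypothetical backdoor whose warping field is created on first use."""

    def __init__(self):
        self.warping_field = None

    def __call__(self, img):
        if self.warping_field is None:
            # Assumed behavior: random initialization on the first call.
            self.warping_field = [random.random() for _ in img]
        return [pixel + offset for pixel, offset in zip(img, self.warping_field)]


def simulate_worker(backdoor, img):
    # Each worker operates on its own copy of the object (simulated here).
    worker_copy = copy.deepcopy(backdoor)
    worker_copy(img)
    return worker_copy


main_backdoor = LazyBackdoor()
image = [0.0, 0.0, 0.0, 0.0]
worker_a = simulate_worker(main_backdoor, image)
worker_b = simulate_worker(main_backdoor, image)

print("main process field:", main_backdoor.warping_field)
print("workers agree:", worker_a.warping_field == worker_b.warping_field)
```

Under these assumptions the instance in the main process never gets a warping field, and each copy draws its own, which matches both symptoms: the test sees an uninitialized field, and the workers would not share one.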

VRehnberg · 2023-11-17T15:35:03Z

I believe the control_grid is initialized in the __post_init__, which should be early enough. The warping_fields should be initialized as soon as the pixel size is known; if this is done separately in each worker, it should be fine anyway.

One problem could be if control_grid is loaded, not sure how that would be handled across workers. What happens with validation loaders could also be a problem.
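The argument that per-worker initialization is harmless can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: `GridBackdoor`, `init_warping_field`, and the nearest-neighbour upsampling are all hypothetical, and `copy.deepcopy` again stands in for the copy each worker receives. The only property it relies on is that the warping field is a deterministic function of a control_grid that already exists before the copies are made.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GridBackdoor:
    """Hypothetical backdoor: random control grid, deterministic warping field."""

    grid_size: int = 4
    control_grid: List[float] = field(default_factory=list)
    warping_field: Optional[List[float]] = None

    def __post_init__(self):
        # The random part happens once, before any worker copies exist.
        if not self.control_grid:
            self.control_grid = [
                random.uniform(-1.0, 1.0) for _ in range(self.grid_size)
            ]

    def init_warping_field(self, px):
        # Deterministic upsampling of the control grid to the pixel size
        # (nearest neighbour, chosen only to keep the sketch short).
        self.warping_field = [
            self.control_grid[i * self.grid_size // px] for i in range(px)
        ]

    def __call__(self, img):
        if self.warping_field is None:
            self.init_warping_field(len(img))
        return [pixel + offset for pixel, offset in zip(img, self.warping_field)]


main_backdoor = GridBackdoor()
image = [0.0] * 8
worker_a = copy.deepcopy(main_backdoor)
worker_b = copy.deepcopy(main_backdoor)
worker_a(image)
worker_b(image)

print("workers agree:", worker_a.warping_field == worker_b.warping_field)
```

In this sketch the instance in the main process still has no warping field after the workers have run, so a test that inspects that attribute would fail even though every worker applies the same warp.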

ejnnr · 2023-11-17T18:27:26Z

Oops, that sounds right. The test still fails with multiple workers, but that's not a big deal (we can just run it with num_workers=0). For now I'll leave the check in until we're sure that validation works, but it might be that we don't actually need to change anything.

VRehnberg · 2023-11-22T15:38:10Z

Took a look now and:

- Validation datasets inherit control_grid in train_classifier_config.py __post_init__
- When looking for the loading I saw that TaskConfig only initializes datasets on demand https://github.com/ejnnr/cupbearer/blob/main/src/cupbearer/tasks/_config.py#L48-L53 which could be a potential issue

I think the best thing is just to add some tests and to see that everything works as it should. I'll (probably) do that sometime the coming days in-between other stuff.

ejnnr · 2023-11-22T17:30:21Z

> When looking for the loading I saw that TaskConfig only initializes datasets on demand https://github.com/ejnnr/cupbearer/blob/main/src/cupbearer/tasks/_config.py#L48-L53 which could be a potential issue

I think that part should be fine: we always create a single dataset and then pass that to the dataloaders. So the only issue would be with code that gets called after the __init__ of the dataset itself, which is probably only the warping field creation (and as you've pointed out, that's deterministic since control_grid is shared).

Seems pretty likely to me everything is actually fine and this was only specifically a problem for the test, so I'd be fine with removing the check and closing this. But if you still feel unsure, adding more tests sounds good.

VRehnberg mentioned this issue Nov 23, 2023

Misc fixes tests #22

Merged

VRehnberg linked a pull request Nov 24, 2023 that will close this issue

Misc fixes tests #22

Merged

VRehnberg closed this as completed in #22 Nov 27, 2023
