Bi-Level Optimization using Validation Set #137

Answered by mmargalo
mmargalo asked this question in Q&A

I see, so the model is not updated even if it's referenced on the workers. Thanks for the advice. I ended up merging the inner and val loops instead, with the val loop going over a small batch rather than the entire val set.

SUCCEEDS:
train() on rank0 -> inner_val_loop_combined on parallel -> train() on rank0, backprop

FAILS:
train() on rank0 -> inner_loop on parallel -> train() on rank0, call val_loop -> val_loop on parallel -> train() on rank0, backprop
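The merged-loop workaround can be sketched without any framework. The function below (the name `inner_val_loop_combined`, the linear model, and the manual gradients are all illustrative, not the actual code) takes one inner SGD step on cloned parameters and evaluates a small validation batch in the same call, so the updated parameters never have to leave the worker; only the scalar validation loss travels back:

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def inner_val_loop_combined(w, b, train_batch, val_batch, lr=0.1):
    """One inner SGD step on the train batch, then the val loss on a
    small val batch, both inside the same (would-be remote) call."""
    xs, ys = train_batch
    n = len(xs)
    # Manual gradients of the MSE w.r.t. w and b.
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Inner update on local copies; the caller's parameters are untouched.
    w2, b2 = w - lr * gw, b - lr * gb
    # Validation loss computed with the *updated* parameters.
    vx, vy = val_batch
    return mse(w2, b2, vx, vy)

train = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])  # y = 2x + 1
val = ([3.0], [7.0])
val_loss = inner_val_loop_combined(0.0, 0.0, train, val)
```

Because the validation batch is small, shipping it to the worker alongside the training batch is cheap, which is what makes merging the two loops practical.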

If I understand correctly, you are updating the model parameters in the inner_loop. But in your initial code:

@parallelize
def inner_loop(net, x, y):
    # ... clone net by reference
    loss = net(x)  # built-in criterion
    meta_opt.step(loss)
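A toy, framework-free sketch of the failure mode described above (the function name, the `params` dict, and the explicit `deepcopy` are illustrative stand-ins for what serialization to an RPC worker does; this is not the actual API): an in-place parameter update inside the parallelized call only mutates the worker's copy, so a later, separate `val_loop` call on rank0 still sees the stale parameters.

```python
import copy

def inner_loop_remote(params, grad, lr=0.5):
    # Sending arguments to a worker behaves like a deep copy: the worker
    # gets its own params, not a live reference to rank0's model.
    params = copy.deepcopy(params)
    params["w"] -= lr * grad  # mutates only the worker-side copy
    return params             # updated state must be returned explicitly

params = {"w": 4.0}
updated = inner_loop_remote(params, grad=2.0)
# params["w"] is still 4.0 unless the caller assigns the result back.
```

This is why the two-call chain fails while the merged call succeeds: in the merged version the validation loss is computed on the worker, where the updated copy actually lives.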

Answer selected by mmargalo
Category: Q&A
Labels: distributed (Something related to distributed training)
2 participants