This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
Suspicion: this is probably an optimizer issue. Optimizers such as Adam store first- and second-moment estimates; would that state get messed up in the process?
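To make the suspicion concrete, here is a rough plain-Python sketch of an Adam-style update for a single scalar parameter. It is not PyTorch's implementation; `adam_step` and `fresh_state` are hypothetical names used only for illustration, and the hyperparameter defaults are just the commonly quoted ones. It shows that the same gradient gives a different update depending on whether the stored moments are carried over or lost.

```python
import math

def fresh_state():
    # Per-parameter optimizer state: step count, first moment, second moment.
    return {"t": 0, "m": 0.0, "v": 0.0}

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update for a scalar parameter; mutates `state` in place."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# Warm up the state with three identical gradients.
state = fresh_state()
p = 1.0
for g in [1.0, 1.0, 1.0]:
    p = adam_step(p, g, state)

# Same parameter, same new gradient, but different optimizer state.
kept = adam_step(p, -0.5, state)            # moments carried over
reset = adam_step(p, -0.5, fresh_state())   # moments lost / re-initialized

print(f"p={p:.6f} kept={kept:.6f} reset={reset:.6f}")
```

With the state kept, the accumulated first moment still points the old way, so the parameter keeps moving down; with fresh state it moves up. If something in the loop resets or shares this state unexpectedly, the updates would differ in this way.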
Also, if we add print statements to the run function (loops/)

    print(f"run entry")
    # import traceback
    # traceback.print_stack()
    if self.skip:
        return self.on_skip()
    self.reset()
    self.on_run_start(*args, **kwargs)
    import os
    print(f'{os.getpid()}')
    count = 0
    while not self.done:
        try:
            self.on_advance_start(*args, **kwargs)
            self.advance(*args, **kwargs)
            self.on_advance_end()
            self._restarting = False
            import os
            print(f'i am in the {count} round, pid: {os.getpid()}')
            from time import sleep
            if count == 3:
                sleep(100)
            count += 1
        except StopIteration:
            break
    self._restarting = False
    output = self.on_run_end()

we see three concurrent threads going through this function. The output looks like this:
The high-level bits are: some come from `self.fit_loop.run()` (this is expected), and some come from `self.optimizer_loop.run(split_batch, optimizers, batch_idx)` (this is not expected).
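One way to check whether these are really separate threads, or the same thread entering `run` from different loops, might be to print the thread id and the caller next to the pid. The sketch below uses only the standard library; `describe_call`, `Loop`, `fit_loop_caller` and `optimizer_loop_caller` are hypothetical stand-ins, not Lightning code.

```python
import os
import threading
import traceback

def describe_call(depth=2):
    """Return a one-line tag with pid, thread id and the caller `depth` frames up.

    depth=1 is the function that called describe_call, depth=2 is its caller.
    """
    stack = traceback.extract_stack()
    # stack[-1] is describe_call itself, stack[-2] its direct caller, and so on.
    frame = stack[-(depth + 1)] if len(stack) > depth else stack[0]
    return (
        f"pid={os.getpid()} tid={threading.get_ident()} "
        f"called from {frame.name} "
        f"({os.path.basename(frame.filename)}:{frame.lineno})"
    )

class Loop:
    # Hypothetical stand-in for a loop class with a run() method.
    def run(self):
        return describe_call()

def fit_loop_caller():
    # Stand-in for the code path that calls self.fit_loop.run().
    return Loop().run()

def optimizer_loop_caller():
    # Stand-in for the code path that calls self.optimizer_loop.run(...).
    return Loop().run()

tag_fit = fit_loop_caller()
tag_opt = optimizer_loop_caller()
print(tag_fit)
print(tag_opt)
```

If the pid and tid are identical across the printed lines and only the caller differs, that would suggest nested or repeated calls from different loops rather than concurrent threads.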