Logging in Multi-GPU and new on_validation_step function #20362
Unanswered · 42elenz asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Hello everyone!
I am still a bit unsure about my Lightning functions since I am a beginner, and I would love to get some feedback on whether I did this right. I was initially working with an old version, and the whole on_validation_step / on_training_step change broke my logging code.
I am also wondering why I have to specify
"ddp_find_unused_parameters_true"
I don't see why there should be any unused parameters.
So this is my Trainer:

    trainer = Trainer(
        max_epochs=args.training.epochs,
        logger=wandb_logger_pt,
        # log_every_n_steps=args.logging.log_every_n_steps,
        callbacks=[checkpoint_callback],
        accelerator="gpu",
        precision=16,
        strategy="ddp_find_unused_parameters_true",
    )
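As far as I understand, that strategy string is just shorthand for passing an explicit DDPStrategy, so my setting should be equivalent to this (a minimal sketch, assuming the Lightning 2.x lightning.pytorch import path; devices=2 is only a placeholder):

```python
from lightning.pytorch import Trainer  # assumption: 2.x import path; older versions use pytorch_lightning
from lightning.pytorch.strategies import DDPStrategy

trainer = Trainer(
    accelerator="gpu",
    devices=2,  # placeholder: two GPUs
    strategy=DDPStrategy(find_unused_parameters=True),
)
```

As far as I can tell, plain "ddp" raises an error when some parameters never receive a gradient during a step, which is apparently why this flag exists, but I don't see where that would happen in my model.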
Next, this is my model.
I actually first wanted to use validation_step_end to append to my self.validation_step_outputs, but that hook was never called, so I had to do it in on_train_epoch_end and on_validation_epoch_end. Is this correct, or will it blow up at some point? Maybe I just didn't have enough steps for the hook to be called.
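From the 2.x migration notes, my understanding is that the recommended pattern is to collect the step outputs in a list attribute yourself and then process and clear that list in on_validation_epoch_end. A minimal sketch of what I mean (placeholder class name and architecture, not my real model; I am assuming the Lightning 2.x lightning.pytorch import path):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import lightning.pytorch as pl  # assumption: Lightning 2.x import path


class LitModel(pl.LightningModule):  # placeholder name, not my real model
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 4)      # placeholder architecture
        self.validation_step_outputs = []  # collected manually, since the old *_step_end / *_epoch_end(outputs) hooks are gone

    def forward(self, x):
        return self.layer(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        loss = F.cross_entropy(logits, y)
        # sync_dist=True averages the logged value across GPUs
        self.log("val_loss", loss, on_epoch=True, sync_dist=True)
        self.validation_step_outputs.append(logits.detach())
        return loss

    def on_validation_epoch_end(self):
        all_logits = torch.cat(self.validation_step_outputs)
        # ... compute epoch-level, task-specific values from all_logits here ...
        self.validation_step_outputs.clear()  # free memory so the list does not grow across epochs
```

Clearing the list at the end of the hook seems important so it does not keep growing from epoch to epoch.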
My big problem here is that I want to save the found IDs to a DataFrame and write that DataFrame to disk. How do I do this when DDP runs multiple processes? Can I merge the results from all ranks? And how can I run something only on the root rank, for example?
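From the docs on self.all_gather and trainer.is_global_zero, I think the gathering could look roughly like this, but I am not sure (a sketch; the class name, the found_ids attribute, and the CSV filename are placeholders, and I am assuming every rank ends up with the same number of IDs, which should hold with the default DistributedSampler padding):

```python
import pandas as pd
import torch
import lightning.pytorch as pl  # assumption: Lightning 2.x import path


class IdCollectingModel(pl.LightningModule):  # placeholder name
    def __init__(self):
        super().__init__()
        self.found_ids = []  # each validation_step appends a 1-D tensor of IDs for its own rank

    def on_validation_epoch_end(self):
        local_ids = torch.cat(self.found_ids)

        # all_gather stacks the tensors from every process along a new leading
        # world_size dimension and returns the result on every rank
        gathered = self.all_gather(local_ids)

        # write the DataFrame only on the root process (rank 0), so the file is written exactly once
        if self.trainer.is_global_zero:
            all_ids = gathered.reshape(-1).cpu().numpy()
            pd.DataFrame({"id": all_ids}).to_csv("found_ids.csv", index=False)

        self.found_ids.clear()
```

Or would decorating a small save helper with rank_zero_only be the better fit here?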
Is my logging overall correct? I have some parameters that are specific to my task, which is why I compute them at the end of the epochs.
For my downstream task I am using a mix of self-made metrics (balanced accuracy) and PL metrics.
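For context, this is roughly how I wire the torchmetrics part (a simplified sketch, not my real model; the class name, num_classes, and the linear layer are placeholders). As far as I know, a macro-averaged MulticlassAccuracy corresponds to balanced accuracy:

```python
import torch.nn as nn
import torch.nn.functional as F
import lightning.pytorch as pl  # assumption: Lightning 2.x import path
from torchmetrics.classification import MulticlassAccuracy


class DownstreamModel(pl.LightningModule):  # placeholder name and architecture
    def __init__(self, num_classes: int = 4):
        super().__init__()
        # macro-averaged per-class accuracy, i.e. balanced accuracy
        self.val_bal_acc = MulticlassAccuracy(num_classes=num_classes, average="macro")
        self.layer = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.layer(x)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)
        self.val_bal_acc.update(logits, y)
        # logging the metric object lets torchmetrics handle the cross-GPU sync
        # and lets Lightning compute and reset it at the end of the epoch
        self.log("val_balanced_acc", self.val_bal_acc, on_step=False, on_epoch=True)
        return F.cross_entropy(logits, y)
```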
Does my setup overall look ok?
I hope that you can provide me feedback :)
So, to summarize my questions:
- Why do I need strategy="ddp_find_unused_parameters_true" instead of plain "ddp"?
- Is appending step outputs myself and handling them in on_train_epoch_end / on_validation_epoch_end the right replacement for the old hooks?
- How do I collect the found IDs from all GPU processes into one DataFrame and save it only once (e.g. on the root rank)?
- Is my logging of self-made and PL metrics correct overall?