
Cannot checkpoint and log #228

Open
lcaquot94 opened this issue Nov 18, 2022 · 1 comment

@lcaquot94

The documentation says that when using Ray Client, you must disable checkpointing and logging for your Trainer by setting checkpoint_callback and logger to False. So how can we log and save the model during training?

@bparaj commented Dec 10, 2022

I have been doing this:

1. Import TuneReportCheckpointCallback from ray_lightning:

       from ray_lightning.tune import TuneReportCheckpointCallback

2. Disable checkpointing with "enable_checkpointing": False in the pl Trainer's configuration.

3. Initialize the logger:

       tb_logger = pl_loggers.TensorBoardLogger(save_dir="/tmp/some-dir")

4. Initialize the Ray strategy:

       from ray_lightning import RayStrategy
       strategy = RayStrategy(num_workers=1, num_cpus_per_worker=1, use_gpu=True)

5. Initialize the trainer (a full sketch combining these steps follows the list):

       trainer = pl.Trainer(
           **trainer_config,
           callbacks=[TuneReportCheckpointCallback({"accuracy": "accuracy"}, on="epoch_end")],
           strategy=strategy,
           logger=tb_logger
       )
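
Putting those steps together, a minimal self-contained sketch might look like the following. This is an illustration, not code from the thread: the ToyClassifier model, the random data, and the lr search space are placeholders, and it assumes ray_lightning's get_tune_resources helper for sizing the tune.run trials.

    import torch
    import pytorch_lightning as pl
    from pytorch_lightning import loggers as pl_loggers
    from torch.utils.data import DataLoader, TensorDataset
    from ray import tune
    from ray_lightning import RayStrategy
    from ray_lightning.tune import TuneReportCheckpointCallback, get_tune_resources

    class ToyClassifier(pl.LightningModule):
        """Placeholder model; substitute your own LightningModule."""

        def __init__(self, lr=1e-3):
            super().__init__()
            self.layer = torch.nn.Linear(8, 2)
            self.lr = lr

        def training_step(self, batch, _batch_idx):
            x, y = batch
            loss = torch.nn.functional.cross_entropy(self.layer(x), y)
            self.log("loss", loss)
            return loss

        def validation_step(self, batch, _batch_idx):
            x, y = batch
            acc = (self.layer(x).argmax(dim=1) == y).float().mean()
            # The key logged here must match the one passed to
            # TuneReportCheckpointCallback below.
            self.log("accuracy", acc)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.lr)

    def make_loader():
        # Random placeholder data.
        x = torch.randn(64, 8)
        y = torch.randint(0, 2, (64,))
        return DataLoader(TensorDataset(x, y), batch_size=16)

    def train_fn(config):
        model = ToyClassifier(lr=config["lr"])
        trainer = pl.Trainer(
            max_epochs=2,
            enable_checkpointing=False,  # step 2: disable PL's own checkpointing
            logger=pl_loggers.TensorBoardLogger(save_dir="/tmp/some-dir"),  # step 3
            strategy=RayStrategy(num_workers=1, num_cpus_per_worker=1, use_gpu=False),  # step 4
            # Steps 1 and 5: the callback reports "accuracy" to Tune and saves
            # a checkpoint at the end of every epoch.
            callbacks=[TuneReportCheckpointCallback({"accuracy": "accuracy"}, on="epoch_end")],
        )
        trainer.fit(model, make_loader(), make_loader())

    if __name__ == "__main__":
        tune.run(
            train_fn,
            config={"lr": tune.loguniform(1e-4, 1e-1)},
            num_samples=2,
            resources_per_trial=get_tune_resources(num_workers=1, use_gpu=False),
        )

With this setup the TensorBoard logs land under /tmp/some-dir and Tune keeps a checkpoint per epoch in each trial's directory, so metrics and model state are still saved even though the Trainer's built-in checkpointing is off.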
