1 file changed, +17 -0 lines changed
src/lightning/pytorch/callbacks
@@ -45,6 +45,23 @@ class DeviceStatsMonitor(Callback):
         ModuleNotFoundError:
             If ``psutil`` is not installed and CPU stats are monitored.

+    Logged Metrics:
+        Device statistics are logged with keys of the form
+        ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
+        ``DeviceStatsMonitor.on_train_batch_start/cpu_percent``). The source of these
+        metrics depends on the active :class:`~lightning.pytorch.accelerators.accelerator.Accelerator`
+        and the ``cpu_stats`` flag.
+
+        CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, and ``cpu_swap_percent``.
+        All are percentages (%).
+        CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
+        PyTorch's allocator (e.g., ``allocated_bytes.all.current`` in bytes, ``num_ooms`` as a count).
+        GPU compute utilization is not logged by default.
+        Other Accelerators (e.g., TPU, MPS): Logs device-specific stats.
+        - TPU example: ``avg. free memory (MB)``.
+        - MPS example: ``mps.current_allocated_bytes``.
+        Inspect the logged keys or check the accelerator documentation for details.
+
     Example::

         from lightning import Trainer
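
Note on the key format documented in this hunk: the sketch below illustrates how a key such as ``DeviceStatsMonitor.on_train_batch_start/cpu_percent`` is composed from a hook name and a base metric name. It is not Lightning's implementation. The helper ``prefix_metric_keys``, its parameters, and the sample values are hypothetical and exist only for illustration, and the ``/`` separator is assumed from the example in the docstring.

```python
from typing import Dict


def prefix_metric_keys(
    stats: Dict[str, float],
    hook_name: str,
    callback_name: str = "DeviceStatsMonitor",
    separator: str = "/",
) -> Dict[str, float]:
    """Hypothetical helper (not part of Lightning): return a copy of ``stats``
    whose keys follow ``{callback_name}.{hook_name}{separator}{base_metric_name}``."""
    prefix = f"{callback_name}.{hook_name}"
    return {f"{prefix}{separator}{name}": value for name, value in stats.items()}


# Made-up sample values; at runtime the CPU figures would come from psutil.
cpu_stats = {"cpu_percent": 12.5, "cpu_vm_percent": 48.0, "cpu_swap_percent": 0.0}

logged = prefix_metric_keys(cpu_stats, hook_name="on_train_batch_start")

for key in sorted(logged):
    print(key, logged[key])

# Recover the base metric name by splitting on the separator, not on dots:
# base names such as ``allocated_bytes.all.current`` contain dots themselves.
base_names = {key.rsplit("/", 1)[-1] for key in logged}
```

When filtering logged metrics, splitting on the separator rather than on ``.`` is the safer choice, because several base metric names listed in the docstring (for example ``allocated_bytes.all.current`` and ``avg. free memory (MB)``) contain dots.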