
Commit 798f9c9

DOC: Clarify DeviceStatsMonitor logged metrics (#20807)
1 parent aebf3f4 commit 798f9c9


1 file changed, +16 -0 lines changed


src/lightning/pytorch/callbacks/device_stats_monitor.py

Lines changed: 16 additions & 0 deletions
@@ -34,6 +34,21 @@ class DeviceStatsMonitor(Callback):
 r"""Automatically monitors and logs device stats during the training, validation and testing stages.
 ``DeviceStatsMonitor`` is a special callback as it requires a ``logger`` to be passed as an argument to the ``Trainer``.
 
+Logged Metrics:
+Device statistics are logged with keys prefixed as
+``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
+``DeviceStatsMonitor.on_train_batch_start/cpu_percent``). The source of these metrics depends on the ``cpu_stats`` flag and the active accelerator.
+
+CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, ``cpu_swap_percent``.
+All are percentages (%).
+CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
+PyTorch's allocator (e.g., ``allocated_bytes.all.current`` in bytes; counters such as ``num_ooms`` are plain counts).
+GPU compute utilization is not logged by default.
+Other Accelerators (e.g., TPU, MPS): Logs device-specific stats.
+- TPU example: ``avg. free memory (MB)``.
+- MPS example: ``mps.current_allocated_bytes``.
+Inspect the logged keys or check the accelerator documentation for details.
+
 Args:
 cpu_stats: if ``None``, it will log CPU stats only if the accelerator is CPU.
 If ``True``, it will log CPU stats regardless of the accelerator.
@@ -45,6 +60,7 @@ class DeviceStatsMonitor(Callback):
 ModuleNotFoundError:
 If ``psutil`` is not installed and CPU stats are monitored.
 
+
 Example::
 
 from lightning import Trainer
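
The key format described in the docstring can be illustrated with a small standalone sketch that builds keys and splits them back apart, for example when reading exported logs. `prefix_device_stats` and `split_device_stats_key` are hypothetical helpers, not Lightning APIs; they assume the `/` separator shown in the documented format, and the metric values are placeholders.

```python
PREFIX = "DeviceStatsMonitor"


def prefix_device_stats(stats: dict, hook_name: str, separator: str = "/") -> dict:
    """Build keys shaped like 'DeviceStatsMonitor.{hook_name}/{base_metric_name}'.

    Hypothetical helper; the '/' separator is an assumption taken from the
    documented key format.
    """
    prefix = f"{PREFIX}.{hook_name}"
    return {f"{prefix}{separator}{name}": value for name, value in stats.items()}


def split_device_stats_key(key: str) -> tuple:
    """Invert the format: return (hook_name, base_metric_name).

    Splits on the first '/', so base metric names that contain dots
    (e.g. 'allocated_bytes.all.current') are kept intact.
    """
    head, sep, metric = key.partition("/")
    if not sep or not head.startswith(PREFIX + "."):
        raise ValueError(f"not a DeviceStatsMonitor key: {key!r}")
    return head[len(PREFIX) + 1:], metric


# Placeholder values, only the key names matter here.
stats = {"cpu_percent": 12.5, "avg. free memory (MB)": 1024.0}
logged = prefix_device_stats(stats, "on_train_batch_start")
for key in logged:
    print(key, "->", split_device_stats_key(key))
```

Splitting on the first `/` rather than on dots matters because several documented base names (such as `avg. free memory (MB)`) contain periods themselves.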
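
The three-way `cpu_stats` flag from the Args section can be summarized as a small decision function. This is a minimal sketch that models only the docstring's description, assuming the accelerator is identified by a plain string such as `"cpu"` or `"cuda"`; `logs_cpu_stats` is a hypothetical helper, not Lightning's internal code.

```python
from typing import Optional


def logs_cpu_stats(cpu_stats: Optional[bool], accelerator: str) -> bool:
    """Return True if psutil-based CPU stats would appear in the logged metrics.

    Hypothetical model of the documented semantics:
      None  -> log CPU stats only when the accelerator is CPU
      True  -> log CPU stats regardless of the accelerator
      False -> never log CPU stats
    """
    if cpu_stats is None:
        return accelerator == "cpu"
    return cpu_stats


# Print the full truth table for two example accelerators.
for flag in (None, True, False):
    for accel in ("cpu", "cuda"):
        print(f"cpu_stats={flag}, accelerator={accel}: {logs_cpu_stats(flag, accel)}")
```

Treating `None` as "decide from the accelerator" keeps the default cheap on GPUs while still giving CPU-only runs useful stats without extra configuration.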
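
Because `torch.cuda.memory_stats` mixes byte-valued entries with counters, post-processing logged CUDA stats needs to tell the two apart before converting units. The sketch below assumes a naming heuristic (byte-valued keys contain `_bytes`), which covers keys like `allocated_bytes.all.current` but is not guaranteed for every key PyTorch reports; `to_mib` is a hypothetical helper and the input values are made up.

```python
def to_mib(stats: dict) -> dict:
    """Convert byte-valued entries to MiB and leave counters unchanged.

    Heuristic (assumption): a key is byte-valued if it contains "_bytes",
    e.g. "allocated_bytes.all.current"; keys such as "num_ooms" or
    "allocation.all.current" are treated as plain counts.
    """
    return {
        key: (value / 2**20 if "_bytes" in key else value)
        for key, value in stats.items()
    }


# Illustrative input shaped like torch.cuda.memory_stats() output (values made up).
sample = {"allocated_bytes.all.current": 3 * 2**20, "num_ooms": 0}
print(to_mib(sample))  # {'allocated_bytes.all.current': 3.0, 'num_ooms': 0}
```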
