Commit cf7e36a

DOC: Clarify DeviceStatsMonitor logged metrics (#20807)
1 parent aebf3f4 commit cf7e36a

1 file changed: +17 -0 lines changed

src/lightning/pytorch/callbacks/device_stats_monitor.py

Lines changed: 17 additions & 0 deletions
@@ -45,6 +45,23 @@ class DeviceStatsMonitor(Callback):
         ModuleNotFoundError:
             If ``psutil`` is not installed and CPU stats are monitored.
 
+    Logged Metrics:
+        Device statistics are logged with keys prefixed as
+        ``DeviceStatsMonitor.{hook_name}/{base_metric_name}`` (e.g.,
+        ``DeviceStatsMonitor.on_train_batch_start/cpu_percent``). The source of these
+        metrics depends on the active :class:`~lightning.pytorch.accelerators.accelerator.Accelerator`
+        and the ``cpu_stats`` flag.
+
+        CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, ``cpu_swap_percent``.
+        All are percentages (%).
+        CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
+        PyTorch's allocator (e.g., ``allocated_bytes.all.current``, ``num_ooms``; all in Bytes).
+        GPU compute utilization is not logged by default.
+        Other Accelerators (e.g., TPU, MPS): Logs device-specific stats.
+        - TPU example: ``avg. free memory (MB)``.
+        - MPS example: ``mps.current_allocated_bytes``.
+        Observe logs or check accelerator documentation for details.
+
     Example::
 
         from lightning import Trainer
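
The key format described in the added docstring can be illustrated with a small standalone sketch. This is not Lightning's own code: `prefix_metric_keys` is a hypothetical helper written for illustration, and the sample values are made up. It only shows how a base metric name and a hook name would combine into the documented `DeviceStatsMonitor.{hook_name}/{base_metric_name}` form.

```python
from typing import Dict

# Callback class name, as it appears in the documented key format.
PREFIX = "DeviceStatsMonitor"


def prefix_metric_keys(stats: Dict[str, float], hook_name: str) -> Dict[str, float]:
    """Hypothetical helper (not part of Lightning's public API).

    Builds keys of the form ``DeviceStatsMonitor.{hook_name}/{base_metric_name}``.
    """
    return {f"{PREFIX}.{hook_name}/{name}": value for name, value in stats.items()}


# Illustrative values only, not real measurements.
sample_cpu_stats = {
    "cpu_percent": 12.5,
    "cpu_vm_percent": 48.0,
    "cpu_swap_percent": 0.0,
}

logged = prefix_metric_keys(sample_cpu_stats, "on_train_batch_start")
for key in logged:
    print(key)
```

When filtering a logger's output for these metrics, matching on the `DeviceStatsMonitor.` prefix and then splitting on `/` should recover the hook name and the base metric name, assuming the format stated in the docstring.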
