DOC: Clarify DeviceStatsMonitor logged metrics #20895
base: master
Conversation
@@ -45,6 +45,23 @@ class DeviceStatsMonitor(Callback):
        ModuleNotFoundError:
            If ``psutil`` is not installed and CPU stats are monitored.

    Logged Metrics:
`Raises:` or `Args:` are Sphinx-specific keywords, unlike `Logged Metrics:`, so please let's move it just to the top of this docstring.
Thanks, @Borda! I've moved the 'Logged Metrics' section to the top of the docstring as requested in commit dcd1042.
Well, I still see it without the change.
Force-pushed: cf7e36a → 798f9c9 → 2461f52
    CPU (via ``psutil``): Logs ``cpu_percent``, ``cpu_vm_percent``, ``cpu_swap_percent``.
        All are percentages (%).
    CUDA GPU (via :func:`torch.cuda.memory_stats`): Logs detailed memory statistics from
        PyTorch's allocator (e.g., ``allocated_bytes.all.current``, ``num_ooms``; all in Bytes).
        GPU compute utilization is not logged by default.
    Other Accelerators (e.g., TPU, MPS): Logs device-specific stats:

        - TPU example: ``avg. free memory (MB)``.
        - MPS example: ``mps.current_allocated_bytes``.

        Observe logs or check accelerator documentation for details.
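(Illustration, not part of the diff: the CUDA allocator counters mentioned above can be inspected directly with `torch.cuda.memory_stats`; the two keys shown are among the documented entries.)

```python
# Sketch: inspect the CUDA allocator statistics that DeviceStatsMonitor forwards to the logger.
import torch

if torch.cuda.is_available():
    stats = torch.cuda.memory_stats()  # flat dict of allocator counters
    print(stats["allocated_bytes.all.current"])  # bytes currently held by the caching allocator
    print(stats["num_ooms"])                     # number of CUDA out-of-memory errors seen so far
```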
Let's make this a complete list, and you can validate the compiled docs in the Read the Docs link 📚: pytorch-lightning--20895.org.readthedocs.build/en/20895
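For context while reviewing, a minimal usage sketch (assuming the current `DeviceStatsMonitor(cpu_stats=...)` signature and a Trainer with a logger attached):

```python
# Sketch: attach DeviceStatsMonitor so device stats are recorded by the Trainer's logger.
# Which metrics appear depends on the accelerator in use (CPU stats require psutil).
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import DeviceStatsMonitor

trainer = Trainer(callbacks=[DeviceStatsMonitor(cpu_stats=True)], max_epochs=1)
# trainer.fit(model)  # metrics show up under keys like "DeviceStatsMonitor.on_train_batch_start/..."
```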
What does this PR do?

This PR addresses issue #20807 by adding detailed documentation for the metrics logged by `DeviceStatsMonitor`. The key clarifications include:

- The sources of the metrics: CPU via `psutil`, CUDA GPU via `torch.cuda.memory_stats`, and other accelerators via `accelerator.get_device_stats()`.
- The naming convention for logged metrics: `DeviceStatsMonitor.{hook_name}/{base_metric_name}` (see the sketch after this description).
- A pointer to `torch.cuda.memory_stats()` for the full list of memory metrics.
- Updates to `profiler_basic.rst` to align with these clarifications and link to the API docs.

This documentation aims to help users understand what statistics to expect when using `DeviceStatsMonitor` with different hardware configurations.

Fixes #20807

No breaking changes are introduced by this documentation update.
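For illustration only (not part of the PR text), a tiny sketch of the documented key format; the hook and stat names are hypothetical examples:

```python
# Logged keys follow DeviceStatsMonitor.{hook_name}/{base_metric_name}.
hook_name = "on_train_batch_start"  # hook during which the stats were collected
base_metric_name = "cpu_percent"    # example stat name when psutil CPU stats are enabled
print(f"DeviceStatsMonitor.{hook_name}/{base_metric_name}")
# -> DeviceStatsMonitor.on_train_batch_start/cpu_percent
```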
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.
📚 Documentation preview 📚: https://pytorch-lightning--20895.org.readthedocs.build/en/20895/