Initial commit of HIP environment variables
neon60 committed Oct 30, 2024
1 parent f8c45b8 commit c2adf92
Showing 8 changed files with 326 additions and 124 deletions.
1 change: 1 addition & 0 deletions .wordlist.txt
Original file line number Diff line number Diff line change
@@ -48,6 +48,7 @@ FNUZ
fp
gedit
GPGPU
GROMACS
GWS
hardcoded
HC
3 changes: 2 additions & 1 deletion docs/conf.py
@@ -50,5 +50,6 @@

exclude_patterns = [
"doxygen/mainpage.md",
"understand/glossary.md"
"understand/glossary.md",
'how-to/debugging_env.rst'
]
102 changes: 4 additions & 98 deletions docs/how-to/debugging.rst
@@ -2,12 +2,13 @@
:description: How to debug using HIP.
:keywords: AMD, ROCm, HIP, debugging, ltrace, ROCgdb, WinGDB

.. _debugging_with_hip:

*************************************************************************
Debugging with HIP
*************************************************************************

AMD debugging tools include *ltrace* and *ROCgdb*. External tools are available and can be found
online. For example, if you're using Windows, you can use *Microsoft Visual Studio* and *WinGDB*.
HIP debugging tools include `ltrace <https://ltrace.org/>`_ and :doc:`ROCgdb <rocgdb:index>`. External tools are available and can be found online. For example, if you're using Windows, you can use Microsoft Visual Studio and WinGDB.

You can trace and debug your code using the following tools and techniques.

@@ -272,102 +273,7 @@ HIP environment variable summary

Here are some of the more commonly used environment variables:

.. <!-- spellcheck-disable -->
.. # COMMENT: The following lines define a break for use in the table below.
.. |break| raw:: html

<br />

.. <!-- spellcheck-enable -->
.. list-table::

* - **Environment variable**
- **Default value**
- **Usage**

* - AMD_LOG_LEVEL
|break| Enable HIP log on different Level
- 0
- 0: Disable log.
|break| 1: Enable log on error level
|break| 2: Enable log on warning and below levels
|break| 0x3: Enable log on information and below levels
|break| 0x4: Decode and display AQL packets

* - AMD_LOG_MASK
|break| Enable HIP log on different Level
- 0x7FFFFFFF
- 0x1: Log API calls
|break| 0x02: Kernel and Copy Commands and Barriers
|break| 0x4: Synchronization and waiting for commands to finish
|break| 0x8: Enable log on information and below levels
|break| 0x20: Queue commands and queue contents
|break| 0x40: Signal creation, allocation, pool
|break| 0x80: Locks and thread-safety code
|break| 0x100: Copy debug
|break| 0x200: Detailed copy debug
|break| 0x400: Resource allocation, performance-impacting events
|break| 0x800: Initialization and shutdown
|break| 0x1000: Misc debug, not yet classified
|break| 0x2000: Show raw bytes of AQL packet
|break| 0x4000: Show code creation debug
|break| 0x8000: More detailed command info, including barrier commands
|break| 0x10000: Log message location
|break| 0xFFFFFFFF: Log always even mask flag is zero

* - HIP_LAUNCH_BLOCKING
|break| Used for serialization on kernel execution.
- 0
- 0: Disable. Kernel executes normally.
|break| 1: Enable. Serializes kernel enqueue, behaves the same as AMD_SERIALIZE_KERNEL.

* - HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES)
|break| Only devices whose index is present in the sequence are visible to HIP
-
- 0,1,2: Depending on the number of devices on the system

* - GPU_DUMP_CODE_OBJECT
|break| Dump code object
- 0
- 0: Disable
|break| 1: Enable

* - AMD_SERIALIZE_KERNEL
|break| Serialize kernel enqueue
- 0
- 1: Wait for completion before enqueue
|break| 2: Wait for completion after enqueue
|break| 3: Both

* - AMD_SERIALIZE_COPY
|break| Serialize copies
- 0
- 1: Wait for completion before enqueue
|break| 2: Wait for completion after enqueue
|break| 3: Both

* - HIP_HOST_COHERENT
|break| Coherent memory in hipHostMalloc
- 0
- 0: memory is not coherent between host and GPU
|break| 1: memory is coherent with host

* - AMD_DIRECT_DISPATCH
|break| Enable direct kernel dispatch (Currently for Linux; under development for Windows)
- 1
- 0: Disable
|break| 1: Enable

* - GPU_MAX_HW_QUEUES
|break| The maximum number of hardware queues allocated per device
- 4
- The variable controls how many independent hardware queues HIP runtime can create per process,
per device. If an application allocates more HIP streams than this number, then HIP runtime reuses
the same hardware queues for the new streams in a round-robin manner. Note that this maximum
number does not apply to hardware queues that are created for CU-masked HIP streams, or
cooperative queues for HIP Cooperative Groups (single queue per device).
.. include:: ../how-to/debugging_env.rst

General debugging tips
======================================================
95 changes: 95 additions & 0 deletions docs/how-to/debugging_env.rst
@@ -0,0 +1,95 @@
.. list-table::
:header-rows: 1
:widths: 35,14,51

* - **Environment variable**
- **Default value**
- **Value**

* - | ``AMD_LOG_LEVEL``
| Enables HIP logging at various levels.
- ``0``
- | 0: Disables logging.
| 1: Enables error logs.
| 2: Enables warning logs in addition to lower-level logs.
| 3: Enables information logs in addition to lower-level logs.
| 4: Enables debug logs in addition to lower-level logs.
| 5: Enables extra debug logs in addition to lower-level logs.
* - | ``AMD_LOG_LEVEL_FILE``
| Sets output file for ``AMD_LOG_LEVEL``.
- stderr output
-

* - | ``AMD_LOG_MASK``
| Specifies HIP log filters. See the `complete list of log masks <https://github.com/ROCm/clr/blob/develop/rocclr/utils/debug.hpp#L40>`_.
- ``0x7FFFFFFF``
- | 0x1: Log API calls.
| 0x2: Kernel and copy commands and barriers.
| 0x4: Synchronization and waiting for commands to finish.
| 0x8: Decode and display AQL packets.
| 0x10: Queue commands and queue contents.
| 0x20: Signal creation, allocation, pool.
| 0x40: Locks and thread-safety code.
| 0x80: Kernel creations and arguments, etc.
| 0x100: Copy debug.
| 0x200: Detailed copy debug.
| 0x400: Resource allocation, performance-impacting events.
| 0x800: Initialization and shutdown.
| 0x1000: Misc debug, not yet classified.
| 0x2000: Show raw bytes of AQL packet.
| 0x4000: Show code creation debug.
| 0x8000: More detailed command info, including barrier commands.
| 0x10000: Log message location.
| 0x20000: Memory allocation.
| 0x40000: Memory pool allocation, including memory in graphs.
| 0x80000: Timestamp details.
| 0xFFFFFFFF: Always log, even if the mask flag is zero.
* - | ``HIP_LAUNCH_BLOCKING``
| Used for serialization on kernel execution.
- ``0``
- | 0: Disable. Kernel executes normally.
| 1: Enable. Serializes kernel enqueue, behaves the same as ``AMD_SERIALIZE_KERNEL``.
* - | ``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES``)
| Only devices whose index is present in the sequence are visible to HIP.
- Unset by default.
- 0,1,2: Depending on the number of devices on the system.

* - | ``GPU_DUMP_CODE_OBJECT``
| Dump code object.
- ``0``
- | 0: Disable
| 1: Enable
* - | ``AMD_SERIALIZE_KERNEL``
| Serialize kernel enqueue.
- ``0``
- | 0: Disable
| 1: Wait for completion before enqueue.
| 2: Wait for completion after enqueue.
| 3: Both
* - | ``AMD_SERIALIZE_COPY``
| Serialize copies.
- ``0``
- | 0: Disable
| 1: Wait for completion before enqueue.
| 2: Wait for completion after enqueue.
| 3: Both
* - | ``AMD_DIRECT_DISPATCH``
| Enable direct kernel dispatch (Currently for Linux; under development for Windows).
- ``1``
- | 0: Disable
| 1: Enable
* - | ``GPU_MAX_HW_QUEUES``
| The maximum number of hardware queues allocated per device.
- ``4``
- The variable controls how many independent hardware queues HIP runtime can create per process,
per device. If an application allocates more HIP streams than this number, then HIP runtime reuses
the same hardware queues for the new streams in a round-robin manner. Note that this maximum
number does not apply to hardware queues that are created for CU-masked HIP streams, or
cooperative queues for HIP Cooperative Groups (single queue per device).
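
The round-robin reuse described for ``GPU_MAX_HW_QUEUES`` can be pictured with a small sketch. It is only an illustration of the wording in the table, not the HIP runtime's actual code: ``env_int_or_default`` and ``queue_for_stream`` are hypothetical helper names, and the modulo mapping is an assumption about what "round-robin" means here.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of HIP): reads an integer environment
// variable and returns `fallback` when it is unset, empty, or not a number.
int env_int_or_default(const char* name, int fallback) {
    const char* text = std::getenv(name);
    if (text == nullptr || *text == '\0') {
        return fallback;
    }
    char* end = nullptr;
    const long parsed = std::strtol(text, &end, 0);  // base 0 accepts 4 and 0x4
    return (*end == '\0') ? static_cast<int>(parsed) : fallback;
}

// Hypothetical model of round-robin reuse: once more streams exist than
// hardware queues, stream N shares a queue with stream N - max_hw_queues.
int queue_for_stream(int stream_index, int max_hw_queues) {
    assert(stream_index >= 0 && max_hw_queues > 0);
    return stream_index % max_hw_queues;
}
```

With the documented default of ``4``, calling ``queue_for_stream(5, env_int_or_default("GPU_MAX_HW_QUEUES", 4))`` in this model places the sixth stream on the same queue as the second one.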
52 changes: 27 additions & 25 deletions docs/how-to/logging.rst
@@ -33,11 +33,12 @@ The value of this variable controls your logging level. Levels are defined as fo
.. code-block:: cpp
enum LogLevel {
LOG_NONE = 0,
LOG_ERROR = 1,
LOG_WARNING = 2,
LOG_INFO = 3,
LOG_DEBUG = 4
LOG_NONE = 0,
LOG_ERROR = 1,
LOG_WARNING = 2,
LOG_INFO = 3,
LOG_DEBUG = 4,
LOG_EXTRA_DEBUG = 5
};
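
Because each level also enables the levels below it, a message is shown when its level is at or below the configured one. The following is a minimal sketch of that rule, assuming a hypothetical ``is_level_enabled`` helper. It copies the enum values from the listing for illustration and is not the runtime's own filtering code.

```cpp
#include <cassert>

// Values copied from the LogLevel listing, repeated here so the
// sketch compiles on its own.
enum LogLevel {
    LOG_NONE = 0,
    LOG_ERROR = 1,
    LOG_WARNING = 2,
    LOG_INFO = 3,
    LOG_DEBUG = 4,
    LOG_EXTRA_DEBUG = 5
};

// Hypothetical helper (not part of HIP): returns true when a message of
// `message_level` would be emitted for a given AMD_LOG_LEVEL value.
bool is_level_enabled(int configured_level, LogLevel message_level) {
    assert(configured_level >= LOG_NONE && configured_level <= LOG_EXTRA_DEBUG);
    // LOG_NONE is not a message level, so nothing is ever logged "at" it.
    return message_level != LOG_NONE && message_level <= configured_level;
}
```

Under this reading, ``AMD_LOG_LEVEL=3`` shows error, warning, and information messages but hides both debug levels.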
.. tip::
@@ -55,26 +56,27 @@ change this to any of the valid values:
.. code-block:: cpp
enum LogMask {
LOG_API = 0x00000001, //!< API call
LOG_CMD = 0x00000002, //!< Kernel and Copy Commands and Barriers
LOG_WAIT = 0x00000004, //!< Synchronization and waiting for commands to finish
LOG_AQL = 0x00000008, //!< Decode and display AQL packets
LOG_QUEUE = 0x00000010, //!< Queue commands and queue contents
LOG_SIG = 0x00000020, //!< Signal creation, allocation, pool
LOG_LOCK = 0x00000040, //!< Locks and thread-safety code.
LOG_KERN = 0x00000080, //!< kernel creations and arguments, etc.
LOG_COPY = 0x00000100, //!< Copy debug
LOG_COPY2 = 0x00000200, //!< Detailed copy debug
LOG_RESOURCE = 0x00000400, //!< Resource allocation, performance-impacting events.
LOG_INIT = 0x00000800, //!< Initialization and shutdown
LOG_MISC = 0x00001000, //!< misc debug, not yet classified
LOG_AQL2 = 0x00002000, //!< Show raw bytes of AQL packet
LOG_CODE = 0x00004000, //!< Show code creation debug
LOG_CMD2 = 0x00008000, //!< More detailed command info, including barrier commands
LOG_LOCATION = 0x00010000, //!< Log message location
LOG_MEM = 0x00020000, //!< Memory allocation
LOG_MEM_POOL = 0x00040000, //!< Memory pool allocation, including memory in graphs
LOG_ALWAYS = 0xFFFFFFFF, //!< Log always even mask flag is zero
LOG_API = 1, //!< (0x1) API call
LOG_CMD = 2, //!< (0x2) Kernel and Copy Commands and Barriers
LOG_WAIT = 4, //!< (0x4) Synchronization and waiting for commands to finish
LOG_AQL = 8, //!< (0x8) Decode and display AQL packets
LOG_QUEUE = 16, //!< (0x10) Queue commands and queue contents
LOG_SIG = 32, //!< (0x20) Signal creation, allocation, pool
LOG_LOCK = 64, //!< (0x40) Locks and thread-safety code.
LOG_KERN = 128, //!< (0x80) Kernel creations and arguments, etc.
LOG_COPY = 256, //!< (0x100) Copy debug
LOG_COPY2 = 512, //!< (0x200) Detailed copy debug
LOG_RESOURCE = 1024, //!< (0x400) Resource allocation, performance-impacting events.
LOG_INIT = 2048, //!< (0x800) Initialization and shutdown
LOG_MISC = 4096, //!< (0x1000) Misc debug, not yet classified
LOG_AQL2 = 8192, //!< (0x2000) Show raw bytes of AQL packet
LOG_CODE = 16384, //!< (0x4000) Show code creation debug
LOG_CMD2 = 32768, //!< (0x8000) More detailed command info, including barrier commands
LOG_LOCATION = 65536, //!< (0x10000) Log message location
LOG_MEM = 131072, //!< (0x20000) Memory allocation
LOG_MEM_POOL = 262144, //!< (0x40000) Memory pool allocation, including memory in graphs
LOG_TS = 524288, //!< (0x80000) Timestamp details
LOG_ALWAYS = -1 //!< (0xFFFFFFFF) Log always even mask flag is zero
};
You can also define the logging mask via the ``AMD_LOG_MASK`` environment variable.
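
The mask values are single bits, so several categories can be selected by OR-ing them together. The sketch below shows one way to compute a combined value to pass in ``AMD_LOG_MASK``. It repeats only a subset of the ``LogMask`` values for illustration, and ``format_log_mask`` is a hypothetical helper, not a HIP API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Subset of the LogMask values listed above, repeated so the sketch
// is self-contained.
enum LogMask : std::uint32_t {
    LOG_API  = 0x1,    // API call
    LOG_CMD  = 0x2,    // Kernel and copy commands and barriers
    LOG_WAIT = 0x4,    // Synchronization and waiting for commands to finish
    LOG_KERN = 0x80,   // Kernel creations and arguments
    LOG_COPY = 0x100   // Copy debug
};

// Hypothetical helper: formats a combined mask as a hexadecimal string
// that can be used as the value of AMD_LOG_MASK.
std::string format_log_mask(std::uint32_t mask) {
    char buffer[16];
    const int written = std::snprintf(buffer, sizeof(buffer), "0x%X",
                                      static_cast<unsigned>(mask));
    assert(written > 0 && written < static_cast<int>(sizeof(buffer)));
    return std::string(buffer);
}
```

For example, ``format_log_mask(LOG_API | LOG_KERN | LOG_COPY)`` yields ``0x181``, which selects API calls, kernel creation details, and copy debug output.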
1 change: 1 addition & 0 deletions docs/index.md
@@ -54,6 +54,7 @@ The HIP documentation is organized into the following categories:
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)
* [HIP math API](./reference/math_api)
* [HIP environment variables](./reference/env_variables)
* [Comparing syntax for different APIs](./reference/terms)
* [List of deprecated APIs](./reference/deprecated_api_list)
* [FP8 numbers in HIP](./reference/fp8_numbers)