Skip to content

Latest commit

 

History

History
198 lines (164 loc) · 4.16 KB

cuda.rst

File metadata and controls

198 lines (164 loc) · 4.16 KB

torch.cuda

.. automodule:: torch.cuda
.. currentmodule:: torch.cuda

.. autosummary::
    :toctree: generated
    :nosignatures:

    StreamContext
    can_device_access_peer
    current_blas_handle
    current_device
    current_stream
    cudart
    default_stream
    device
    device_count
    device_of
    get_arch_list
    get_device_capability
    get_device_name
    get_device_properties
    get_gencode_flags
    get_sync_debug_mode
    init
    ipc_collect
    is_available
    is_initialized
    memory_usage
    set_device
    set_stream
    set_sync_debug_mode
    stream
    synchronize
    utilization
    temperature
    power_draw
    clock_rate
    OutOfMemoryError

Random Number Generator

.. autosummary::
    :toctree: generated
    :nosignatures:

    get_rng_state
    get_rng_state_all
    set_rng_state
    set_rng_state_all
    manual_seed
    manual_seed_all
    seed
    seed_all
    initial_seed


Communication collectives

.. autosummary::
    :toctree: generated
    :nosignatures:

    comm.broadcast
    comm.broadcast_coalesced
    comm.reduce_add
    comm.scatter
    comm.gather

Streams and events

.. autosummary::
    :toctree: generated
    :nosignatures:

    Stream
    ExternalStream
    Event

Graphs (beta)

.. autosummary::
    :toctree: generated
    :nosignatures:

    is_current_stream_capturing
    graph_pool_handle
    CUDAGraph
    graph
    make_graphed_callables

Memory management

.. autosummary::
    :toctree: generated
    :nosignatures:

     empty_cache
     list_gpu_processes
     mem_get_info
     memory_stats
     memory_summary
     memory_snapshot
     memory_allocated
     max_memory_allocated
     reset_max_memory_allocated
     memory_reserved
     max_memory_reserved
     set_per_process_memory_fraction
     memory_cached
     max_memory_cached
     reset_max_memory_cached
     reset_peak_memory_stats
     caching_allocator_alloc
     caching_allocator_delete
     get_allocator_backend
     CUDAPluggableAllocator
     change_current_allocator
     MemPool
     MemPoolContext

.. autoclass:: torch.cuda.use_mem_pool

NVIDIA Tools Extension (NVTX)

.. autosummary::
    :toctree: generated
    :nosignatures:

    nvtx.mark
    nvtx.range_push
    nvtx.range_pop
    nvtx.range

Jiterator (beta)

.. autosummary::
    :toctree: generated
    :nosignatures:

    jiterator._create_jit_fn
    jiterator._create_multi_output_jit_fn

TunableOp

Some operations could be implemented using more than one library or more than one technique. For example, a GEMM could be implemented for CUDA or ROCm using either the cublas/cublasLt libraries or hipblas/hipblasLt libraries, respectively. How does one know which implementation is the fastest and should be chosen? That's what TunableOp provides. Certain operators have been implemented using multiple strategies as Tunable Operators. At runtime, all strategies are profiled and the fastest is selected for all subsequent operations.

See the :doc:`documentation <cuda.tunable>` for information on how to use it.

.. toctree::
    :hidden:

    cuda.tunable


Stream Sanitizer (prototype)

CUDA Sanitizer is a prototype tool for detecting synchronization errors between streams in PyTorch. See the :doc:`documentation <cuda._sanitizer>` for information on how to use it.

.. toctree::
    :hidden:

    cuda._sanitizer


.. py:module:: torch.cuda.comm
.. py:module:: torch.cuda.error
.. py:module:: torch.cuda.gds
.. py:module:: torch.cuda.graphs
.. py:module:: torch.cuda.jiterator
.. py:module:: torch.cuda.memory
.. py:module:: torch.cuda.nccl
.. py:module:: torch.cuda.nvtx
.. py:module:: torch.cuda.profiler
.. py:module:: torch.cuda.random
.. py:module:: torch.cuda.sparse
.. py:module:: torch.cuda.streams