saxpy WIP
MathiasMagnus committed Aug 22, 2023
1 parent b8f6776 commit 3e02a2e
Showing 1 changed file with 225 additions and 32 deletions.
257 changes: 225 additions & 32 deletions docs/tutorials/saxpy.md
Expand Up @@ -175,8 +175,8 @@ for compilation" on Linux, just to make invocations more terse let's do it on
both Linux and Windows.

::::{tab-set}
:::{tab-item} Linux
:sync: linux
:::{tab-item} Linux & AMD
:sync: linux-amd

While distro maintainers may package ROCm such that it installs to
system-default locations, AMD's installations don't and need to be added to the
Expand All @@ -192,22 +192,44 @@ You should be able to call the compiler on the command-line now:
amdclang++ --version
```

```{tip}
Docker images distributed by AMD, such as
[`rocm-terminal`](https://hub.docker.com/r/rocm/rocm-terminal/), already have
`/opt/rocm/bin` on the `PATH` for convenience. (This subtly affects the CMake
package detection logic of ROCm libraries.)
```

:::
:::{tab-item} Windows
:sync: windows
:::{tab-item} Linux & NVIDIA
:sync: linux-nvidia

Both distro maintainers and NVIDIA package CUDA such that `nvcc` and related
tools are available on the command-line by default. You should be able to call
the compiler directly:

```bash
nvcc --version
```
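If the command is not found, it can help to check which tools are reachable
before going further. The following is a minimal sketch, not part of the
tutorial's tooling: `have_tool` is a hypothetical helper name, and the list of
tools checked is only an assumption based on the utilities used later on.

```bash
# Hypothetical helper: succeeds when the named tool is reachable through PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of the CUDA tools this tutorial relies on.
for tool in nvcc cuobjdump; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not on PATH"
    fi
done
```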

:::
:::{tab-item} Windows & AMD
:sync: windows-amd

Windows compilers and command-line tooling have traditionally
relied on extra environemntal variables and Path entries to function correctly.
relied on extra environmental variables and Path entries to function correctly.
Visual Studio refers to command-lines with these set up as "Developer
Command-Line". The HIP SDK on Windows doesn't ship a complete toolchain, you
will also need:
Command Prompt" or "Developer PowerShell" for `cmd.exe` and PowerShell
respectively.

The HIP SDK on Windows doesn't ship a complete toolchain; you will also need:

- the Windows SDK, which most importantly provides the import libs for the
system libraries all executables must link to, plus some auxiliary compiler
tooling.
- a Standard Template Library, aka. STL, which HIP too relies on. The prior may
be installed separately, though it's most conveniently obtained through the
Visual Studio installer, while the latter is part of the Microsoft Visual C++
compiler, aka. MSVC.
- a Standard Template Library, aka. STL, which HIP also relies on.

The former may be installed separately, though it's most conveniently obtained
through the Visual Studio installer, while the latter is part of the Microsoft
Visual C++ compiler, aka. MSVC, also installed via Visual Studio.

If you don't already have some SKU of Visual Studio 2022 installed, for a
minimal command-line experience, install the
Expand All @@ -222,17 +244,18 @@ with the Desktop Developemnt Workload and under Individual Components select:
The "C++ CMake tools for Windows" individual component is a convenience which
puts both `cmake.exe` and `ninja.exe` onto the `PATH` inside developer
command-prompts. You can install these manually, but then you need to manage
these tools manually.
them manually.
```

The first Visual Studio 2022 SKU install location is in an environmental
variable `VS2022INSTALLDIR`. To setup a command-line with this compiler's STL
set in the `INCLUDE` env var, issue from PowerShell:
Visual Studio installations as of VS 2017 are detectable as COM object
instances via WMI. To set up a command-line from any shell for the latest
Visual Studio's default (latest) Visual C++ toolset, issue:

```pwsh
Import-Module $env:VS2022INSTALLDIR\Common7\Tools\Microsoft.VisualStudio.DevShell.dll
Enter-VsDevShell -InstallPath $env:VS2022INSTALLDIR -SkipAutomaticLocation -DevCmdArguments "-arch=x64 -host_arch=x64 -no_logo"
$env:PATH += ";${env:HIP_PATH}bin"
$InstallationPath = Get-CimInstance MSFT_VSInstance | Sort-Object -Property Version -Descending | Select-Object -First 1 -ExpandProperty InstallLocation
Import-Module $InstallationPath\Common7\Tools\Microsoft.VisualStudio.DevShell.dll
Enter-VsDevShell -InstallPath $InstallationPath -SkipAutomaticLocation -Arch amd64 -HostArch amd64 -DevCmdArguments '-no_logo'
$env:PATH = "${env:HIP_PATH}bin;${env:PATH}"
```

You should be able to call the compiler on the command-line now:
Expand All @@ -241,30 +264,99 @@ You should be able to call the compiler on the command-line now:
clang++ --version
```

### Invoking the Compiler Manually
:::
:::{tab-item} Windows & NVIDIA
:sync: windows-nvidia

Windows compilers and command-line tooling have traditionally
relied on extra environmental variables and Path entries to function correctly.
Visual Studio refers to command-lines with these set up as "Developer
Command Prompt" or "Developer PowerShell" for `cmd.exe` and PowerShell
respectively.

The HIP and CUDA SDKs on Windows don't ship complete toolchains; you will
also need:

- the Windows SDK, which most importantly provides the import libs for the
system libraries all executables must link to, plus some auxiliary compiler
tooling.
- a Standard Template Library, aka. STL, which HIP also relies on.

The former may be installed separately, though it's most conveniently obtained
through the Visual Studio installer, while the latter is part of the Microsoft
Visual C++ compiler, aka. MSVC, also installed via Visual Studio.

If you don't already have some SKU of Visual Studio 2022 installed, for a
minimal command-line experience, install the
[Build Tools for Visual Studio 2022](https://aka.ms/vs/17/release/vs_BuildTools.exe)
with the Desktop Development Workload and under Individual Components select:

- some version of the Windows SDK
- "MSVC v143 - VS 2022 C++ x64/x86 build tools (Latest)"
- "C++ CMake tools for Windows" (optional)

```{tip}
The "C++ CMake tools for Windows" individual component is a convenience which
puts both `cmake.exe` and `ninja.exe` onto the `PATH` inside developer
command-prompts. You can install these manually, but then you need to manage
them manually.
```

Visual Studio installations as of VS 2017 are detectable as COM object
instances via WMI. To set up a command-line from any shell for the latest
Visual Studio's default (latest) Visual C++ toolset, issue:

```pwsh
$InstallationPath = Get-CimInstance MSFT_VSInstance | Sort-Object -Property Version -Descending | Select-Object -First 1 -ExpandProperty InstallLocation
Import-Module $InstallationPath\Common7\Tools\Microsoft.VisualStudio.DevShell.dll
Enter-VsDevShell -InstallPath $InstallationPath -SkipAutomaticLocation -Arch amd64 -HostArch amd64 -DevCmdArguments '-no_logo'
```

You should be able to call the compiler on the command-line now:

```pwsh
nvcc --version
```

:::
::::

### Invoking the Compiler Manually

To compile and link a single-file application, one may use the following
command:

::::{tab-set}
:::{tab-item} Linux
:sync: linux
:::{tab-item} Linux & AMD
:sync: linux-amd

```bash
amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2
```

:::
:::{tab-item} Windows
:sync: windows
:::{tab-item} Linux & NVIDIA
:sync: linux-nvidia

```bash
nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu
```

:::
:::{tab-item} Windows & AMD
:sync: windows-amd

```pwsh
clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2
```

:::
:::{tab-item} Windows & NVIDIA
:sync: windows-nvidia

```pwsh
nvcc .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I ${env:HIP_PATH}include -I .\Common -O2 -x cu
```

:::
::::
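
When scripting the Linux builds, the two command lines above can be kept in one
place and selected by back-end. This is only an illustrative sketch:
`saxpy_build_cmd` is a hypothetical helper, and the paths are the ones used in
the commands above, which assume the default `/opt/rocm` install location.

```bash
# Hypothetical helper: print the Linux compile command for a given back-end.
# The flags mirror the manual invocations shown in the tabs above.
saxpy_build_cmd() {
    src=./HIP-Basic/saxpy/main.hip
    case "$1" in
        amd)    echo "amdclang++ $src -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2" ;;
        nvidia) echo "nvcc $src -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu" ;;
        *)      echo "unknown back-end: $1" >&2; return 1 ;;
    esac
}

# Print the command for the NVIDIA back-end so it can be reviewed before running.
saxpy_build_cmd nvidia
```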

Expand All @@ -280,8 +372,8 @@ was arguing about not finding the right binary to dispatch for execution. How
can one find out what device binary flavors are embedded into the executable?

::::{tab-set}
:::{tab-item} Linux
:sync: linux
:::{tab-item} Linux & AMD
:sync: linux-amd

The set of `roc-*` utilities shipping with ROCm help significantly to inspect
binary artifacts on disk. If you wish to use these utilities, add the ROCmCC
Expand Down Expand Up @@ -373,8 +465,33 @@ the filename directly informs us of the graphics IPs used by the compiler. The
contents of this file are very similar to what `roc-obj` printed to the console.

:::
:::{tab-item} Windows
:sync: windows
:::{tab-item} Linux & NVIDIA
:sync: linux-nvidia

Unlike HIP on AMD, compiling through the NVIDIA support of HIP produces a
binary which is, as far as the binary format goes, a valid CUDA executable.
Therefore it'll incorporate PTX ISA (Parallel Thread eXecution Instruction Set
Architecture) instead of AMDGPU binary. As a result, tooling shipping with the
CUDA SDK can be used to inspect which device ISA got compiled into a specific
executable. The tool most useful to us currently is `cuobjdump`.

```bash
cuobjdump --list-ptx ./saxpy
```

Which will print something like:

```none
PTX file 1: saxpy.1.sm_52.ptx
```

From this we can see that the saxpy kernel is stored as `sm_52`, which shows
that PTX targeting compute capability 5.2 got embedded into the executable.
Because the driver can JIT-compile PTX for newer architectures, devices which
sport compute capability 5.2 or newer will be able to run this code.
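
The "5.2 or newer" rule can be written down as a small check. This is a sketch
under stated assumptions: `ptx_runs_on` is a hypothetical helper, and it only
handles plain numeric names such as `sm_52` or `sm_86`, not names carrying a
letter suffix.

```bash
# Hypothetical helper: succeeds when PTX built for sm_<ptx> (first argument)
# targets a compute capability no newer than the device's sm_<dev> (second).
ptx_runs_on() {
    ptx="${1#sm_}"   # strip the "sm_" prefix, leaving e.g. 52
    dev="${2#sm_}"
    [ "$dev" -ge "$ptx" ]
}

if ptx_runs_on sm_52 sm_86; then
    echo "PTX for sm_52 can run on an sm_86 device"
fi
```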

:::
:::{tab-item} Windows & AMD
:sync: windows-amd

The HIP SDK for Windows doesn't yet sport the `roc-*` set of utilities to work
with binary artifacts. To find out what binary formats are embedded into an
Expand Down Expand Up @@ -480,15 +597,40 @@ _Z12saxpy_kernelfPKfPfj: ; @_Z12saxpy_kernelfPKfPfj
...
```

:::
:::{tab-item} Windows & NVIDIA
:sync: windows-nvidia

Unlike HIP on AMD, compiling through the NVIDIA support of HIP produces a
binary which is, as far as the binary format goes, a valid CUDA executable.
Therefore it'll incorporate PTX ISA (Parallel Thread eXecution Instruction Set
Architecture) instead of AMDGPU binary. As a result, tooling shipping with the
CUDA SDK can be used to inspect which device ISA got compiled into a specific
executable. The tool most useful to us currently is `cuobjdump`.

```pwsh
cuobjdump.exe --list-ptx .\saxpy.exe
```

Which will print something like:

```none
PTX file 1: saxpy.1.sm_52.ptx
```

From this we can see that the saxpy kernel is stored as `sm_52`, which shows
that PTX targeting compute capability 5.2 got embedded into the executable.
Because the driver can JIT-compile PTX for newer architectures, devices which
sport compute capability 5.2 or newer will be able to run this code.

:::
::::

Now that we've found what binary got embedded into the executable, we only need
to find which format our available devices use.

::::{tab-set}
:::{tab-item} Linux
:sync: linux
:::{tab-item} Linux & AMD
:sync: linux-amd

On Linux a utility called `rocminfo` can help us list all the properties of the
devices available on the system, including which version of graphics IP
Expand All @@ -505,20 +647,71 @@ features are after the graphics IP. Until further notice we'll treat them as
part of the binary version.)_

:::
:::{tab-item} Windows
:sync: windows
:::{tab-item} Linux & NVIDIA
:sync: linux-nvidia

When using HIP with the NVIDIA back-end on Linux, a CUDA SDK sample called
`deviceQuery` can help us list all the properties of the devices available on
the system, including the compute capability each device sports.
(A `<major>.<minor>` compute capability is passed to `nvcc` on the
command-line as `sm_<major><minor>`, e.g. `8.6` is `sm_86`.)

Because it's not shipped as a binary, we may as well compile the matching
example from ROCm.

```bash
nvcc ./HIP-Basic/device_query/main.cpp -o device_query -I ./Common -I /opt/rocm/include -O2
```

We'll filter the output to keep only the lines of interest, e.g.:

```bash
./device_query | grep "major.minor"
major.minor: 8.6
major.minor: 7.0
```
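
To get from these lines to the `sm_<major><minor>` names that `nvcc` expects,
the output can be piped through a short filter. This is a sketch only: `cc_to_sm`
is a hypothetical helper, and it assumes the `major.minor:` line format shown
above.

```bash
# Hypothetical helper: read device_query output on stdin and print one
# sm_<major><minor> name per "major.minor" line.
cc_to_sm() {
    awk '/major\.minor/ { split($NF, cc, "[.]"); print "sm_" cc[1] cc[2] }'
}

# Demonstration on a line in the format shown above; prints "sm_86".
printf 'major.minor:              8.6\n' | cc_to_sm
```

With the real binary, the equivalent invocation would be
`./device_query | cc_to_sm`.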

:::
:::{tab-item} Windows & AMD
:sync: windows-amd

On Windows a utility called `hipInfo.exe` can help us list all the properties
of the devices available on the system, including which version of graphics IP
(`gfxXYZ`) they employ. We'll filter the output to have only these lines:

```pwsh
& ${env:HIP_PATH}bin\hipInfo.exe | sls gfx
& ${env:HIP_PATH}bin\hipInfo.exe | Select-String gfx
gcnArchName: gfx1032
gcnArchName: gfx1035
```

:::
:::{tab-item} Windows & NVIDIA
:sync: windows-nvidia

When using HIP with the NVIDIA back-end on Windows, a CUDA SDK sample called
`deviceQuery` can help us list all the properties of the devices available on
the system, including the compute capability each device sports.
(A `<major>.<minor>` compute capability is passed to `nvcc` on the
command-line as `sm_<major><minor>`, e.g. `8.6` is `sm_86`.)

Because it's not shipped as a binary, we may as well compile the matching
example from ROCm.

```pwsh
nvcc .\HIP-Basic\device_query\main.cpp -o device_query.exe -I .\Common -I ${env:HIP_PATH}include -O2
```

We'll filter the output to keep only the lines of interest, e.g.:

```pwsh
.\device_query.exe | Select-String "major.minor"
major.minor: 8.6
major.minor: 7.0
```

:::
::::
