⚠️ Note: AMD GPUs do not natively support CUDA, which is proprietary to NVIDIA.
This repository emulates a CUDA-like development environment for AMD GPUs by leveraging AMD’s ROCm platform, HIP (Heterogeneous-compute Interface for Portability), and CI pipelines to enable high-performance GPU compute across RDNA/CDNA hardware.
CUDA is NVIDIA-specific. It tightly couples hardware and software via proprietary drivers, compilers, and a runtime stack. AMD GPUs do not support CUDA natively. But if you're on Team Red (AMD), you’re not left behind—thanks to HIP and ROCm.

- NVIDIA uses:
CUDA
,cuDNN
,nvcc
,libcudart
, andPTX kernels
- AMD uses:
HIP
,ROCm
,hipcc
,ROCr runtime
, andHSACO kernels
Layer | Description |
---|---|
HIP API | CUDA-like C++ API for portable kernel code |
ROCr Runtime | Handles kernel dispatch, memory management, and GPU command queues |
ROCm Compiler | HIP-Clang frontend; supports SASS-level kernel compilation |
HSA (Heterogeneous System Architecture) | Foundation for communication between CPU and GPU |
RDNA/CDNA ISA | Instruction set architecture for discrete and data-center AMD GPUs |
Azure DevOps Pipelines | CI system to orchestrate and scale multi-GPU build/test processes |
VMSS with Docker | Enables scalable containerized GPU workloads for HIP build/test |
- Write once, run on both NVIDIA and AMD
- Familiar to CUDA devs
- Supported by PyTorch, TensorFlow via AMD extensions
HIP (Heterogeneous-Compute Interface for Portability) is AMD’s CUDA-to-AMD translation layer. It allows CUDA-like syntax with hipcc
to compile and run on both AMD and NVIDIA.
// HIP example kernel
__global__ void addKernel(float* c, const float* a, const float* b) {
int i = threadIdx.x;
c[i] = a[i] + b[i];
}
Each ROCm component repo links to this central YAML orchestrator via rocm-ci.yml
.
AMD kernel compilation flow:
- Source (
.hip
or.cpp
) → LLVM IR - LLVM IR → HSA Intermediate Language (HSAIL)
- HSAIL → HSACO (HSA Code Object)
- Executed on GCN / RDNA via ROCr runtime

Think of HSACO like NVIDIA’s PTX, but for AMD hardware.
AMD compiler backends (LLVM + HCC) convert kernels into Static Single Assignment (SSSA) form for:
- Aggressive loop tiling
- SIMD vectorization
- Wavefront-level parallelism (similar to warps in CUDA)
Framework | AMD ROCm / HIP Support |
---|---|
PyTorch | ✅ via ROCm builds |
TensorFlow | ✅ via tensorflow-rocm |
ONNX Runtime | ✅ |
Blender Cycles | ✅ (OpenCL backend) |
NumPy/SciPy |
hipcc
– HIP compiler (alias ofclang
)rocminfo
– Device capabilities viewerclinfo
– OpenCL device inspectionrocm-smi
– AMD GPU status tool (temp, power, clocks)hipify-perl
– Convert.cu
files to.hip
- ROCm now supports AI super-scaling (similar to Tensor Cores) via
Matrix Cores
in MI300/Instinct series. - Upcoming OpenCL 3.0 compliance
- Potential Vulkan Compute ↔ HIP interop for hybrid render/compute
This project is designed to work with AMD’s modern GPU compute pipeline, including:
- Kernel Offloading: HIP kernels are offloaded to dGPU or APU over PCIe/Infinity Fabric via ROCr
- Wavefront Scheduler: SIMD32-based warp schedulers for controlling parallelism
- SSSA Core Tuning: Runtime toggling of shared memory, VGPR/SGPR registers
- Multi-GPU Strategy: ROCm-aware build agents can parallelize workloads across GPU groups
- LLDB & ROCgdb Debugging: Attach low-level debuggers in CI to trace kernel failures
- ROCm Profiler Support: CI-integrated hooks for collecting kernel execution metrics

Azure VMSS instances provisioned for ROCm builds should include Docker and be ROCm-aware. Use the following cloud-init
config in the Custom Data field during VM creation:
#cloud-config
bootcmd:
- mkdir -p /etc/systemd/system/walinuxagent.service.d
- echo "[Unit]\nAfter=cloud-final.service" > /etc/systemd/system/walinuxagent.service.d/override.conf
- sed "s/After=multi-user.target//g" /lib/systemd/system/cloud-final.service > /etc/systemd/system/cloud-final.service
- systemctl daemon-reload
apt:
sources:
docker.list:
source: deb [arch=amd64] https://download.docker.com/linux/ubuntu $RELEASE stable
keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
packages:
- docker-ce
- docker-ce-cli
- libnuma-dev
groups:
- docker
runcmd:
- usermod -aG docker $USER
- systemctl restart docker
- systemctl enable docker
-
Dynamic Buffer Kernel Dispatch using HIP Streams and Signal Queues
-
Vulkan Interop pipeline for ROCm + Vulkan kernel rendering on RDNA
-
Automatic HIPIFY scanning of CUDA code in Pull Requests
-
SSBO + Async Compute Support for mixed graphics/GPU compute CI tests
-
LLVM Pass Hooks for auto-injecting metrics during device compile
-
HIP Programming Model
-
Azure Pipelines YAML Docs
-
LLVM for HIP
-
AMD GPU ISA
-
OpenMP Offloading to AMD
This pipeline and its configuration are provided for open experimentation and CI testing of HIP/ROCm workloads. Usage across different GPU architectures (e.g., RDNA2 vs CDNA2) may require manual configuration adjustments.
This repository and its contributors make no guarantees of performance, correctness, or long-term compatibility. ROCm stack is under active development and may break at any time with upstream changes.
-
AMD ROCm Ecosystem (Thanks)
-
Open Source Contributors (Yess)
-
ChipSet By: AMD corporation (OG)
-
Maintained by: Inbora Studio (Me)