[Issue]: No GPU Hardware Counters Available ROCm 6.1.2 #348

Closed
aymane-eljerari opened this issue Jun 15, 2024 · 2 comments

@aymane-eljerari

Problem Description

I am using Omnitrace in a Docker container based on the rocm/dev-ubuntu-22.04:6.1.2-complete Docker image. No GPU hardware counters are listed when I run:

$ omnitrace-avail -c GPU
|-----------------------|--------|-------------|
| ENVIRONMENT VARIABLE  | VALUE  | CATEGORIES  |
|-----------------------|--------|-------------|
|-----------------------|--------|-------------|

Operating System

Ubuntu 22.04.4 LTS (Jammy Jellyfish)

CPU

AMD EPYC 7H12 64-Core Processor

GPU

AMD Instinct MI100

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

$ rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK    MCLK     Fan  Perf  PwrCap  VRAM%  GPU%  
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)                                                   
====================================================================================================================
0       3     0x738c,   33560  31.0°C  34.0W  N/A, N/A, 0         300Mhz  1200Mhz  0%   auto  290.0W  0%     0%    
1       2     0x738c,   19561  32.0°C  35.0W  N/A, N/A, 0         300Mhz  1200Mhz  0%   auto  290.0W  0%     0%    
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================

Both HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES are set to 0,1.

@jrmadsen
Collaborator

Did you build Omnitrace from source? I ask because we haven't released a build supporting ROCm 6.1 yet (see #349).

@aymane-eljerari
Author

I believe the issue was most likely caused by two flags that were missing when the Docker container was started: --group-add=video and --cap-add SYS_PTRACE. With those flags set, the hardware counters are now available when using rocprof, so they should show up in Omnitrace as well.
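
For anyone landing here with the same symptom, the fix described above might look roughly like the sketch below. It only prints a `docker run` command so it can be reviewed before use. The image name and the two flags come from this thread; the `--device=/dev/kfd` and `--device=/dev/dri` entries are an assumption based on the usual way ROCm containers are given GPU access, and the function name `rocm_docker_cmd` is purely illustrative.

```shell
#!/bin/sh
# Sketch only: prints a docker run command, does not execute it.
# Image name taken from the issue description above.
IMAGE="rocm/dev-ubuntu-22.04:6.1.2-complete"

# Hypothetical helper; the name is illustrative.
rocm_docker_cmd() {
    # --group-add=video and --cap-add SYS_PTRACE are the flags reported
    # missing in this thread.
    # The two --device entries are an ASSUMPTION (typical ROCm container
    # setup) and are not mentioned anywhere in the thread.
    printf '%s\n' "docker run -it --device=/dev/kfd --device=/dev/dri --group-add=video --cap-add SYS_PTRACE $IMAGE"
}

rocm_docker_cmd
```

Once inside a container started this way, re-running `omnitrace-avail -c GPU` is the quickest way to confirm whether the counters are listed. Note that the maintainer's point about there being no released build for ROCm 6.1 may still apply independently of the container flags.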
