Example 1

This example demonstrates a simple element-wise vector multiplication using three separate device-memory-backed tensors, A, B, and C, where C = A .* B.
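The source of example_1.cu is not reproduced in this README, so the following is only a minimal sketch of what C = A .* B can look like with the plain CUDA runtime API. The names `multiply_kernel`, `multiply_on_device`, and `CUDA_CHECK` are hypothetical, and the actual example may use a tensor library rather than raw device pointers.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                   cudaGetErrorString(err_), __FILE__, __LINE__);   \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernel: each thread computes one element of C = A .* B.
__global__ void multiply_kernel(const float* a, const float* b, float* c, size_t n) {
  size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
  if (i < n) c[i] = a[i] * b[i];
}

// Hypothetical host wrapper: allocates three separate device buffers,
// copies A and B to the device, launches the kernel, and copies C back.
std::vector<float> multiply_on_device(const std::vector<float>& a,
                                      const std::vector<float>& b) {
  assert(a.size() == b.size());
  const size_t n = a.size();
  std::vector<float> c(n);
  if (n == 0) return c;

  const size_t bytes = n * sizeof(float);
  float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
  CUDA_CHECK(cudaMalloc(&d_a, bytes));
  CUDA_CHECK(cudaMalloc(&d_b, bytes));
  CUDA_CHECK(cudaMalloc(&d_c, bytes));
  CUDA_CHECK(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

  const unsigned int threads = 256;
  const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);
  multiply_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
  CUDA_CHECK(cudaGetLastError());

  // The device-to-host copy on the default stream waits for the kernel to finish.
  CUDA_CHECK(cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaFree(d_a));
  CUDA_CHECK(cudaFree(d_b));
  CUDA_CHECK(cudaFree(d_c));
  return c;
}
```

With this sketch, calling `multiply_on_device({1, 2, 3}, {4, 5, 6})` returns `{4, 10, 18}`. The bounds check in the kernel is needed because the grid is rounded up to a whole number of blocks.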

Building

The run.sh script in the examples folder provides an environment capable of building all the examples. Building from this environment is recommended, but any CUDA-enabled environment should work.

Use the following commands to build the example:

mkdir build
cd build
cmake ..
make -j

Running

Run the example with the command ./example_1

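You should see output similar to the following (the build log from make -j is shown first, and the reported time will vary with your hardware):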

I have no name!@72063c2be218:/scratch/projects/gpuStarterResources/examples/example_1/build$ make -j
Consolidate compiler generated dependencies of target example_1
[ 50%] Building CUDA object CMakeFiles/example_1.dir/example_1.cu.o
[100%] Linking CUDA executable example_1
[100%] Built target example_1
I have no name!@72063c2be218:/scratch/projects/gpuStarterResources/examples/example_1/build$ ./example_1 
Average elapsed time per iteration is: 3372.13us
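
How the example computes its average time per iteration is not shown in this README. One common approach, sketched below under that assumption, is to bracket a loop of kernel launches with CUDA events and divide the elapsed time by the iteration count. The names `multiply_kernel`, `average_kernel_time_us`, and `CUDA_CHECK` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                   cudaGetErrorString(err_), __FILE__, __LINE__);   \
      std::exit(EXIT_FAILURE);                                      \
    }                                                               \
  } while (0)

// Hypothetical kernel: element-wise C = A .* B.
__global__ void multiply_kernel(const float* a, const float* b, float* c, size_t n) {
  size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
  if (i < n) c[i] = a[i] * b[i];
}

// Returns the average time per kernel launch in microseconds,
// measured with CUDA events over `iterations` launches.
float average_kernel_time_us(size_t n, int iterations) {
  const size_t bytes = n * sizeof(float);
  float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
  CUDA_CHECK(cudaMalloc(&d_a, bytes));
  CUDA_CHECK(cudaMalloc(&d_b, bytes));
  CUDA_CHECK(cudaMalloc(&d_c, bytes));
  CUDA_CHECK(cudaMemset(d_a, 0, bytes));
  CUDA_CHECK(cudaMemset(d_b, 0, bytes));

  const unsigned int threads = 256;
  const unsigned int blocks = static_cast<unsigned int>((n + threads - 1) / threads);

  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));

  // Warm-up launch so one-time setup costs are not included in the timing.
  multiply_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
  CUDA_CHECK(cudaGetLastError());

  CUDA_CHECK(cudaEventRecord(start));
  for (int i = 0; i < iterations; ++i) {
    multiply_kernel<<<blocks, threads>>>(d_a, d_b, d_c, n);
  }
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaEventSynchronize(stop));  // wait until all timed launches finish
  CUDA_CHECK(cudaGetLastError());

  float total_ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&total_ms, start, stop));  // result is in milliseconds

  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  CUDA_CHECK(cudaFree(d_a));
  CUDA_CHECK(cudaFree(d_b));
  CUDA_CHECK(cudaFree(d_c));
  return total_ms * 1000.0f / static_cast<float>(iterations);
}
```

Events are recorded on the GPU timeline, so this measures device execution time rather than host-side launch overhead. The vector length and iteration count used by the real example are not stated here, so any values passed to this sketch are the caller's choice.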

Profiling Commands

To profile with Nsight Systems (nsys), try the following: nsys profile -o example_1 ./example_1

To profile with Nsight Compute (ncu), try the following: ncu --set=full --import-source=true -c 2 -f -o example_1 ./example_1