Closed
Description
How does caching work for a simple kernel such as adding two vectors?
I'm using arrayfire-rust 3.7.2 with the CUDA backend.
let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
let a = arrayfire::randu::<f32>(dims);
let mut b = arrayfire::randu::<f32>(dims);
let mut c = a.clone();
// Infinite loop: bump b by a constant, then add it to a.
loop {
    b = b + 0.02f32;
    c = arrayfire::add(&b, &a, false);
}
Running this code generates 100 cubin files in ~/.arrayfire/.
Why does ArrayFire generate so many different kernels just to add two vectors?
And why does the code run much slower during the first 100 iterations?