
[Question] How does caching work in CUDA? #262

Closed · BA8F0D39 opened this issue Nov 14, 2020 · 14 comments

@BA8F0D39 (Contributor)

How does caching work for a simple kernel such as adding two vectors? This is with the arrayfire-rust 3.7.2 CUDA backend.

let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
let a = arrayfire::randu::<f32>(dims);
let mut b = arrayfire::randu::<f32>(dims);

let mut c = a.clone();
loop {
    b = b + 0.02f32;
    c = arrayfire::add(&b, &a, false);
}

Running the code generates 100 cubins in ~/.arrayfire/.

Why does ArrayFire generate so many different kernels just for adding two vectors? And why does the code run much slower during the first 100 iterations?

@9prady9 (Member) commented Nov 14, 2020

Can you please tell me how many kernels you are seeing? Try export AF_TRACE=jit and run the Rust example; that should print out the kernels being generated.

Apart from JIT kernels, most functions' kernels are compiled and cached on the first run. It shouldn't take 100 iterations to run at full speed; a single iteration at the beginning is all it needs to compile and cache the functions/JIT code. If you are noticing such a slowdown, it must be something else.

I will run the code you shared and check. It should ideally generate a single kernel, since both statements in the while loop are JITed, and eval should be triggered by memory-pressure heuristics or when explicitly called by the user.
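
For reference, the trace can be enabled from a shell before launching the example like so (the cargo invocation is illustrative; any way of running the binary works):

    export AF_TRACE=jit
    cargo run --release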

@BA8F0D39 (Contributor, Author) commented Nov 14, 2020

I figured it out: the print function (arrayfire::print_gen) generates 100 different kernels. After generating 100 kernels, it stops generating more and runs faster.

    let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
    let a = arrayfire::randu::<f32>(dims);
    let mut b = arrayfire::randu::<f32>(dims);

    let mut c = a.clone();
    loop {
        b = b + 0.02f32;
        c = arrayfire::add(&b, &a, false);
        arrayfire::print_gen("c".to_string(), &c, Some(6));
    }


[unified][1605388051][006181] [ ../src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1605388051][006181] [ ../src/api/unified/symbol_manager.cpp:144 ] Found: libafcpu.so.3
[unified][1605388051][006181] [ ../src/api/unified/symbol_manager.cpp:151 ] Device Count: 1.
[unified][1605388051][006181] [ ../src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1605388051][006181] [ ../src/api/unified/symbol_manager.cpp:144 ] Found: libafopencl.so.3
[platform][1605388052][006181] [ ../src/backend/common/DependencyModule.cpp:99 ] Attempting to load: libforge.so
[platform][1605388052][006181] [ ../src/backend/common/DependencyModule.cpp:102 ] Found: libforge.so
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:218 ] Found 2 OpenCL platforms
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:230 ] Found 1 devices on platform NVIDIA CUDA
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:235 ] Found device GeForce GTX 1060 3GB on platform NVIDIA CUDA
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:230 ] Found 1 devices on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:235 ] Found device Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:240 ] Found 2 OpenCL devices
[platform][1605388052][006181] [ ../src/backend/opencl/device_manager.cpp:335 ] Default device: 0
[unified][1605388052][006181] [ ../src/api/unified/symbol_manager.cpp:151 ] Device Count: 2.
[unified][1605388052][006181] [ ../src/api/unified/symbol_manager.cpp:141 ] Attempting: Default System Paths
[unified][1605388052][006181] [ ../src/api/unified/symbol_manager.cpp:144 ] Found: libafcuda.so.3
[unified][1605388052][006181] [ ../src/api/unified/symbol_manager.cpp:151 ] Device Count: 1.
[unified][1605388052][006181] [ ../src/api/unified/symbol_manager.cpp:206 ] AF_DEFAULT_BACKEND: cuda
[platform][1605388052][006181] [ ../src/backend/common/DependencyModule.cpp:99 ] Attempting to load: libforge.so
[platform][1605388052][006181] [ ../src/backend/common/DependencyModule.cpp:102 ] Found: libforge.so
[platform][1605388052][006181] [ ../src/backend/cuda/device_manager.cpp:428 ] CUDA Driver supports up to CUDA 11.1 ArrayFire CUDA Runtime 10.0
[platform][1605388052][006181] [ ../src/backend/cuda/device_manager.cpp:495 ] Found 1 CUDA devices
[platform][1605388052][006181] [ ../src/backend/cuda/device_manager.cpp:521 ] Found device: GeForce GTX 1060 3GB (2.95 GB | ~3754.03 GFLOPs | 9 SMs)
[platform][1605388052][006181] [ ../src/backend/cuda/device_manager.cpp:556 ] AF_CUDA_DEFAULT_DEVICE: 
[platform][1605388052][006181] [ ../src/backend/cuda/device_manager.cpp:575 ] Default device: 0(GeForce GTX 1060 3GB)
ArrayFire v3.7.2 (CUDA, 64-bit Linux, build 218dd2c)
Platform: CUDA Runtime 10.0, Driver: 455.32.00
[0] GeForce GTX 1060 3GB, 3019 MB, CUDA Compute 6.1
Info String:
ArrayFire v3.7.2 (CUDA, 64-bit Linux, build 218dd2c)
Platform: CUDA Runtime 10.0, Driver: 455.32.00
[0] GeForce GTX 1060 3GB, 3019 MB, CUDA Compute 6.1
Arrayfire version: (3, 7, 2)
Name: GeForce_GTX_1060_3GB
Platform: CUDA
Toolkit: v10.0
Compute: 6.1
Revision: 218dd2c
[mem][1605388052][006181] [ ../src/backend/cuda/memory.cpp:158 ] nativeAlloc:    1 KB 0x7f3bfc800000
[mem][1605388052][006181] [ ../src/backend/cuda/memory.cpp:158 ] nativeAlloc:    1 KB 0x7f3bfc800400
c
[4 4 1 1]
[mem][1605388052][006181] [ ../src/backend/cuda/memory.cpp:158 ] nativeAlloc:    1 KB 0x7f3bfc800800
[mem][1605388052][006181] [ ../src/backend/cuda/memory.cpp:158 ] nativeAlloc:    1 KB 0x7f3bfc800c00
[jit][1605388052][006181] [ ../src/backend/cuda/compile_module.cpp:430 ] {8849204984755236620            : loaded from /home/usertest/.arrayfire/KER8849204984755236620_CU_61_AF_37.cubin for GeForce GTX 1060 3GB }
[jit][1605388052][006181] [ ../src/backend/cuda/compile_module.cpp:430 ] {11230367656483596278           : loaded from /home/usertest/.arrayfire/KER11230367656483596278_CU_61_AF_37.cubin for GeForce GTX 1060 3GB }
    1.037432     0.760924     1.560485     0.665100 
    0.629204     1.180188     1.327124     1.659123 
    1.896739     0.829636     0.231963     1.501766 
    0.603847     0.917672     0.968704     0.750666 

c
[4 4 1 1]
[jit][1605388052][006181] [ ../src/backend/cuda/compile_module.cpp:430 ] {13980104363090480971           : loaded from /home/usertest/.arrayfire/KER13980104363090480971_CU_61_AF_37.cubin for GeForce GTX 1060 3GB }
    1.057432     0.780924     1.580485     0.685100 
    0.649204     1.200188     1.347124     1.679123 
    1.916739     0.849636     0.251963     1.521766 
    0.623847     0.937672     0.988704     0.770666 

c
[4 4 1 1]
[jit][1605388052][006181] [ ../src/backend/cuda/compile_module.cpp:430 ] {4009459142369369917            : loaded from /home/usertest/.arrayfire/KER4009459142369369917_CU_61_AF_37.cubin for GeForce GTX 1060 3GB }
    1.077432     0.800924     1.600485     0.705100 
    0.669204     1.220188     1.367124     1.699123 
    1.936739     0.869636     0.271963     1.541766 
    0.643847     0.957672     1.008704     0.790666 

[... trace truncated: every subsequent iteration prints c and loads a different KER*_CU_61_AF_37.cubin from /home/usertest/.arrayfire/ for GeForce GTX 1060 3GB ...]
@BA8F0D39 (Contributor, Author) commented:

Wait a minute. Why does the print function generate kernels?

@9prady9 (Member) commented Nov 16, 2020

I am able to reproduce the behavior, although I don't think your conclusions are entirely correct: the print functionality is not a kernel. The kernel activity only appears when you call print because ArrayFire functions are asynchronous by default; until print is called or an eval is auto-triggered, the JIT nodes are not evaluated, and thus no kernel is looked up for actual execution.

As far as the trace log goes, every message from the first one onward says it loaded an already-cached kernel, so it is not compiling anything; most likely because it did so once in the past on the system you are running this program on.

I am looking into it and will update my findings here as soon as I can. Thanks for sharing the trace output of the program.

9prady9 added Bug and removed Question labels, Nov 16, 2020
@BA8F0D39 (Contributor, Author) commented Nov 16, 2020

GPU-to-CPU data transfer also generates 100 different kernels.

Maybe it is the host() function that the cache lookup fails on?

Also, why does a data transfer generate CUDA kernels?


    let dims = arrayfire::Dim4::new(&[4, 1, 1, 1]);
    let a = arrayfire::randu::<f32>(dims);
    let mut b = arrayfire::randu::<f32>(dims);

    let mut c = a.clone();
    let mut c_cpu: [f32; 4] = [0.0, -4.1, 1.7, -0.9];
    loop {
        b = b + 0.02f32;
        c = arrayfire::add(&b, &a, false);
        c.host(&mut c_cpu);
    }

@9prady9 (Member) commented Nov 17, 2020

As I pointed out earlier, it is not host/print that is generating any kernels; they trigger the JIT evaluations, which just makes it appear as though the host/print call is doing it. You can call c.eval() instead of c.host() or print(&c) and you will see similar output.

My guess is that each iteration is somehow triggering a separate JIT evaluation although it shouldn't; perhaps the print, host, or eval calls themselves are causing it. Ideally, even if they trigger a load from disk, that shouldn't happen on every iteration. I can only say for certain after looking into it. I shall update my findings here as soon as I can.

Edited:

Nevertheless, I think avoiding sync/eval/host kinds of calls inside such a loop should definitely avoid triggering JIT evaluation. You can add an eval call after the loop, which ensures the JIT evaluates only once for this logic. But if you have to fetch results to the host every few iterations inside the loop, you can wrap the host call in that condition so that the JIT evaluates only when the condition is met, as sketched below.
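
A minimal sketch of that conditional fetch in Rust, assuming a 4x4 array and an illustrative interval of 1000 iterations:

    let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
    let a = arrayfire::randu::<f32>(dims);
    let mut b = arrayfire::randu::<f32>(dims);
    let mut c = a.clone();
    let mut c_cpu = [0.0f32; 16];

    for i in 0..100_000 {
        b = b + 0.02f32;
        c = arrayfire::add(&b, &a, false);
        // Fetch to host only occasionally; the JIT evaluates only on these passes.
        if i % 1000 == 0 {
            c.host(&mut c_cpu);
        }
    }
    c.eval(); // one final evaluation after the loop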

@BA8F0D39 (Contributor, Author) commented:

I am using the host() function inside the while loop to dump data from the GPU's RAM to my SSD, because the data can't fit in the GPU's RAM. It would be nice if the saveArray function were implemented in Rust to write data to the filesystem.

@9prady9 (Member) commented Nov 18, 2020

@BA8F0D39 I have raised a feature request to follow the progress of the disk-saving API - #263

Although those functions aren't available yet, I believe you can use the serde feature I added recently to serialize and deserialize arrays. It is not in the current stable release (crate) yet, but you can use it from the master branch of the GitHub repository directly.
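
For that serde route, a rough sketch (hypothetical: check the exact feature name and derive support against the crate docs on master; bincode is just one choice of format):

    // Assumes the serde feature derives Serialize/Deserialize for Array
    // and that bincode is added as a dependency.
    let bytes = bincode::serialize(&c).expect("serialize failed");
    std::fs::write("c.bin", &bytes).expect("write failed");

    let restored: arrayfire::Array<f32> =
        bincode::deserialize(&std::fs::read("c.bin").expect("read failed"))
            .expect("deserialize failed");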

I was able to reproduce the behavior of more than one kernel being generated upstream. I don't have any updates as of now.

If the size of your data is the concern, then you should use some kind of condition to dump the data to disk rather than doing it on every iteration, which is very inefficient. When the host call is wrapped in such a condition, kernels aren't evaluated in each iteration.

9prady9 added Upstream and removed Bug labels, Jan 5, 2021
@9prady9 (Member) commented Jan 5, 2021

@BA8F0D39 Sorry about the delay. I have figured out why so many kernels are being generated. It is not a bug per se, but a side effect of how our JIT workflow currently works.

Let's take the following code (please see the comments for details):

        dim4 dims(4, 4);
        const array a = randu(dims);
        array b = randu(dims);
        array c = a;
        af::sync(); // added only to draw a clear boundary between JIT work
                    // before the loop and inside it; otherwise it does nothing

        for (int i = 0; i < 10; i++) {
            b = b + 0.022f; // a JIT operation: JIT_NODE = JIT_NODE + SCALAR_NODE
            b.eval();       // because b is eval'ed, its growing tree collapses to a buffer node
            c = b + a;      // a single JIT node that adds two buffer nodes
            af_print(c);
        }

Here's what happens from, let's say, iteration N to iteration N+1, and so on.

Iteration N

b ---> |---> 0.022
       |---> b ---> |---> 0.022
                    |---> b ---> ...

Iteration N+1

b ---> |---> 0.022
       |---> b ---> |---> 0.022
                    |---> b ---> |---> 0.022
                                 |---> b ---> ...

Now, if I remove the eval on b, c also becomes an iterative JIT tree that builds upon the previous iteration, so each iteration effectively produces a different JIT kernel.

        for (int i = 0; i < 10; i++) {
            b = b + 0.022f; // still: JIT_NODE = JIT_NODE + SCALAR_NODE
            c = b + a;      // now this also becomes an iterative JIT_NODE = JIT_NODE + JIT_NODE
            af_print(c);
        }

The number 100 is just due to our implementation: we limit the JIT tree depth to a maximum of 100, at which point an eval is auto-triggered.

We will discuss internally whether this JIT workflow can be further improved. But rest assured: if eval or af_print is not called inside the loop, this JIT workflow has little performance impact. As I said earlier, we are constantly looking to improve JIT performance, and we will look into this behavior as well.
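
For arrayfire-rust users, the eval-inside-the-loop pattern looks roughly like this (a sketch mirroring the C++ above):

    let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
    let a = arrayfire::randu::<f32>(dims);
    let mut b = arrayfire::randu::<f32>(dims);

    for _ in 0..10 {
        b = b + 0.022f32;
        b.eval(); // collapse b's growing JIT tree into a buffer node
        let c = arrayfire::add(&b, &a, false); // a single JIT node over two buffers
        arrayfire::print_gen("c".to_string(), &c, Some(6));
    }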

Thank you for using ArrayFire, and Happy New Year :)

@9prady9 (Member) commented Jan 5, 2021

@BA8F0D39 I will try moving this to GitHub Discussions, since it is not a bug in code, neither in the wrapper nor upstream.

Update: Apparently, this can't be done yet due to community/community#2924 (comment)

@BA8F0D39 (Contributor, Author) commented Jan 8, 2021

@9prady9

Thanks for the hard work.

This is for arrayfire 3.7.3.

In my debug code, I use print_gen to view the values of the matrices, and I don't really care about the performance of the loop, so generating many kernels is fine for debugging.

In my release code, I need to dump matrices from GPU to CPU, and the host function will call eval on the matrices. Is it possible to do an asynchronous eval, so that it doesn't block the JIT and the eval can happen in the background?

You said that the JIT depth is set to 100, but in my code the JIT can generate more than 100,000 kernels for a single matrix operation. Is the detection of previously generated kernels failing?

@9prady9 (Member) commented Jan 11, 2021

@BA8F0D39

You said that the JIT depth is set to 100, but in my code the JIT can generate more than 100,000 kernels for a single matrix operation. Is the detection of previously generated kernels failing?

Maybe I wasn't clear, and that added to the confusion. When I say JIT depth, I mean the height of the JIT tree; you may think of it like a code AST. This tree's height/depth is limited to 100. When the height goes beyond the limit, the corresponding Array gets evaluated automatically, generating one or more kernels depending on the code you have written. For example, if all the lines in a given section of code are JIT operations, only a single kernel is generated, not one kernel per operation. Hope this clears the confusion. If there are 100k kernels cached on your system, they could be from any of the following: 1) stale caches invalidated by previous versions; 2) regular functions (non-JIT operations), which also cache their kernels.

In my release code, I need to dump matrices from GPU to CPU, and the host function will call eval on the matrices. Is it possible to do an asynchronous eval, so that it doesn't block the JIT and the eval can happen in the background?

From what I gathered, your use case actually requires dumping matrices to disk on every iteration. In that case, I think you need a different mechanism in your application for such matrix dumping. Let me explain why.

The main purpose of eval is to ensure that a JITed Array is ready for a subsequent non-JITed function call that expects a buffer/memory. Note that this doesn't mean the thread calling eval is blocked; af::sync() is different from eval(). eval() just translates the JIT tree (dynamically generated from user code) into a kernel and launches that kernel asynchronously. Most users don't need to call eval explicitly, because it is done automatically based on heuristics that include memory pressure. Simply put, eval() is not a blocking call.

array::host(), on the other hand, is a blocking call, because there is no implicit way for ArrayFire to know how long the host pointer will remain valid. Hence my earlier statement that you have to use a different technique to dump data asynchronously.
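
To make that distinction concrete in Rust terms (a sketch; c is assumed to be a 4x4 Array<f32> as in the earlier snippets, on device 0):

    let mut host_buf = [0.0f32; 16];
    c.eval();              // asynchronous: launches the fused JIT kernel and returns
    arrayfire::sync(0);    // blocking: waits for all queued work on device 0
    c.host(&mut host_buf); // blocking: waits for c to be ready, then copies device -> host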

I think you could maintain a separate queue handled by a different thread: whenever data from the main thread is ready, push the corresponding array onto the queue, and let the queue-handler thread copy the data to the host and dump it to disk. I personally would avoid such a queue, because one then has to handle issues arising from differing data production and consumption rates. It's not hard, just some extra logic; see the sketch below.
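
One way to realize that queue without moving ArrayFire arrays across threads (a sketch: the device-to-host copy stays on the main thread and only plain f32 buffers are queued; the file name and raw little-endian format are illustrative):

    use std::io::Write;
    use std::sync::mpsc;
    use std::thread;

    let (tx, rx) = mpsc::channel::<Vec<f32>>();

    // Writer thread: consumes host buffers and appends them to a file.
    let writer = thread::spawn(move || {
        let mut file = std::fs::File::create("dump.bin").expect("create failed");
        for buf in rx {
            for v in &buf {
                file.write_all(&v.to_le_bytes()).expect("write failed");
            }
        }
    });

    // In the compute loop, after each result is ready:
    //     let mut host_buf = vec![0.0f32; 16];
    //     c.host(&mut host_buf);
    //     tx.send(host_buf).unwrap();
    // The disk I/O then never blocks the compute thread.

    drop(tx); // close the channel so the writer thread exits
    writer.join().unwrap();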

Another approach is to use ArrayFire events. You can mark an event after the target operation, then move that event to the other thread along with the source array. In the other thread, you can block on the event until the required data is ready. This avoids any queue, but there may be some extra performance cost from using events. Which approach fares better needs to be tested.

@BA8F0D39 (Contributor, Author) commented:

@9prady9
How does the JIT know which sections of code to turn into a single kernel?

Is it possible to force the JIT to generate a single kernel for each Rust function?

Is it possible to mark sections of the code so that the JIT generates a single kernel per section, like a jit! macro?

jit!{
    c = arrayfire::add(&b, &a, false);
    e = arrayfire::mul(&d, &c, false);
    q = arrayfire::add(&v, &e, false);
}

jit!{
    z = arrayfire::div(&q, &s, false);
    w = arrayfire::add(&x, &z, false);
}

Does the JIT generate a new kernel when a branch statement is encountered?

@9prady9 (Member) commented Jan 13, 2021

How does the JIT know which sections of code to turn into a single kernel?

An illustrative example (not C++ code, just an algorithm):

1. a = b + c; // JIT operation
2. d = sin(b); // JIT operation
3. e = a && d; // JIT operation
4. v = erode(e, ... ); // not JIT, a handwritten kernel
5. u = 1 - v; // JIT operation

Lines 1, 2 & 3 are all combined into a single kernel; the output of that kernel is then fed into the erode function. Line 5 is again an arithmetic operation, which is JIT; it creates a separate kernel.

Basically, most domain-specific functions (computer vision, image processing, statistics, ML, signal processing, linear algebra) are not JIT. Such functions are essentially asynchronous barriers: they won't block the thread, but they cause any JITed inputs of the function in question to be evaluated automatically, so that the relevant buffer pointers are ready for the function to operate on.
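
In arrayfire-rust terms, that example could look like the following (a sketch; the erode call and its mask are illustrative stand-ins for any non-JIT function):

    let dims = arrayfire::Dim4::new(&[8, 8, 1, 1]);
    let b = arrayfire::randu::<f32>(dims);
    let c = arrayfire::randu::<f32>(dims);
    let mask = arrayfire::constant(1.0f32, arrayfire::Dim4::new(&[3, 3, 1, 1]));

    let a = arrayfire::add(&b, &c, false); // JIT
    let d = arrayfire::sin(&b);            // JIT
    let e = arrayfire::mul(&a, &d, false); // JIT: a, d, e fuse into one kernel
    let v = arrayfire::erode(&e, &mask);   // not JIT: forces evaluation of e first
    let u = arrayfire::sin(&v);            // JIT again: begins a second kernel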

Is it possible to force the JIT to generate a single kernel for each Rust function?

Calling the method Array::eval() will do it, but that is not advised, as it would defeat the asynchronous nature of the ArrayFire API.

Is it possible to mark sections of the code so that the JIT generates a single kernel per section, like a jit! macro?

There is no such jit macro, but something similar can be done with existing functionality:

let c = arrayfire::add(&b, &a, false);
let e = arrayfire::mul(&d, &c, false);
let q = arrayfire::add(&v, &e, false);
q.eval();

Does the JIT generate a new kernel when a branch statement is encountered?

Not sure I understand the question. What kind of branch statement are you asking about? Vectorized operations don't usually have any branch instructions.

BA8F0D39 closed this as completed Feb 3, 2021