[Question] How does caching work in CUDA? #262

Closed
@BA8F0D39

How does kernel caching work for a simple operation such as adding two vectors?
I am using arrayfire-rust 3.7.2 with the CUDA backend.

let dims = arrayfire::Dim4::new(&[4, 4, 1, 1]);
let a = arrayfire::randu::<f32>(dims);
let mut b = arrayfire::randu::<f32>(dims);

let mut c = a.clone();
loop {
	// Shift b by a constant, then add it to a, on every iteration.
	b = b + 0.02f32;
	c = arrayfire::add(&b, &a, false);
}

Running this code generates 100 cubin files in ~/.arrayfire/.

Why does ArrayFire generate so many different kernels just to add two vectors?
And why does the code run much slower during the first 100 iterations?
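
To put numbers on the slowdown, a small timing helper could record how long each iteration takes. This is only a sketch using the Rust standard library: `time_iterations` and `mean` are hypothetical helper names, not part of arrayfire-rust, and the closure passed in is assumed to be the loop body from the snippet above. What the timings capture depends on whether the closure waits for the GPU to finish its work.

```rust
use std::time::{Duration, Instant};

/// Runs `step` `iters` times and returns how long each call took.
/// `step` is a hypothetical stand-in for one iteration of the loop body.
fn time_iterations<F: FnMut()>(iters: usize, mut step: F) -> Vec<Duration> {
    let mut timings = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        step();
        timings.push(start.elapsed());
    }
    timings
}

/// Mean duration of a slice of timings (zero for an empty slice).
fn mean(timings: &[Duration]) -> Duration {
    if timings.is_empty() {
        return Duration::ZERO;
    }
    timings.iter().sum::<Duration>() / timings.len() as u32
}
```

Calling `time_iterations(200, ...)` with the loop body and comparing `mean(&timings[..100])` against `mean(&timings[100..])` would show whether the first 100 iterations really are slower than the rest.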
