Feature description
I would like a way to perform boolean algebra operations on tensor elements, as well as bit-counting operations such as count_ones.
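To make the request concrete, here is a purely illustrative sketch of the kind of element-wise operations I have in mind (the trait and method names are placeholders, not existing or proposed Burn API):

    trait BitTensorOps {
        // Element-wise boolean algebra between two integer tensors.
        fn bitwise_and(self, other: Self) -> Self;
        fn bitwise_or(self, other: Self) -> Self;
        fn bitwise_xor(self, other: Self) -> Self;
        fn bitwise_not(self) -> Self;
        // Element-wise population count, like u64::count_ones per element.
        fn count_ones(self) -> Self;
    }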
Feature motivation
In my discrete AI research I am dealing with billions of binary embedding values represented as u128s or [u64; N]s. For certain operations I need to perform boolean algebra on them. One especially tricky operation is discrete correlation, which requires bit counting of tensor elements.
The equivalent scalar code looks like this:
    pub fn correlation<T>(x: BitVector<T>, y: BitVector<T>) -> f32
    where
        BitVector<T>: BitOps,
    {
        let nx = x.count_ones();
        let ny = y.count_ones();
        let s = (x & y).count_ones();
        let product = nx * ny;
        if product == 0 {
            0.
        } else {
            s as f32 / (product as f32).sqrt()
        }
    }
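For reference, here is the same computation on a single plain u64 word (a minimal, self-contained sketch rather than the generic BitVector<T> above), which makes the bit-level operations concrete:

    fn correlation_u64(x: u64, y: u64) -> f32 {
        let nx = x.count_ones();      // bits set in x
        let ny = y.count_ones();      // bits set in y
        let s = (x & y).count_ones(); // bits set in both
        let product = nx * ny;
        if product == 0 {
            0.0
        } else {
            s as f32 / (product as f32).sqrt()
        }
    }

    // E.g. x = 0b1011_0110 (5 bits set), y = 0b1001_0011 (4 bits set), 3 shared bits:
    // correlation_u64(x, y) = 3 / sqrt(20) ≈ 0.67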
Even with rudimentary support for integer computation on consumer GPUs, I was able to achieve a ~30x speedup compared to the CPU. Currently everything is implemented in custom shaders using Vulkan and works fine, but the code is definitely not composable and is a PITA to maintain, so I am looking for a better solution.
(Optional) Suggest a Solution
So, technically this is possible to implement and has already proven efficient. But I have no idea whether Burn or any of its backends would be suitable for doing it sanely.
I assume it can be done by writing a custom wgpu kernel, but that would probably not be generic enough. For my rather niche case (to say the least) this is probably the way to go, but I wanted to ask for your opinion first. Maybe someone else is also looking for something this crazy.
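For what it's worth, even a backend without a native popcount instruction could get by with the classic SWAR bit-counting trick; here is a scalar Rust sketch of that routine, which a custom kernel could port (just an illustration of the technique, not existing Burn or wgpu code):

    fn popcount32(mut v: u32) -> u32 {
        v = v - ((v >> 1) & 0x5555_5555);                 // 2-bit partial sums
        v = (v & 0x3333_3333) + ((v >> 2) & 0x3333_3333); // 4-bit partial sums
        v = (v + (v >> 4)) & 0x0F0F_0F0F;                 // 8-bit partial sums
        v.wrapping_mul(0x0101_0101) >> 24                 // fold bytes into the top byte
    }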
Thanks in advance!