Releases: charles-r-earp/krnl
v0.1.1
- krnlc set repository in manifest (#34). Thanks @Foorack.
- krnlc and krnl versions no longer have to be the same; semver compatible versions are allowed (0b86d71).
- half version requirement relaxed from =2.1.0 to 2.1.0 (#38).
- Update krnl-cache docs (259bafe).
krnl v0.1.1 supports caches created with krnlc v0.1.0, but krnlc v0.1.0 will reject other krnl versions. This means that krnlc will have to be updated to v0.1.1 to compile kernels with krnl v0.1.1 or later. This update will not be required for subsequent semver compatible releases.
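The difference between the exact-match check described for krnlc v0.1.0 and a semver compatible check can be sketched as follows. This is a minimal illustration that assumes Cargo-style caret rules (the left-most non-zero component must match); the type alias and function names are hypothetical and are not taken from krnlc's source.

```rust
/// A (major, minor, patch) version triple. Hypothetical helper type.
type Version = (u64, u64, u64);

/// Exact match: only identical versions are accepted.
fn exact_match(a: Version, b: Version) -> bool {
    a == b
}

/// Caret-style compatibility (an assumption about what "semver compatible"
/// means here): the left-most non-zero component must be equal.
fn semver_compatible(a: Version, b: Version) -> bool {
    match (a, b) {
        // 0.0.x: only the same patch is compatible.
        ((0, 0, pa), (0, 0, pb)) => pa == pb,
        // 0.x.y: the minor version must match.
        ((0, ma, _), (0, mb, _)) => ma == mb,
        // x.y.z with x > 0 on at least one side: the major version must match.
        ((ja, _, _), (jb, _, _)) => ja == jb,
    }
}

fn main() {
    let krnlc = (0, 1, 0);
    let krnl = (0, 1, 1);
    println!("exact: {}", exact_match(krnlc, krnl));
    println!("compatible: {}", semver_compatible(krnlc, krnl));
}
```

Under these assumed rules, 0.1.0 and 0.1.1 are compatible while 0.1.0 and 0.2.0 are not, which is the same distinction Cargo draws between the requirements =2.1.0 and 2.1.0.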
v0.1.0
krnl was developed to replace core functionality in autograph:
- Only targets Vulkan, more portable than Metal / DX12.
  - Metal is supported via MoltenVK.
- GPGPU kernels implemented inline in Rust:
  - Kernels can be defined in the same file, near where they are invoked.
  - Modules allow sharing code between host and device.
  - Kernel bindings are type safe, checked at compile time.
  - Simple iterator patterns can be implemented without unsafe.
  - Supports specialization constants provided at runtime.
- DeviceInfo includes useful properties:
  - Max / default threads per group.
  - Max / min threads per subgroup.
- With DebugPrintf, kernel panics produce errors on the host.
- krnlc generates a device crate and invokes spirv-builder.
  - spirv-builder / spirv-tools are compiled once on install.
  - Significantly streamlines and accelerates workflow.
  - Kernels are compressed to reduce package and binary size.
- Device operations readily execute:
  - Block until kernels / transfers can queue.
  - An operation can be queued while another is executing.
  - Reduced latency, better repeatability, reliability, and performance.
- Device buffers can be copied by the host if host visible.
- Large buffer copies are streamed rather than allocating a large temporary:
  - Reuses a few small buffers for transfers.
  - Overlaps host and device copies.
  - Performance significantly closer to CUDA.
  - Also streams between devices.
- Device buffers can be i32::MAX bytes (~2 GB, up from 256 MB).
- Scalar / ScalarBufferBase replaces Float / FloatBuffer:
  - Streamlined conversions between buffers.
  - Buffers can be sliced.
- Supports wasm (without device feature).
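
The streamed copy idea mentioned above (reusing a small buffer instead of allocating one large temporary) can be illustrated with a host-only sketch. It uses plain slices rather than krnl's buffer types, the function name and chunk size are hypothetical, and it does not model the overlap of host and device copies.

```rust
/// Hypothetical staging size; the transfer buffer size krnl uses is not stated here.
const CHUNK_BYTES: usize = 4;

/// Copies `src` into `dst` through a small reusable staging buffer and
/// returns the number of chunks that were transferred.
fn streamed_copy(src: &[u8], dst: &mut [u8], staging: &mut [u8]) -> usize {
    assert_eq!(src.len(), dst.len());
    assert!(!staging.is_empty());
    let mut chunks = 0;
    for (s, d) in src.chunks(staging.len()).zip(dst.chunks_mut(staging.len())) {
        // The last chunk may be shorter than the staging buffer.
        let stage = &mut staging[..s.len()];
        stage.copy_from_slice(s); // stands in for the copy into staging
        d.copy_from_slice(stage); // stands in for the copy out of staging
        chunks += 1;
    }
    chunks
}

fn main() {
    let src: Vec<u8> = (0..10).collect();
    let mut dst = vec![0u8; src.len()];
    let mut staging = [0u8; CHUNK_BYTES];
    let chunks = streamed_copy(&src, &mut dst, &mut staging);
    assert_eq!(dst, src);
    println!("copied {} bytes in {} chunks", src.len(), chunks);
}
```

The peak temporary allocation is bounded by the staging buffer size regardless of how large the copied buffer is, which is the point of streaming.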
MSRV: 1.70.0