v0.4.0

@nathanielsimard released this 14 Jan 20:36
· 15 commits to main since this release

Matrix Multiplication (Matmul) Improvements

Refactored configuration for better kernel selection and performance tuning. Added support for batch operations, double buffering, and pipelined processing to enhance throughput and efficiency. Implemented customizable dispatch for non-square matrices and introduced heuristics for kernel selection.

New Crate for Reduce Kernels

This release introduces a new crate, cubecl-reduce, which contains optimized reduce kernels that work on all platforms.
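
For readers unfamiliar with the term, a reduce kernel collapses many values into one (a sum, a max, and so on), usually by combining pairs of partial results in about log2(n) steps. The sketch below shows that pairwise pattern on the CPU in plain Rust. It is only an illustration of the general technique: `tree_reduce_sum` is a hypothetical helper, it does not use the cubecl-reduce API, and it is not a description of how the crate's kernels are actually implemented.

```rust
/// Sums `input` with a pairwise (tree) reduction.
///
/// Hypothetical helper for illustration only; it is NOT part of
/// cubecl-reduce. On a GPU each inner-loop iteration would be handled
/// by a separate thread, with a barrier between outer-loop steps.
pub fn tree_reduce_sum(input: &[f32]) -> f32 {
    if input.is_empty() {
        return 0.0; // identity element for addition
    }

    // Scratch buffer standing in for workgroup shared memory.
    let mut scratch = input.to_vec();
    let mut len = scratch.len();

    while len > 1 {
        // Round up so that an odd trailing element is carried over
        // to the next step instead of being dropped.
        let half = (len + 1) / 2;
        for i in 0..len / 2 {
            scratch[i] += scratch[i + half];
        }
        len = half;
    }

    scratch[0]
}
```

Calling `tree_reduce_sum(&[1.0, 2.0, 3.0, 4.0, 5.0])` folds the five values in three steps. The same loop structure applies to any associative operation, such as max or product, by swapping the `+=` and the identity value.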

Compiler and Runtime Optimizations

Refactored SPIR-V and HIP compilers with support for new features like WMMA intrinsics and improved debug information. Enhanced WebGPU support with better sync mechanisms and hardware property queries. Added support for compile-time constants and improved code generation for various architectures.

New Functionalities

Added support for more instructions and improved type support.

Bug Fixes

Fixed various issues with autotuning, particularly for WASM and CUDA environments. Resolved visibility issues with implementation functions in macros. Addressed multiple synchronization and compilation bugs across different runtime environments. Corrected handling of specific data types and operations in SPIR-V, WGSL, and CUDA.

Refactoring

Significant refactoring of the IR (Intermediate Representation) for cleaner, more maintainable code.
Streamlined autotune processes and simplified the optimizer for better extensibility.
Updated and cleaned up the codebase to align with newer versions of Rust and its ecosystem.

Documentation & User Experience

Enhanced error messages, particularly for matrix operations, providing clearer feedback on issues. Added documentation to support users in understanding new features and configurations. Implemented user hints for deriving traits and using extensions in kernel functions.

Full Changelog: v0.3.0...v0.4.0