diff --git a/docs_input/basics/sparse_tensor.rst b/docs_input/basics/sparse_tensor.rst
new file mode 100644
index 00000000..45dd90a4
--- /dev/null
+++ b/docs_input/basics/sparse_tensor.rst
@@ -0,0 +1,257 @@
+Sparse Tensor Type
+##################
+
+MatX is in the process of adding experimental support for sparse tensors.
+The implementation is based on the **Universal Sparse Tensor (UST)** type,
+which uses a tensor format DSL (Domain Specific Language) to describe a vast
+space of storage formats. Although the UST type can easily define many common
+storage formats (such as dense vectors and matrices, sparse vectors, and
+sparse matrices in COO, CSR, CSC, DCSR, DCSC, or BSR format, with
+generalizations for sparse tensors), it can also define many less common
+storage formats. From MatX's perspective, the advantage of using the UST type
+(rather than various specific sparse storage formats) is that the framework
+code only has to deal with a single new sparse type (and only dispatch to
+specific formats when required by a high-performance library implementation).
+Also, the tensor format DSL can easily be extended to include even more
+sparse storage formats in the future. From the user's perspective, the UST
+type provides more flexibility in changing storage formats by merely changing
+annotations in the type definitions, which allows for rapid experimentation
+with different ways of storing sparse tensors in a MatX computation.
+
+Quick Start
+-----------
+
+Despite the forward-looking design of using the UST type, the current
+experimental support provides a few factory methods for the common
+formats COO, CSR, and CSC. These factory methods are similar to the
+sparse construction methods found in, e.g., SciPy sparse or torch sparse.
+
+For example, to create a COO representation of the following
+4x8 matrix with 5 nonzero elements::
+
+       | 1, 2, 0, 0, 0, 0, 0, 0 |
+   A = | 0, 0, 0, 0, 0, 0, 0, 0 |
+       | 0, 0, 0, 0, 0, 0, 0, 0 |
+       | 0, 0, 3, 4, 0, 5, 0, 0 |
+
+First, using a uniform memory space, set up the constituent 1-dim buffers
+that contain, respectively, the value, i-index, and j-index of each nonzero
+element, ordered lexicographically by row-then-column index, as follows::
+
+   auto vals = make_tensor<float>({5});
+   auto idxi = make_tensor<index_t>({5});
+   auto idxj = make_tensor<index_t>({5});
+   vals.SetVals({1, 2, 3, 4, 5});
+   idxi.SetVals({0, 0, 3, 3, 3});
+   idxj.SetVals({0, 1, 2, 3, 5});
+
+Then the COO representation of ``A``, residing in the same memory space as
+its constituent buffers, is constructed as follows::
+
+   auto Acoo = experimental::make_tensor_coo(vals, idxi, idxj, {4, 8});
+
+   print(Acoo);
+
+The result of the print statement is shown below::
+
+   tensor_impl_2_f32: SparseTensor{float} Rank: 2, Sizes:[4, 8], Levels:[4, 8]
+   nse = 5
+   format = ( d0, d1 ) -> ( d0 : compressed(non-unique), d1 : singleton )
+   pos[0] = ( 0 5 )
+   crd[0] = ( 0 0 3 3 3 )
+   crd[1] = ( 0 1 2 3 5 )
+   values = ( 1.0000e+00 2.0000e+00 3.0000e+00 4.0000e+00 5.0000e+00 )
+   space = CUDA managed memory
+
+Note that, like dense tensors, sparse tensors provide ()-operations
+for indexing. However, users should **never** use the ()-operator
+in performance-critical code, since sparse storage formats do not
+provide O(1) random access to their elements (compressed levels will
+use some form of search to determine whether an element is present)::
+
+   // Naive way to convert the sparse matrix back to a dense matrix.
+   auto A = make_tensor<float>({4, 8});
+   for (index_t i = 0; i < 4; i++) {
+     for (index_t j = 0; j < 8; j++) {
+       A(i, j) = Acoo(i, j);
+     }
+   }
+
+Instead, conversions (and other operations) should use sparse operations
+that are specifically optimized for the sparse storage format. 
The correct way of performing the conversion above is as follows::
+
+   auto A = make_tensor<float>({4, 8});
+   (A = sparse2dense(Acoo)).run(exec);
+
+The current experimental sparse support in MatX provides efficient
+operations for sparse-to-dense, dense-to-sparse, matmul, and solve::
+
+   (A = sparse2dense(Acoo)).run(exec);
+   (Acoo = dense2sparse(D)).run(exec);
+   (C = matmul(Acoo, B)).run(exec);
+   (X = solve(Acsr, Y)).run(exec); // CSR only
+
+We expect the assortment of supported sparse operations and storage
+formats to grow if the experimental implementation is well-received.
+
+MatX Sparse Tensor Factory Methods
+----------------------------------
+
+The MatX implementation of the factory methods for common cases of
+the UST type can be found in the `make_sparse_tensor.h`_ file.
+All methods build a sparse tensor storage format from constituent
+1-dim buffers, similar to the methods found in SciPy sparse or
+torch sparse. A sample usage was already shown above. Currently,
+only methods to construct COO, CSR, and CSC are provided::
+
+   // Constructs a sparse matrix in COO format directly from the values and
+   // the two coordinate vectors. The entries should be sorted by row, then
+   // column. Duplicate entries should not occur. Explicit zeros may be stored.
+   template <typename ValTensor, typename CrdTensor>
+   auto make_tensor_coo(ValTensor &val,
+                        CrdTensor &row,
+                        CrdTensor &col, const index_t (&shape)[2]);
+
+   // Constructs a sparse matrix in CSR format directly from the values, the
+   // row positions, and the column coordinates vectors. The entries should
+   // be sorted by row, then column. Duplicate entries should not occur.
+   // Explicit zeros may be stored.
+   template <typename ValTensor, typename PosTensor, typename CrdTensor>
+   auto make_tensor_csr(ValTensor &val,
+                        PosTensor &rowp,
+                        CrdTensor &col, const index_t (&shape)[2]);
+
+   // Constructs a sparse matrix in CSC format directly from the values, the
+   // column positions, and the row coordinates vectors. The entries should
+   // be sorted by column, then row. 
Duplicate entries should not occur. Explicit zeros may be stored.
+   template <typename ValTensor, typename PosTensor, typename CrdTensor>
+   auto make_tensor_csc(ValTensor &val,
+                        PosTensor &colp,
+                        CrdTensor &row, const index_t (&shape)[2]);
+
+MatX Implementation of the UST Type
+-----------------------------------
+
+The MatX implementation of the UST type can be found in the `sparse_tensor.h`_
+file. Similar to the dense ``tensor_t`` type, ``sparse_tensor_t`` is a
+memory-backed, reference-counted operator that contains metadata about the
+size, rank, and other properties, such as the storage format. Unlike dense
+tensors, which consist of primary storage for the elements only, a sparse
+tensor format consists of **primary storage** for the nonzero values (named
+``values`` when printed) and **secondary storage** (named ``pos[]`` and
+``crd[]``, respectively, for each level, when printed) that indicates the
+position of each nonzero value. This latter storage is deliberately not
+called metadata, to avoid confusion with the metadata properties
+mentioned above.
+
+The primary and secondary storage can reside in any memory space that is
+accessible from where the tensor is used, including device memory, managed
+memory, and host memory. In this respect, MatX sparse tensors are very
+similar to, e.g., SciPy's or CuPy's sparse arrays.
+
+MatX Implementation of the Tensor Format DSL
+--------------------------------------------
+
+The MatX implementation of the tensor format DSL can be found in the
+`sparse_tensor_format.h`_ file. Most users do not have to concern
+themselves with the details of this DSL, but can directly use predefined
+type definitions for common tensor formats, like COO and CSR.
+
+In the tensor format DSL, the term **dimension** refers to the axes of
+the semantic tensor (as seen by the user), and the term **level** refers to
+the axes of the actual storage format (how it eventually resides in memory).
+
+The tensor format contains a map that provides the following:
+
+(1) An ordered sequence of dimension specifications, each of which includes:
+
+    * a **dimension expression**, which provides a reference to each dimension
+
+(2) An ordered sequence of level specifications, each of which includes:
+
+    * a **level expression**, which defines what is stored in each level
+    * a required **level type**, which defines how the level is stored,
+      including:
+
+      * a required **level format**
+      * a collection of **level properties**
+
+Currently, the following level formats are supported:
+
+(1) **dense**: the level is dense; entries along the level are stored and
+    linearized
+(2) **compressed**: the level is sparse; only nonzeros along the level are
+    stored with positions and coordinates
+(3) **singleton**: a variant of the compressed format, for when coordinates
+    have no siblings
+
+All level formats have the following level properties:
+
+(1) **non/unique** (are duplicates allowed at that level),
+(2) **un/ordered** (are coordinates sorted at that level).
+
+Some 2-dim matrix examples are shown below (note that the BSR block format
+has 2 dimensions but 4 levels)::
+
+   COO:  map = (i, j) -> ( i : compressed(non-unique), j : singleton )
+
+   CSR:  map = (i, j) -> ( i : dense, j : compressed )
+
+   CSC:  map = (i, j) -> ( j : dense, i : compressed )  # j and i swapped!
+
+   DCSR: map = (i, j) -> ( i : compressed, j : compressed )
+
+   DCSC: map = (i, j) -> ( j : compressed, i : compressed )
+
+   BSR with 2x3 blocks: map = ( i, j ) -> ( i floordiv 2 : dense,
+                                            j floordiv 3 : compressed,
+                                            i mod 2 : dense,
+                                            j mod 3 : dense )
+
+Two 3-dim tensor examples are shown below::
+
+   COO3: map = (i, j, k) -> ( i : compressed(non-unique),
+                              j : singleton,
+                              k : singleton )
+
+   CSF3: map = (i, j, k) -> ( i : compressed,
+                              j : compressed,
+                              k : compressed )
+
+Lastly, a 4-dim tensor example is given here::
+
+   COO4: map = (i, j, k, l) -> ( i : compressed(non-unique),
+                                 j : singleton,
+                                 k : singleton,
+                                 l : singleton )
+
+The C++ representation of the latter is given below::
+
+   using COO4 = SparseTensorFormat<4,
+                                   LvlSpec<D0, LvlType::CompressedNonUnique>,
+                                   LvlSpec<D1, LvlType::Singleton>,
+                                   LvlSpec<D2, LvlType::Singleton>,
+                                   LvlSpec<D3, LvlType::Singleton>>;
+
+More examples can be found in the code.
+
+Historical Background of the UST Type
+-------------------------------------
+
+The concept of the UST type has its roots in sparse compilers, first pioneered
+for sparse linear algebra in [`B&W95`_, `Bik96`_, `Bik98`_] and formalized for
+sparse tensor algebra in [`Kjolstad20`_, `Chou22`_, `Yadav22`_]. The tensor
+format DSL for the UST type, including the generalization to higher-dimensional
+levels, was introduced in [`MLIR22`_, `MLIR`_]. Please refer to this literature
+for a more extensive presentation of all topics only briefly discussed in this
+online documentation.
+
+.. _B&W95: https://dl.acm.org/doi/10.1145/169627.169765
+.. _Bik96: https://theses.liacs.nl/1315
+.. _Bik98: https://dl.acm.org/doi/10.1145/290200.287636
+.. _Chou22: http://tensor-compiler.org/files/chou-phd-thesis-taco-formats.pdf
+.. _Kjolstad20: http://tensor-compiler.org/files/kjolstad-phd-thesis-taco-compiler.pdf
+.. _MLIR22: https://dl.acm.org/doi/10.1145/3544559
+.. _MLIR: https://developers.google.com/mlir-sparsifier
+.. _Yadav22: http://tensor-compiler.org/files/yadav-pldi22-distal.pdf
+.. 
_make_sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/make_sparse_tensor.h +.. _sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor.h +.. _sparse_tensor_format.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor_format.h diff --git a/include/matx/core/sparse_tensor_format.h b/include/matx/core/sparse_tensor_format.h index fcde280f..d991eda5 100644 --- a/include/matx/core/sparse_tensor_format.h +++ b/include/matx/core/sparse_tensor_format.h @@ -37,92 +37,6 @@ namespace matx { namespace experimental { -// -// MatX implements a universal sparse tensor type that uses a tensor format -// DSL (Domain Specific Language) to describe a vast space of storage formats. -// Although the tensor format can easily define many common storage formats -// (such as Dense, CSR, CSC, BSR), it can also define many less common storage -// formats. In addition, the tensor format DSL can be extended to include even -// more storage formats in the future. -// -// In the tensor format, the term **dimension** is used to refer to the axes of -// the semantic tensor (as seen by the user), and the term **level** to refer to -// the axes of the actual storage format (how it eventually resides in memory). 
-// -// The tensor format contains a map that provides the following: -// -// (1) An ordered sequence of dimension specifications, each of which includes: -// -// (*) a dimension-expression, which provides a reference to each dimension -// -// (2) An ordered sequence of level specifications, each of which includes: -// -// (*) a level expression, which defines what is stored in each level -// (*) a required level type, which defines how the level is stored, -// including: -// (+) a required level format -// (+) a collection of level properties -// -// Currently, the following level formats are supported: -// -// (1) dense: level is dense, entries along the level are stored and linearized -// (2) compressed: level is sparse, only nonzeros along the level are stored -// with the compact positions and coordinates encoding -// (3) singleton: a variant of the compressed format, for when coordinates have -// no siblings -// -// All level formats have the following level properties: -// -// (1) non/unique (are duplicates allowed at that level), -// (2) un/ordered (are coordinates sorted at that level). 
-// -// Matrix Examples (dimension == 2, level >= dimension) -// -// COO: -// map = (i, j) -> ( i : compressed(non-unique), j : singleton ) -// -// CSR: -// map = (i, j) -> ( i : dense, j : compressed ) -// -// DCSR: -// map = (i, j) -> ( i : compressed, j : compressed ) -// -// CSC: -// map = (i, j) -> ( j : dense, i : compressed ) -// -// BSR with 2x3 blocks: -// map = ( i, j ) -> ( i floordiv 2 : dense, -// j floordiv 3 : compressed, -// i mod 2 : dense, -// j mod 3 : dense ) -// -// Tensor Examples (dimension > 2, level >= dimension) -// -// COO3: -// map = (i, j, k) -> ( i : compressed(non-unique), -// j : singleton, -// k : singleton ) -// -// CSF3: -// map = (i, j, k) -> ( i : compressed, -// j : compressed, -// k : compressed ) -// -// The idea of a universal sparse tensor type has its roots in -// sparse compilers, first pioneered for sparse linear algebra in [Bik96] -// and formalized to sparse tensor algebra in [Kjolstad20]. The generalization -// to higher-dimensional levels was introduced in [MLIR22]. -// -// [Bik96] Aart J.C. Bik. Compiler Support for Sparse Matrix Computations. -// PhD thesis, Leiden University, May 1996. -// [Kjolstad20] Fredrik Berg Kjolstad. Sparse Tensor Algebra Compilation. -// PhD thesis, MIT, February, 2020. -// [MLIR22] Aart J.C. Bik, Penporn Koanantakool, Tatiana Shpeisman, -// Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad. -// Compiler Support for Sparse Tensor Computations in MLIR. -// ACM Transactions on Architecture and Code Optimization, June, 2022. -// - // // A level type consists of a level format together with a set of // level properties (ordered and unique by default).