Start sparse tensor documentation #898

Merged
merged 7 commits into from
Mar 5, 2025
257 changes: 257 additions & 0 deletions docs_input/basics/sparse_tensor.rst
@@ -0,0 +1,257 @@
Sparse Tensor Type
##################

MatX is in the process of adding experimental support for sparse tensors.
The implementation is based on the **Universal Sparse Tensor (UST)** type
that uses a tensor format DSL (Domain Specific Language) to describe a vast
space of storage formats. Although the UST type can easily define many common
storage formats (such as dense vectors and matrices, sparse vectors, sparse
matrices in COO, CSR, CSC, DCSR, DCSC, or BSR format, with generalizations
for sparse tensors), it can also define many less common storage formats.
From MatX's perspective, the advantage of using the UST type (rather than
various specific sparse storage formats) is that the framework code only has
to deal with a single new sparse type (and only dispatch to specific formats
when required by a high performance library implementation). Also, the tensor
format DSL can be easily extended to include even more sparse storage formats
in the future. From the user's perspective, the UST type provides more
flexibility in changing storage formats by merely changing annotations in the
type definitions, which allows for rapid experimentation with different ways
of storing sparse tensors in a MatX computation.

Quick Start
-----------

Despite the forward-looking design of using the UST type, the current
experimental support provides a few factory methods for the common
formats COO, CSR, and CSC. These factory methods resemble the sparse
construction methods found in SciPy sparse or torch.sparse.

For example, to create a COO representation of the following
4x8 matrix with 5 nonzero elements::

        | 1, 2, 0, 0, 0, 0, 0, 0 |
    A = | 0, 0, 0, 0, 0, 0, 0, 0 |
        | 0, 0, 0, 0, 0, 0, 0, 0 |
        | 0, 0, 3, 4, 0, 5, 0, 0 |

First, using a uniform memory space, set up the constituent 1-dim buffers
that contain, respectively, the value, i-index, and j-index of each nonzero
element, ordered lexicographically by row-then-column index, as follows::

    auto vals = make_tensor<float>({5});
    auto idxi = make_tensor<int>({5});
    auto idxj = make_tensor<int>({5});
    vals.SetVals({1, 2, 3, 4, 5});
    idxi.SetVals({0, 0, 3, 3, 3});
    idxj.SetVals({0, 1, 2, 3, 5});

Then, the COO representation of ``A``, residing in the same memory space as
its constituent buffers, is constructed as follows::

    auto Acoo = experimental::make_tensor_coo(vals, idxi, idxj, {4, 8});

    print(Acoo);

The result of the print statement is shown below::

    tensor_impl_2_f32: SparseTensor{float} Rank: 2, Sizes:[4, 8], Levels:[4, 8]
    nse    = 5
    format = ( d0, d1 ) -> ( d0 : compressed(non-unique), d1 : singleton )
    pos[0] = ( 0  5 )
    crd[0] = ( 0  0  3  3  3 )
    crd[1] = ( 0  1  2  3  5 )
    values = ( 1.0000e+00  2.0000e+00  3.0000e+00  4.0000e+00  5.0000e+00 )
    space  = CUDA managed memory
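The printed ``pos``/``crd``/``values`` arrays fully encode the stored elements:
``pos[0]`` bounds the single segment of the compressed i-level, and
``crd[0][k]``, ``crd[1][k]`` give the coordinates of ``values[k]``. A minimal
standalone C++ sketch of traversing this encoding (illustrative only, not MatX
code; the function name is hypothetical):

```cpp
#include <vector>

// Count the stored entries of row i in the COO encoding printed above.
// pos0 bounds the one segment of the compressed (non-unique) i-level;
// crd0 holds the row coordinate of every stored element.
int count_in_row(const std::vector<int> &pos0,
                 const std::vector<int> &crd0, int i) {
  int n = 0;
  for (int k = pos0[0]; k < pos0[1]; ++k) {
    if (crd0[k] == i) {
      n++;
    }
  }
  return n;
}
```

For the example matrix, ``count_in_row({0, 5}, {0, 0, 3, 3, 3}, 3)`` visits all
five stored elements and finds three of them in row 3.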

Note that, like dense tensors, sparse tensors provide ``()``-operations
for indexing. However, users should **never** use the ``()``-operator in
performance-critical code, since sparse storage formats do not provide
O(1) random access to their elements (compressed levels use some form of
search to determine whether an element is present)::

    // Naive way to convert the sparse matrix back to a dense matrix.
    auto A = make_tensor<float>({4, 8});
    for (index_t i = 0; i < 4; i++) {
      for (index_t j = 0; j < 8; j++) {
        A(i, j) = Acoo(i, j);
      }
    }
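To make the cost concrete, here is a standalone C++ sketch (illustrative only,
not the MatX implementation) of what a single ``Acoo(i, j)`` access
conceptually entails for COO: a search over the compressed i-level, followed by
a scan of that row's segment for the j-coordinate:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Random access into a COO matrix given its coordinate and value buffers.
// The stored elements are sorted by row, then column, so the row segment
// can be located with a binary search over crd0.
float coo_at(const std::vector<int> &crd0, const std::vector<int> &crd1,
             const std::vector<float> &vals, int i, int j) {
  auto lo = std::lower_bound(crd0.begin(), crd0.end(), i);
  auto hi = std::upper_bound(crd0.begin(), crd0.end(), i);
  for (auto it = lo; it != hi; ++it) {
    auto k = static_cast<std::size_t>(it - crd0.begin());
    if (crd1[k] == j) {
      return vals[k];  // stored element found
    }
  }
  return 0.0f;  // element not stored: implicit zero
}
```

Repeating this search for all rows*cols positions, as the naive loop above
does, is far more work than a single pass over the stored nonzeros.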

Instead, conversions (and other operations) should use sparse operations
that are specifically optimized for the sparse storage format. The
correct way of performing the conversion above is as follows::

    auto A = make_tensor<float>({4, 8});
    (A = sparse2dense(Acoo)).run(exec);

The current experimental sparse support in MatX provides efficient
operations for sparse-to-dense, dense-to-sparse, matmul, and solve::

    (A = sparse2dense(Acoo)).run(exec);
    (Acoo = dense2sparse(D)).run(exec);
    (C = matmul(Acoo, B)).run(exec);
    (X = solve(Acsr, Y)).run(exec); // CSR only
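Conceptually, a sparse-to-dense conversion of a COO matrix is a single scatter
pass over the ``nse`` stored elements, rather than a search per dense
position. A standalone C++ sketch of this idea (illustrative only, not the
MatX implementation):

```cpp
#include <cstddef>
#include <vector>

// Scatter the stored nonzeros of a COO matrix into a zero-initialized
// row-major dense buffer. Visits O(nse) elements, not rows * cols.
std::vector<float> coo_to_dense(const std::vector<int> &crd0,
                                const std::vector<int> &crd1,
                                const std::vector<float> &vals,
                                int rows, int cols) {
  std::vector<float> dense(static_cast<std::size_t>(rows) * cols, 0.0f);
  for (std::size_t k = 0; k < vals.size(); ++k) {
    dense[static_cast<std::size_t>(crd0[k]) * cols + crd1[k]] = vals[k];
  }
  return dense;
}
```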

We expect the assortment of supported sparse operations and storage
formats to grow if the experimental implementation is well-received.

MatX Sparse Tensor Factory Methods
----------------------------------

The MatX implementation of the factory methods for common cases of
the UST type can be found in the `make_sparse_tensor.h`_ file.
All methods build a sparse tensor storage format from constituent
1-dim buffers, similar to the methods found in SciPy sparse or
torch.sparse. A sample usage was already shown above. Currently,
only methods to construct the COO, CSR, and CSC formats are
provided::

    // Constructs a sparse matrix in COO format directly from the values and
    // the two coordinate vectors. The entries should be sorted by row, then
    // column. Duplicate entries should not occur. Explicit zeros may be
    // stored.
    template <typename ValTensor, typename CrdTensor>
    auto make_tensor_coo(ValTensor &val,
                         CrdTensor &row,
                         CrdTensor &col, const index_t (&shape)[2]);

    // Constructs a sparse matrix in CSR format directly from the values, the
    // row positions, and the column coordinates vectors. The entries should
    // be sorted by row, then column. Duplicate entries should not occur.
    // Explicit zeros may be stored.
    template <typename ValTensor, typename PosTensor, typename CrdTensor>
    auto make_tensor_csr(ValTensor &val,
                         PosTensor &rowp,
                         CrdTensor &col, const index_t (&shape)[2]);

    // Constructs a sparse matrix in CSC format directly from the values, the
    // column positions, and the row coordinates vectors. The entries should
    // be sorted by column, then row. Duplicate entries should not occur.
    // Explicit zeros may be stored.
    template <typename ValTensor, typename PosTensor, typename CrdTensor>
    auto make_tensor_csc(ValTensor &val,
                         PosTensor &colp,
                         CrdTensor &row, const index_t (&shape)[2]);
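As an illustration of the buffers these signatures expect, the following
standalone C++ sketch (a hypothetical helper, not part of MatX) derives the
row-positions vector that ``make_tensor_csr`` would take for the 4x8 example
matrix ``A``: row ``i`` occupies the half-open range ``[rowp[i], rowp[i+1])``
of the coordinate and value vectors:

```cpp
#include <vector>

// Constituent 1-dim buffers for a CSR matrix.
struct CsrBuffers {
  std::vector<int> rowp;    // row positions, size rows + 1
  std::vector<int> col;     // column coordinates of the nonzeros
  std::vector<float> vals;  // nonzero values
};

// Build the CSR row-positions vector from sorted COO row coordinates by
// counting the nonzeros per row and taking a prefix sum.
CsrBuffers coo_rows_to_csr(const std::vector<int> &row,
                           const std::vector<int> &col,
                           const std::vector<float> &vals, int rows) {
  CsrBuffers csr{std::vector<int>(rows + 1, 0), col, vals};
  for (int r : row) {
    csr.rowp[r + 1]++;  // count nonzeros per row
  }
  for (int i = 0; i < rows; ++i) {
    csr.rowp[i + 1] += csr.rowp[i];  // prefix-sum into positions
  }
  return csr;
}
```

For the example matrix, the COO row coordinates ``{0, 0, 3, 3, 3}`` yield
``rowp = {0, 2, 2, 2, 5}``: two nonzeros in row 0, none in rows 1 and 2, and
three in row 3.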

MatX Implementation of the UST Type
-----------------------------------

The MatX implementation of the UST type can be found in the `sparse_tensor.h`_
file. Similar to a dense tensor ``tensor_t``, the ``sparse_tensor_t`` type is a
memory-backed, reference-counted operator that contains metadata about the
size, rank, and other properties, such as the storage format. Unlike dense
tensors, which consist of primary storage for the elements only, a sparse
tensor format consists of **primary storage** for the nonzero values (named
``values`` when printed) and **secondary storage** (named ``pos[]`` and
``crd[]``, respectively, for each level, when printed) that indicates the
position of each nonzero value. Note that this latter storage is deliberately
not called metadata, to avoid confusion with the other metadata properties
mentioned above.

The primary and secondary storage can reside in any memory space that is
accessible from where the tensor is used, including device memory, managed
memory, and host memory. In this respect, MatX sparse tensors are very
similar to SciPy's and CuPy's sparse arrays.

MatX Implementation of the Tensor Format DSL
--------------------------------------------

The MatX implementation of the tensor format DSL can be found in the
`sparse_tensor_format.h`_ file. Most users do not have to concern
themselves with the details of this DSL, but can directly use predefined
type definitions for common tensor formats, like COO and CSR.

In the tensor format DSL, the term **dimension** is used to refer to the axes of
the semantic tensor (as seen by the user), and the term **level** to refer to
the axes of the actual storage format (how it eventually resides in memory).

The tensor format contains a map that provides the following:

(1) An ordered sequence of dimension specifications, each of which includes:

* a **dimension-expression**, which provides a reference to each dimension

(2) An ordered sequence of level specifications, each of which includes:

* a **level expression**, which defines what is stored in each level
* a required **level type**, which defines how the level is stored, including:

* a required **level format**
* a collection of **level properties**

Currently, the following level formats are supported:

(1) **dense**: level is dense, entries along the level are stored and linearized
(2) **compressed**: level is sparse, only nonzeros along the level are stored
with positions and coordinates
(3) **singleton**: a variant of the compressed format, for when coordinates have
no siblings

All level formats have the following level properties:

(1) **non/unique** (are duplicates allowed at that level),
(2) **un/ordered** (are coordinates sorted at that level).

Some 2-dim matrix examples are shown below (note that the block format
BSR has 2 dimensions but 4 levels)::

    COO:  map = (i, j) -> ( i : compressed(non-unique), j : singleton )

    CSR:  map = (i, j) -> ( i : dense, j : compressed )

    CSC:  map = (i, j) -> ( j : dense, i : compressed )  # j and i swapped!

    DCSR: map = (i, j) -> ( i : compressed, j : compressed )

    DCSC: map = (i, j) -> ( j : compressed, i : compressed )

    BSR with 2x3 blocks:
          map = (i, j) -> ( i floordiv 2 : dense,
                            j floordiv 3 : compressed,
                            i mod 2      : dense,
                            j mod 3      : dense )
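The BSR map above is just arithmetic on coordinates. A standalone C++ sketch
(illustrative only, not MatX code) of the dimension-to-level mapping for 2x3
blocks:

```cpp
#include <array>

// Map a matrix entry at dimension coordinates (i, j) to its four BSR level
// coordinates: block row, block column, and the position within the block.
std::array<int, 4> bsr_level_coords(int i, int j) {
  return {i / 2,   // i floordiv 2 : dense
          j / 3,   // j floordiv 3 : compressed
          i % 2,   // i mod 2      : dense
          j % 3};  // j mod 3      : dense
}
```

For example, the entry at (3, 5) lands in block (1, 1), at position (1, 2)
inside that 2x3 block.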

Two 3-dim tensor examples are shown below::

    COO3: map = (i, j, k) -> ( i : compressed(non-unique),
                               j : singleton,
                               k : singleton )

    CSF3: map = (i, j, k) -> ( i : compressed,
                               j : compressed,
                               k : compressed )

Lastly, a 4-dim tensor example is given here::

    COO4: map = (i, j, k, l) -> ( i : compressed(non-unique),
                                  j : singleton,
                                  k : singleton,
                                  l : singleton )

The C++ representation of the latter is given below::

    using COO4 = SparseTensorFormat<4,
                     LvlSpec<D0, LvlType::CompressedNonUnique>,
                     LvlSpec<D1, LvlType::Singleton>,
                     LvlSpec<D2, LvlType::Singleton>,
                     LvlSpec<D3, LvlType::Singleton>>;

More examples can be found in the code.

Historical Background of the UST Type
-------------------------------------

The concept of the UST type has its roots in sparse compilers, first pioneered
for sparse linear algebra in [`B&W95`_, `Bik96`_, `Bik98`_] and formalized to
sparse tensor algebra in [`Kjolstad20`_, `Chou22`_, `Yadav22`_]. The tensor
format DSL for the UST type, including the generalization to higher-dimensional
levels, was introduced in [`MLIR22`_, `MLIR`_]. Please refer to this literature
for a more extensive presentation of all topics only briefly discussed in this
online documentation.

.. _B&W95: https://dl.acm.org/doi/10.1145/169627.169765
.. _Bik96: https://theses.liacs.nl/1315
.. _Bik98: https://dl.acm.org/doi/10.1145/290200.287636
.. _Chou22: http://tensor-compiler.org/files/chou-phd-thesis-taco-formats.pdf
.. _Kjolstad20: http://tensor-compiler.org/files/kjolstad-phd-thesis-taco-compiler.pdf
.. _MLIR22: https://dl.acm.org/doi/10.1145/3544559
.. _MLIR: https://developers.google.com/mlir-sparsifier
.. _Yadav22: http://tensor-compiler.org/files/yadav-pldi22-distal.pdf
.. _make_sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/make_sparse_tensor.h
.. _sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor.h
.. _sparse_tensor_format.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor_format.h
86 changes: 0 additions & 86 deletions include/matx/core/sparse_tensor_format.h
@@ -37,92 +37,6 @@
namespace matx {
namespace experimental {

//
// MatX implements a universal sparse tensor type that uses a tensor format
// DSL (Domain Specific Language) to describe a vast space of storage formats.
// Although the tensor format can easily define many common storage formats
// (such as Dense, CSR, CSC, BSR), it can also define many less common storage
// formats. In addition, the tensor format DSL can be extended to include even
// more storage formats in the future.
//
// In the tensor format, the term **dimension** is used to refer to the axes of
// the semantic tensor (as seen by the user), and the term **level** to refer to
// the axes of the actual storage format (how it eventually resides in memory).
//
// The tensor format contains a map that provides the following:
//
// (1) An ordered sequence of dimension specifications, each of which includes:
//
// (*) a dimension-expression, which provides a reference to each dimension
//
// (2) An ordered sequence of level specifications, each of which includes:
//
// (*) a level expression, which defines what is stored in each level
// (*) a required level type, which defines how the level is stored,
// including:
// (+) a required level format
// (+) a collection of level properties
//
// Currently, the following level formats are supported:
//
// (1) dense: level is dense, entries along the level are stored and linearized
// (2) compressed: level is sparse, only nonzeros along the level are stored
// with the compact positions and coordinates encoding
// (3) singleton: a variant of the compressed format, for when coordinates have
// no siblings
//
// All level formats have the following level properties:
//
// (1) non/unique (are duplicates allowed at that level),
// (2) un/ordered (are coordinates sorted at that level).
//
// Matrix Examples (dimension == 2, level >= dimension)
//
// COO:
// map = (i, j) -> ( i : compressed(non-unique), j : singleton )
//
// CSR:
// map = (i, j) -> ( i : dense, j : compressed )
//
// DCSR:
// map = (i, j) -> ( i : compressed, j : compressed )
//
// CSC:
// map = (i, j) -> ( j : dense, i : compressed )
//
// BSR with 2x3 blocks:
// map = ( i, j ) -> ( i floordiv 2 : dense,
// j floordiv 3 : compressed,
// i mod 2 : dense,
// j mod 3 : dense )
//
// Tensor Examples (dimension > 2, level >= dimension)
//
// COO3:
// map = (i, j, k) -> ( i : compressed(non-unique),
// j : singleton,
// k : singleton )
//
// CSF3:
// map = (i, j, k) -> ( i : compressed,
// j : compressed,
// k : compressed )
//
// The idea of a universal sparse tensor type has its roots in
// sparse compilers, first pioneered for sparse linear algebra in [Bik96]
// and formalized to sparse tensor algebra in [Kjolstad20]. The generalization
// to higher-dimensional levels was introduced in [MLIR22].
//
// [Bik96] Aart J.C. Bik. Compiler Support for Sparse Matrix Computations.
// PhD thesis, Leiden University, May 1996.
// [Kjolstad20] Fredrik Berg Kjolstad. Sparse Tensor Algebra Compilation.
// PhD thesis, MIT, February, 2020.
// [MLIR22] Aart J.C. Bik, Penporn Koanantakool, Tatiana Shpeisman,
// Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad.
// Compiler Support for Sparse Tensor Computations in MLIR.
// ACM Transactions on Architecture and Code Optimization, June, 2022.
//

//
// A level type consists of a level format together with a set of
// level properties (ordered and unique by default).