Start sparse tensor documentation #898

Merged
merged 7 commits into from
Mar 5, 2025
257 changes: 257 additions & 0 deletions docs_input/basics/sparse_tensor.rst
@@ -0,0 +1,257 @@
Sparse Tensor Type
##################

MatX is in the process of adding experimental support for sparse tensors.
The implementation is based on the **Universal Sparse Tensor (UST)** type
that uses a tensor format DSL (Domain Specific Language) to describe a vast
space of storage formats. Although the UST type can easily define many common
storage formats (such as dense vectors and matrices, sparse vectors, sparse
matrices in COO, CSR, CSC, DCSR, DCSC, or BSR format, with generalizations
for sparse tensors), it can also define many less common storage formats.
From MatX's perspective, the advantage of using the UST type (rather than
various specific sparse storage formats) is that the framework code only has
to deal with a single new sparse type (and only dispatch to specific formats
when required by a high performance library implementation). Also, the tensor
format DSL can be easily extended to include even more sparse storage formats
in the future. From the user's perspective, the UST type provides more
flexibility in changing storage formats by merely changing annotations in the
type definitions, which allows for rapid experimentation with different ways
of storing sparse tensors in a MatX computation.

Quick Start
-----------

Despite the forward-looking design of using the UST type, the current
experimental support provides a few factory methods for the common
formats COO, CSR, and CSC. These factory methods resemble the sparse
construction methods found in SciPy sparse or torch.sparse.

For example, to create a COO representation of the following
4x8 matrix with 5 nonzero elements::

        | 1, 2, 0, 0, 0, 0, 0, 0 |
    A = | 0, 0, 0, 0, 0, 0, 0, 0 |
        | 0, 0, 0, 0, 0, 0, 0, 0 |
        | 0, 0, 3, 4, 0, 5, 0, 0 |

First, using a uniform memory space, set up the constituent 1-dim buffers
that contain, respectively, the value, i-index, and j-index of each nonzero
element, ordered lexicographically by row-then-column index, as follows::

    auto vals = make_tensor<float>({5});
    auto idxi = make_tensor<int>({5});
    auto idxj = make_tensor<int>({5});
    vals.SetVals({1, 2, 3, 4, 5});
    idxi.SetVals({0, 0, 3, 3, 3});
    idxj.SetVals({0, 1, 2, 3, 5});

Then, the COO representation of ``A``, residing in the same memory space as
its constituent buffers, is constructed as follows::

    auto Acoo = experimental::make_tensor_coo(vals, idxi, idxj, {4, 8});

    print(Acoo);

The result of the print statement is shown below::

    tensor_impl_2_f32: SparseTensor{float} Rank: 2, Sizes:[4, 8], Levels:[4, 8]
    nse    = 5
    format = ( d0, d1 ) -> ( d0 : compressed(non-unique), d1 : singleton )
    pos[0] = ( 0  5 )
    crd[0] = ( 0  0  3  3  3 )
    crd[1] = ( 0  1  2  3  5 )
    values = ( 1.0000e+00  2.0000e+00  3.0000e+00  4.0000e+00  5.0000e+00 )
    space  = CUDA managed memory
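The printed ``pos``/``crd``/``values`` arrays fully encode the stored elements:
``pos[0]`` bounds the single segment of the compressed i-level, and
``crd[0][k]``, ``crd[1][k]`` give the coordinates of ``values[k]``. A minimal
standalone C++ sketch of traversing this encoding (illustrative only, not MatX
code; the function name is hypothetical):

```cpp
#include <vector>

// Count the stored entries of row i in the COO encoding printed above.
// pos0 bounds the one segment of the compressed (non-unique) i-level;
// crd0 holds the row coordinate of every stored element.
int count_in_row(const std::vector<int> &pos0,
                 const std::vector<int> &crd0, int i) {
  int n = 0;
  for (int k = pos0[0]; k < pos0[1]; ++k) {
    if (crd0[k] == i) {
      n++;
    }
  }
  return n;
}
```

For the example matrix, ``count_in_row({0, 5}, {0, 0, 3, 3, 3}, 3)`` visits all
five stored elements and finds three of them in row 3.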

Note that, like dense tensors, sparse tensors provide ``()``-operations
for indexing. However, users should **never** use the ``()``-operator in
performance-critical code, since sparse storage formats do not provide
O(1) random access to their elements (compressed levels use some form of
search to determine whether an element is present)::

    // Naive way to convert the sparse matrix back to a dense matrix.
    auto A = make_tensor<float>({4, 8});
    for (index_t i = 0; i < 4; i++) {
      for (index_t j = 0; j < 8; j++) {
        A(i, j) = Acoo(i, j);
      }
    }
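To make the cost concrete, here is a standalone C++ sketch (illustrative only,
not the MatX implementation) of what a single ``Acoo(i, j)`` access
conceptually entails for COO: a search over the compressed i-level, followed by
a scan of that row's segment for the j-coordinate:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Random access into a COO matrix given its coordinate and value buffers.
// The stored elements are sorted by row, then column, so the row segment
// can be located with a binary search over crd0.
float coo_at(const std::vector<int> &crd0, const std::vector<int> &crd1,
             const std::vector<float> &vals, int i, int j) {
  auto lo = std::lower_bound(crd0.begin(), crd0.end(), i);
  auto hi = std::upper_bound(crd0.begin(), crd0.end(), i);
  for (auto it = lo; it != hi; ++it) {
    auto k = static_cast<std::size_t>(it - crd0.begin());
    if (crd1[k] == j) {
      return vals[k];  // stored element found
    }
  }
  return 0.0f;  // element not stored: implicit zero
}
```

Repeating this search for all rows*cols positions, as the naive loop above
does, is far more work than a single pass over the stored nonzeros.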

Instead, conversions (and other operations) should use sparse operations
that are specifically optimized for the sparse storage format. The
correct way of performing the conversion above is as follows::

    auto A = make_tensor<float>({4, 8});
    (A = sparse2dense(Acoo)).run(exec);

The current experimental sparse support in MatX provides efficient
operations for sparse-to-dense, dense-to-sparse, matmul, and solve::

    (A = sparse2dense(Acoo)).run(exec);
    (Acoo = dense2sparse(D)).run(exec);
    (C = matmul(Acoo, B)).run(exec);
    (X = solve(Acsr, Y)).run(exec); // CSR only
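Conceptually, a sparse-to-dense conversion of a COO matrix is a single scatter
pass over the ``nse`` stored elements, rather than a search per dense
position. A standalone C++ sketch of this idea (illustrative only, not the
MatX implementation):

```cpp
#include <cstddef>
#include <vector>

// Scatter the stored nonzeros of a COO matrix into a zero-initialized
// row-major dense buffer. Visits O(nse) elements, not rows * cols.
std::vector<float> coo_to_dense(const std::vector<int> &crd0,
                                const std::vector<int> &crd1,
                                const std::vector<float> &vals,
                                int rows, int cols) {
  std::vector<float> dense(static_cast<std::size_t>(rows) * cols, 0.0f);
  for (std::size_t k = 0; k < vals.size(); ++k) {
    dense[static_cast<std::size_t>(crd0[k]) * cols + crd1[k]] = vals[k];
  }
  return dense;
}
```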

We expect the assortment of supported sparse operations and storage
formats to grow if the experimental implementation is well-received.

MatX Sparse Tensor Factory Methods
----------------------------------

The MatX implementation of the factory methods for common cases of
the UST type can be found in the `make_sparse_tensor.h`_ file.
All methods build a sparse tensor storage format from constituent
1-dim buffers, similar to the methods found in SciPy sparse or
torch.sparse. A sample usage was already shown above. Currently,
only methods to construct the COO, CSR, and CSC formats are
provided::

    // Constructs a sparse matrix in COO format directly from the values and
    // the two coordinate vectors. The entries should be sorted by row, then
    // column. Duplicate entries should not occur. Explicit zeros may be
    // stored.
    template <typename ValTensor, typename CrdTensor>
    auto make_tensor_coo(ValTensor &val,
                         CrdTensor &row,
                         CrdTensor &col, const index_t (&shape)[2]);

    // Constructs a sparse matrix in CSR format directly from the values, the
    // row positions, and the column coordinates vectors. The entries should
    // be sorted by row, then column. Duplicate entries should not occur.
    // Explicit zeros may be stored.
    template <typename ValTensor, typename PosTensor, typename CrdTensor>
    auto make_tensor_csr(ValTensor &val,
                         PosTensor &rowp,
                         CrdTensor &col, const index_t (&shape)[2]);

    // Constructs a sparse matrix in CSC format directly from the values, the
    // column positions, and the row coordinates vectors. The entries should
    // be sorted by column, then row. Duplicate entries should not occur.
    // Explicit zeros may be stored.
    template <typename ValTensor, typename PosTensor, typename CrdTensor>
    auto make_tensor_csc(ValTensor &val,
                         PosTensor &colp,
                         CrdTensor &row, const index_t (&shape)[2]);
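As an illustration of the buffers these signatures expect, the following
standalone C++ sketch (a hypothetical helper, not part of MatX) derives the
row-positions vector that ``make_tensor_csr`` would take for the 4x8 example
matrix ``A``: row ``i`` occupies the half-open range ``[rowp[i], rowp[i+1])``
of the coordinate and value vectors:

```cpp
#include <vector>

// Constituent 1-dim buffers for a CSR matrix.
struct CsrBuffers {
  std::vector<int> rowp;    // row positions, size rows + 1
  std::vector<int> col;     // column coordinates of the nonzeros
  std::vector<float> vals;  // nonzero values
};

// Build the CSR row-positions vector from sorted COO row coordinates by
// counting the nonzeros per row and taking a prefix sum.
CsrBuffers coo_rows_to_csr(const std::vector<int> &row,
                           const std::vector<int> &col,
                           const std::vector<float> &vals, int rows) {
  CsrBuffers csr{std::vector<int>(rows + 1, 0), col, vals};
  for (int r : row) {
    csr.rowp[r + 1]++;  // count nonzeros per row
  }
  for (int i = 0; i < rows; ++i) {
    csr.rowp[i + 1] += csr.rowp[i];  // prefix-sum into positions
  }
  return csr;
}
```

For the example matrix, the COO row coordinates ``{0, 0, 3, 3, 3}`` yield
``rowp = {0, 2, 2, 2, 5}``: two nonzeros in row 0, none in rows 1 and 2, and
three in row 3.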

MatX Implementation of the UST Type
-----------------------------------

The MatX implementation of the UST type can be found in the `sparse_tensor.h`_
file. Similar to a dense tensor ``tensor_t``, the ``sparse_tensor_t`` type is a
memory-backed, reference-counted operator that contains metadata about the
size, rank, and other properties, such as the storage format. Unlike dense
tensors, which consist of primary storage for the elements only, a sparse
tensor format consists of **primary storage** for the nonzero values (named
``values`` when printed) and **secondary storage** (named ``pos[]`` and
``crd[]``, respectively, for each level, when printed) that indicates the
position of each nonzero value. Note that this latter storage is deliberately
not called metadata, to avoid confusion with the other metadata properties
mentioned above.

The primary and secondary storage can reside in any memory space that is
accessible from where the tensor is used, including device memory, managed
memory, and host memory. In this respect, MatX sparse tensors are very
similar to SciPy's and CuPy's sparse arrays.

MatX Implementation of the Tensor Format DSL
--------------------------------------------

The MatX implementation of the tensor format DSL can be found in the
`sparse_tensor_format.h`_ file. Most users do not have to concern
themselves with the details of this DSL, but can directly use predefined
type definitions for common tensor formats, like COO and CSR.

In the tensor format DSL, the term **dimension** is used to refer to the axes of
the semantic tensor (as seen by the user), and the term **level** to refer to
the axes of the actual storage format (how it eventually resides in memory).

The tensor format contains a map that provides the following:

(1) An ordered sequence of dimension specifications, each of which includes:

* a **dimension-expression**, which provides a reference to each dimension

(2) An ordered sequence of level specifications, each of which includes:

* a **level expression**, which defines what is stored in each level
* a required **level type**, which defines how the level is stored, including:

* a required **level format**
* a collection of **level properties**

Currently, the following level formats are supported:

(1) **dense**: level is dense, entries along the level are stored and linearized
(2) **compressed**: level is sparse, only nonzeros along the level are stored
with positions and coordinates
(3) **singleton**: a variant of the compressed format, for when coordinates have
no siblings

All level formats have the following level properties:

(1) **non/unique** (are duplicates allowed at that level),
(2) **un/ordered** (are coordinates sorted at that level).

Some 2-dim matrix examples are shown below (note that the block format
BSR has 2 dimensions but 4 levels)::

    COO:  map = (i, j) -> ( i : compressed(non-unique), j : singleton )

    CSR:  map = (i, j) -> ( i : dense, j : compressed )

    CSC:  map = (i, j) -> ( j : dense, i : compressed )  # j and i swapped!

    DCSR: map = (i, j) -> ( i : compressed, j : compressed )

    DCSC: map = (i, j) -> ( j : compressed, i : compressed )

    BSR with 2x3 blocks:
          map = (i, j) -> ( i floordiv 2 : dense,
                            j floordiv 3 : compressed,
                            i mod 2      : dense,
                            j mod 3      : dense )
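The BSR map above is just arithmetic on coordinates. A standalone C++ sketch
(illustrative only, not MatX code) of the dimension-to-level mapping for 2x3
blocks:

```cpp
#include <array>

// Map a matrix entry at dimension coordinates (i, j) to its four BSR level
// coordinates: block row, block column, and the position within the block.
std::array<int, 4> bsr_level_coords(int i, int j) {
  return {i / 2,   // i floordiv 2 : dense
          j / 3,   // j floordiv 3 : compressed
          i % 2,   // i mod 2      : dense
          j % 3};  // j mod 3      : dense
}
```

For example, the entry at (3, 5) lands in block (1, 1), at position (1, 2)
inside that 2x3 block.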

Two 3-dim tensor examples are shown below::

    COO3: map = (i, j, k) -> ( i : compressed(non-unique),
                               j : singleton,
                               k : singleton )

    CSF3: map = (i, j, k) -> ( i : compressed,
                               j : compressed,
                               k : compressed )

Lastly, a 4-dim tensor example is given here::

    COO4: map = (i, j, k, l) -> ( i : compressed(non-unique),
                                  j : singleton,
                                  k : singleton,
                                  l : singleton )

The C++ representation of the latter is given below::

    using COO4 = SparseTensorFormat<4,
                     LvlSpec<D0, LvlType::CompressedNonUnique>,
                     LvlSpec<D1, LvlType::Singleton>,
                     LvlSpec<D2, LvlType::Singleton>,
                     LvlSpec<D3, LvlType::Singleton>>;

More examples can be found in the code.

Historical Background of the UST Type
-------------------------------------

The concept of the UST type has its roots in sparse compilers, first pioneered
for sparse linear algebra in [`B&W95`_, `Bik96`_, `Bik98`_] and formalized to
sparse tensor algebra in [`Kjolstad20`_, `Chou22`_, `Yadav22`_]. The tensor
format DSL for the UST type, including the generalization to higher-dimensional
levels, was introduced in [`MLIR22`_, `MLIR`_]. Please refer to this literature
for a more extensive presentation of all topics only briefly discussed in this
online documentation.

.. _B&W95: https://dl.acm.org/doi/10.1145/169627.169765
.. _Bik96: https://theses.liacs.nl/1315
.. _Bik98: https://dl.acm.org/doi/10.1145/290200.287636
.. _Chou22: http://tensor-compiler.org/files/chou-phd-thesis-taco-formats.pdf
.. _Kjolstad20: http://tensor-compiler.org/files/kjolstad-phd-thesis-taco-compiler.pdf
.. _MLIR22: https://dl.acm.org/doi/10.1145/3544559
.. _MLIR: https://developers.google.com/mlir-sparsifier
.. _Yadav22: http://tensor-compiler.org/files/yadav-pldi22-distal.pdf
.. _make_sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/make_sparse_tensor.h
.. _sparse_tensor.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor.h
.. _sparse_tensor_format.h: https://github.com/NVIDIA/MatX/blob/main/include/matx/core/sparse_tensor_format.h
86 changes: 0 additions & 86 deletions include/matx/core/sparse_tensor_format.h
@@ -37,92 +37,6 @@
namespace matx {
namespace experimental {

//
// MatX implements a universal sparse tensor type that uses a tensor format
// DSL (Domain Specific Language) to describe a vast space of storage formats.
// Although the tensor format can easily define many common storage formats
// (such as Dense, CSR, CSC, BSR), it can also define many less common storage
// formats. In addition, the tensor format DSL can be extended to include even
// more storage formats in the future.
//
// In the tensor format, the term **dimension** is used to refer to the axes of
// the semantic tensor (as seen by the user), and the term **level** to refer to
// the axes of the actual storage format (how it eventually resides in memory).
//
// The tensor format contains a map that provides the following:
//
// (1) An ordered sequence of dimension specifications, each of which includes:
//
// (*) a dimension-expression, which provides a reference to each dimension
//
// (2) An ordered sequence of level specifications, each of which includes:
//
// (*) a level expression, which defines what is stored in each level
// (*) a required level type, which defines how the level is stored,
// including:
// (+) a required level format
// (+) a collection of level properties
//
// Currently, the following level formats are supported:
//
// (1) dense: level is dense, entries along the level are stored and linearized
// (2) compressed: level is sparse, only nonzeros along the level are stored
// with the compact positions and coordinates encoding
// (3) singleton: a variant of the compressed format, for when coordinates have
// no siblings
//
// All level formats have the following level properties:
//
// (1) non/unique (are duplicates allowed at that level),
// (2) un/ordered (are coordinates sorted at that level).
//
// Matrix Examples (dimension == 2, level >= dimension)
//
// COO:
// map = (i, j) -> ( i : compressed(non-unique), j : singleton )
//
// CSR:
// map = (i, j) -> ( i : dense, j : compressed )
//
// DCSR:
// map = (i, j) -> ( i : compressed, j : compressed )
//
// CSC:
// map = (i, j) -> ( j : dense, i : compressed )
//
// BSR with 2x3 blocks:
// map = ( i, j ) -> ( i floordiv 2 : dense,
// j floordiv 3 : compressed,
// i mod 2 : dense,
// j mod 3 : dense )
//
// Tensor Examples (dimension > 2, level >= dimension)
//
// COO3:
// map = (i, j, k) -> ( i : compressed(non-unique),
// j : singleton,
// k : singleton )
//
// CSF3:
// map = (i, j, k) -> ( i : compressed,
// j : compressed,
// k : compressed )
//
// The idea of a universal sparse tensor type has its roots in
// sparse compilers, first pioneered for sparse linear algebra in [Bik96]
// and formalized to sparse tensor algebra in [Kjolstad20]. The generalization
// to higher-dimensional levels was introduced in [MLIR22].
//
// [Bik96] Aart J.C. Bik. Compiler Support for Sparse Matrix Computations.
// PhD thesis, Leiden University, May 1996.
// [Kjolstad20] Fredrik Berg Kjolstad. Sparse Tensor Algebra Compilation.
// PhD thesis, MIT, February, 2020.
// [MLIR22] Aart J.C. Bik, Penporn Koanantakool, Tatiana Shpeisman,
// Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad.
// Compiler Support for Sparse Tensor Computations in MLIR.
// ACM Transactions on Architecture and Code Optimization, June, 2022.
//

//
// A level type consists of a level format together with a set of
// level properties (ordered and unique by default).