Commit dbd27ba
Added documentation on integrating with existing software (#852)
1 parent adb0ff3 commit dbd27ba

File tree

2 files changed

+147
-0
lines changed


docs_input/external.rst

+146
@@ -0,0 +1,146 @@
.. _devexternal:

Interfacing With External Code and Libraries
############################################

Existing host and CUDA code can interoperate seamlessly with MatX, both by using MatX primitives in existing code
and by transferring MatX data into other libraries. Integrating MatX into existing code is a common use case that
allows developers to incrementally port code into MatX without having to rewrite everything at once.

This guide is not intended for developers who wish to extend MatX. See :ref:`devguide` for the MatX developer guide.

Passing Existing Pointers to MatX
---------------------------------

To use MatX in existing code, the pointer (whether host or device) is passed into the `make_tensor` call as the
first parameter:

.. code-block:: cpp

   // Existing code
   float *my_data_ptr;
   cudaMalloc((void**)&my_data_ptr, 100 * sizeof(float)); // Treated as a 10x10 float matrix in the code
   foo(my_data_ptr); // Call existing function that uses my_data_ptr

   // Work with my_data_ptr on the device

   // End of existing code. Convert to MatX tensor
   auto matx_tensor = matx::make_tensor<float>(my_data_ptr, {10, 10});

   // MatX functions

In the code above the developer has an existing device pointer that they used in their CUDA code. It's common in
existing CUDA code to see linear allocations like the one above, where the developer treats the buffer as a
higher-dimension tensor. In this example `my_data_ptr` was allocated as linear memory holding 100 floats, but the
user later treats it as a 10x10 matrix.

Since MatX needs to know the shape of the tensor when it's created, we explicitly pass the `{10, 10}` shape into
the `make_tensor` call.

By default MatX will not take ownership of the pointer; the user is responsible for freeing the memory when they
are done with it. This is true of all `make_tensor` calls that take an existing pointer as an argument, since the
user typically has their own memory management outside of MatX. The last parameter of each `make_tensor` call is a
boolean named `owning` that tells MatX to take ownership; it defaults to *false*. By setting `owning` to *true*,
MatX will free the memory when the tensor goes out of scope. By default MatX uses its own allocator, but users can
pass in their own PMR-compatible allocator if they wish. For more information see :ref:`creating`.
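
As a minimal sketch of the ownership behavior described above (assuming the boolean `owning` is the last
`make_tensor` parameter, as stated):

.. code-block:: cpp

   // Transfer ownership of an existing allocation to MatX
   float *buf;
   cudaMalloc((void**)&buf, 100 * sizeof(float));
   {
     // MatX now owns `buf` and frees it when `owned` is destroyed
     auto owned = matx::make_tensor<float>(buf, {10, 10}, true);
   } // buf freed here; do NOT call cudaFree(buf) afterwards

   // Contrast: the default, non-owning behavior
   float *buf2;
   cudaMalloc((void**)&buf2, 100 * sizeof(float));
   {
     auto view = matx::make_tensor<float>(buf2, {10, 10}); // owning defaults to false
   } // buf2 is still valid here
   cudaFree(buf2); // caller is responsible for freeing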

Passing MatX Operators to External Code/Libraries
-------------------------------------------------

MatX operators can be passed to external code or libraries in two ways: by object or by pointer. Passing MatX
operators by object is the preferred way when possible. Doing so maintains all of the internal information and
state contained in the operator and reduces the chance of errors.

Sometimes code cannot be modified to allow passing by object. This is common when working with libraries that have
an API that cannot be changed easily, or when the overhead of passing by value is too large. For these cases MatX
also allows developers to extract the pointer from a tensor and pass it to external code using the tensor's
`Data()` method. Note that unlike the pass-by-object method, this only works for tensors, since general operators
do not have a data pointer.

Care must be taken when passing either operators or pointers to existing code to avoid bugs:

* The data is only valid for the lifetime of the tensor. If the tensor goes out of scope, the data backing the
  tensor is invalid. For example, if a CUDA kernel is launched asynchronously with a tensor as a parameter and the
  tensor goes out of scope while the kernel runs, the results are undefined.
* The *kind* of the pointer must be known to the external code. For example, if the tensor was created in device
  memory, the external code must access it only where device memory is accessible.

If the external code supports the *DLPack* standard, the tensor's `ToDLPack()` method can be used instead to get a
`DLManagedTensor` object. This method is much safer since all shape and ownership information can be transferred.
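
As a sketch of the DLPack path (assuming `ToDLPack()` returns a `DLManagedTensor*`, and that the consumer follows
the standard DLPack contract of invoking the `deleter` when finished; `consume_dlpack` is a hypothetical external
function):

.. code-block:: cpp

   auto t = matx::make_tensor<float>({10, 10});

   // Hand the tensor off to a DLPack-aware library. The DLManagedTensor
   // carries the data pointer, shape, strides, and device information.
   DLManagedTensor *dlt = t.ToDLPack();
   consume_dlpack(dlt);

   // Per the DLPack contract, the *consumer* calls dlt->deleter(dlt) when it
   // no longer needs the tensor; the producer must not free it early.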

Passing By Object
=================

Passing by object makes all of the object's metadata available inside of an external function. Since operator
types can be very complex, it's always recommended to pass the operator as a template parameter rather than
spelling out the operator's type. Passing by value does *not* copy the data (if any) backing the operator; only
the metadata (shape, strides, etc.) is copied.

.. code-block:: cpp

   template <typename Op>
   void foo(Op &op)
   {
     // Do something with the operator
     auto val = op(10, 1);
   }

   template <typename Op>
   __global__ void foo_kernel(Op op)
   {
     // Do something with the operator
     auto val = op(10, 1);
   }

   // Create a MatX operator
   auto t1 = matx::make_tensor<float>({10, 10});
   auto t2 = matx::make_tensor<float>({10, 10});
   auto o1 = (t1 + t2) * 2.0f;

   foo(o1);

   typename matx::detail::base_type_t<decltype(o1)> o1_base = o1;
   foo_kernel<<<1,1>>>(o1_base);

The first function `foo` is a host function that takes a MatX operator as a template parameter by reference, while
`foo_kernel` is a CUDA kernel that takes the operator by value. When passing an operator to a CUDA kernel it
should always be passed by value unless the operator's memory is accessible on the device. The template parameter
allows the user to pass any operator that adheres to the operator interface. This is a powerful concept that
reduces the need for code changes if the type of the operator changes. For example, changing the `o1` statement to
`t1 - t2` would change the type of the operator, but using templates allows the same code in `foo` to work without
changing any types.

For more information about the *operator interface*, see :ref:`concepts`.

Inside of both `foo` and `foo_kernel` all functions in the *operator interface* are available. `op(10, 1)` returns
the value at the 11th row and 2nd column of the operator (0-based). Using `operator()` on the operator handles all
of the indexing logic for the operator's shape and strides.

The last part to mention in the code is the declaration of `o1_base`. Some operator types in MatX, such as
`tensor_t`, cannot be passed directly to a CUDA kernel because they contain internal types that cannot be used on
the device. The `base_type_t` type trait converts the operator to a type that can be used on the device if needed,
or returns the same type if it's already usable on the device.

Passing By Pointer
==================

In the code above `t1` and `t2` could have their pointers extracted, but `o1` could not. For that reason, passing
raw pointers can only be done with tensors and not other operators.

.. code-block:: cpp

   #include <matx.h>

   // Existing function
   void foo(float *data);

   // Create a MatX tensor in managed memory
   auto t1 = matx::make_tensor<float>({10, 10});

   // MatX processing code

   // Existing code
   foo(t1.Data());

The above example shows an existing function `foo` taking a pointer from the MatX tensor `t1`. Since only a
pointer is available, none of the metadata in the operator (shape, strides, etc.) is available inside the
function, and the user must ensure the pointer is used correctly.
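
When the external function also needs the dimensions, one pattern is to pass the tensor's sizes alongside the raw
pointer. This is a sketch: `bar` is a hypothetical external function, and it assumes a `Size(i)` accessor that
returns the extent of dimension `i`:

.. code-block:: cpp

   // Hypothetical external function that needs both the data and its shape
   void bar(float *data, int rows, int cols);

   auto t1 = matx::make_tensor<float>({10, 10});

   // The raw pointer carries no shape, so pass the extents explicitly
   bar(t1.Data(), static_cast<int>(t1.Size(0)), static_cast<int>(t1.Size(1)));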
146+

docs_input/index.rst

+1
@@ -43,6 +43,7 @@ Table of Contents
    quickstart
    build.rst
+   external.rst
    basics/index.rst
    api/index.rst
    examples/index.rst
