Commit b5da91c

Added documentation on integrating with existing software

1 parent 5010a55 commit b5da91c

1 file changed: docs_input/external.rst (+145, -0)

.. _devexternal:

Interfacing With External Code and Libraries
############################################

Existing host and CUDA code can interoperate seamlessly with MatX, both by using MatX primitives in existing code
and by transferring MatX data into other libraries. Integrating MatX into existing code is a common use case that
allows developers to incrementally port code into MatX without having to rewrite everything at once.

This guide is not intended for developers who wish to extend MatX. See :ref:`devguide` for the MatX developer guide.

Passing Existing Pointers to MatX
---------------------------------

To use MatX in existing code, the pointers (whether host or device) are passed into the `make_tensor` call as the
first parameter:

.. code-block:: cpp

    // Existing code
    float *my_data_ptr;
    cudaMalloc((void**)&my_data_ptr, 100 * sizeof(float)); // Treated as a 10x10 float matrix in the code
    foo(my_data_ptr); // Call existing function that uses my_data_ptr

    // Work with my_data_ptr on the device

    // End of existing code. Convert to MatX tensor
    auto matx_tensor = matx::make_tensor<float>(my_data_ptr, {10, 10});

    // MatX functions

In the code above the developer has an existing device pointer that they used in their CUDA code. It's common in existing
CUDA code to see linear allocations like the one above that the developer treats as a higher-dimension tensor elsewhere in
the code. In this example `my_data_ptr` was allocated as linear memory holding 100 floats, but the user later treats it as a
10x10 matrix.

Since MatX needs to know the shape of the tensor when it's created, we explicitly pass the `{10, 10}` shape into the
`make_tensor` call.

By default MatX will not take ownership of the pointer; the user is responsible for freeing the memory when they are done with it.
This is true of all `make_tensor` calls that take an existing pointer as an argument, since the user typically has their own
memory management outside of MatX. The last parameter of each such `make_tensor` call is a boolean named `owning` that tells MatX to
take ownership; it defaults to *false*. By setting `owning` to *true*, MatX will free the memory when the tensor goes out of scope.
When freeing, MatX uses its own allocator by default, but users can pass in their own PMR-compatible allocator if they wish. For more
information see :ref:`creating`.

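A minimal sketch of the `owning` flag described above (assuming the same `make_tensor` overload as the previous example, with
`owning` passed as the final argument):

.. code-block:: cpp

    float *raw_ptr;
    cudaMalloc((void**)&raw_ptr, 100 * sizeof(float));

    {
      // owning = true transfers ownership of raw_ptr to the tensor
      auto t = matx::make_tensor<float>(raw_ptr, {10, 10}, true);
      // ... work with t ...
    } // t goes out of scope and MatX frees raw_ptr; do not free it again
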
Passing MatX Operators to External Code/Libraries
-------------------------------------------------

MatX operators can be passed to external code or libraries in two ways: by object or by pointer. Passing MatX operators by object is
the preferred way when possible. Doing so maintains all of the internal information and state that is contained in the operator and
reduces the chances of errors.

Sometimes code cannot be modified to allow passing by object. This is common when working with libraries that have an API that
cannot be changed easily, or when the overhead of passing by value is too large. MatX also allows developers to extract the pointer
from a MatX operator and pass it to external code by using the `Data()` method of a tensor. Note that unlike the "pass-by-object" method,
this method only works for tensors since general operators do not have a data pointer.

Care must be taken when passing either operators or pointers to existing code to avoid bugs:

* The data is only valid for the lifetime of the tensor. If the tensor goes out of scope, the data backing the tensor is invalid. For
  example, if a CUDA kernel is launched asynchronously with a tensor as a parameter and the tensor then goes out of scope while the
  kernel runs, the results are undefined (see the sketch after this list).
* The *kind* of the pointer must be known to the external code. For example, if the tensor was created in device memory, the external
  code must access it only where device memory is accessible.

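As an illustration of the first pitfall, a hedged sketch (`my_kernel` is a hypothetical asynchronous kernel; `tmp` owns its
allocation because `make_tensor` allocated it):

.. code-block:: cpp

    // Hypothetical kernel that reads or writes the raw pointer
    __global__ void my_kernel(float *data);

    {
      auto tmp = matx::make_tensor<float>({10, 10}); // tmp owns this allocation
      my_kernel<<<1, 1>>>(tmp.Data());               // kernel launch is asynchronous
    } // tmp is destroyed (and its memory freed) here, possibly while the kernel
      // is still running: the results are undefined

    // One fix: synchronize (e.g., cudaDeviceSynchronize()) before tmp goes out of scope
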
If the external code supports the *dlpack* standard, the tensor's `ToDLPack()` method can be used instead to get a `DLManagedTensor` object.
This method is much safer since all of the shape and ownership information can be transferred.

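A sketch of that handoff (assuming, per the description above, that `ToDLPack()` yields a pointer to a `DLManagedTensor`;
`consume_dlpack` is a hypothetical dlpack-aware external function):

.. code-block:: cpp

    // Hypothetical dlpack-aware external API
    void consume_dlpack(DLManagedTensor *dl);

    auto t = matx::make_tensor<float>({10, 10});
    DLManagedTensor *dl = t.ToDLPack(); // shape, strides, and ownership travel with dl
    consume_dlpack(dl);                 // the consumer calls dl->deleter(dl) when finished
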
Passing By Object
=================

Passing by object makes all of the object's metadata available inside of an external function. Since operator types can be very complex, it's
always recommended to pass the operator as a template parameter rather than specifying the type of the operator:

.. code-block:: cpp

    template <typename Op>
    void foo(Op &op)
    {
      // Do something with the operator
      auto val = op(10, 1);
    }

    template <typename Op>
    __global__ void foo_kernel(Op op)
    {
      // Do something with the operator
      auto val = op(10, 1);
    }

    // Create a MatX operator
    auto t1 = matx::make_tensor<float>({10, 10});
    auto t2 = matx::make_tensor<float>({10, 10});
    auto o1 = (t1 + t2) * 2.0f;

    foo(o1);

    // Convert to a type that can be passed to a CUDA kernel, then launch
    typename matx::detail::base_type_t<decltype(o1)> o1_base = o1;
    foo_kernel<<<1,1>>>(o1_base);

The first function `foo` is a host function that takes a MatX operator as a template parameter by reference, while `foo_kernel` is
a CUDA kernel that takes the operator by value. When passing an operator to a CUDA kernel it should always be passed by value
unless the operator's memory is accessible on the device. The template parameter allows the user to pass any operator to the
function that adheres to the operator interface. This is a powerful concept that reduces the need for code changes if the type
of the operator changes. For example, changing the `o1` statement to `t1 - t2` would change the type of the operator, but using
templates allows the same code in `foo` to work without changing the type.

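For instance, reusing the definitions above:

.. code-block:: cpp

    auto o2 = t1 - t2; // a different operator type than o1
    foo(o2);           // foo compiles unchanged thanks to the template parameter
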
For more information about the *operator interface*, see :ref:`concepts`.

Inside of both `foo` and `foo_kernel` all functions in the *operator interface* are available. `op(10, 1)` will return the value
at the 11th row and 2nd column of the operator (0-based). Using `operator()` on the operator handles all of the indexing
logic needed for its shape and strides.

The last part to mention in the code is the declaration of `o1_base`. Some operator types in MatX, such as a `tensor_t`, cannot
be passed directly to a CUDA kernel due to internal types that cannot be used on the device. The `base_type_t` type trait will
convert the operator to a type that can be used on the device if needed, or it will return the same type if it's already usable
on the device.

Passing By Pointer
==================

In the code above `t1` and `t2` could have their pointers extracted, but `o1` could not. For that reason, passing raw pointers
can only be used on tensors and not other operators.

.. code-block:: cpp

    #include <matx.h>

    // Existing function
    void foo(float *data);

    // Create a MatX tensor in managed memory
    auto t1 = matx::make_tensor<float>({10, 10});

    // MatX processing code

    // Existing code
    foo(t1.Data());

The above example shows an existing function `foo` taking in a pointer from the MatX tensor `t1`. Since only a pointer is available, none
of the metadata available in the operator (shape, strides, etc.) is available inside of the function, and the user must ensure the pointer
is used correctly.

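Because the callee sees only a raw pointer, a common pattern is to pass the shape alongside it. A short sketch (`bar` is a
hypothetical external function; the sizes are queried with the tensor's `Size()` method):

.. code-block:: cpp

    // Hypothetical external function that takes an explicit shape
    void bar(float *data, int rows, int cols);

    // The pointer alone carries no metadata, so the caller supplies the shape
    bar(t1.Data(), static_cast<int>(t1.Size(0)), static_cast<int>(t1.Size(1)));
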
