.. _devexternal:

Interfacing With External Code and Libraries
############################################

Existing host and CUDA code can interoperate seamlessly with MatX, both by using MatX primitives in existing code
and by transferring MatX data into other libraries. Integrating MatX into existing code is a common use case that
allows developers to incrementally port code to MatX without having to rewrite everything at once.

This guide is not intended for developers who wish to extend MatX. See :ref:`devguide` for the MatX developer guide.


Passing Existing Pointers to MatX
---------------------------------

To use MatX in existing code, pass the existing pointer (whether host or device) into the `make_tensor` call as the
first parameter:

.. code-block:: cpp

   // Existing code
   float *my_data_ptr;
   cudaMalloc((void**)&my_data_ptr, 100 * sizeof(float)); // Treated as a 10x10 float matrix in the code
   foo(my_data_ptr); // Call existing function that uses my_data_ptr

   // Work with my_data_ptr on the device

   // End of existing code. Convert to MatX tensor
   auto matx_tensor = matx::make_tensor<float>(my_data_ptr, {10, 10});

   // MatX functions

In the code above the developer has an existing device pointer that they used in their CUDA code. It's common in existing
CUDA code to see linear allocations like the one above that the developer treats as higher-dimension tensors elsewhere in the code.
In this example `my_data_ptr` was allocated as linear memory holding 100 floats, but the user later treats it as a 10x10 matrix.

Since MatX needs to know the shape of the tensor when it's created, we explicitly pass the `{10, 10}` shape into the
`make_tensor` call.

By default MatX will not take ownership of the pointer; the user is responsible for freeing the memory when they are done with it.
This is true of all `make_tensor` calls that take an existing pointer as an argument, since the user typically has their own
memory management outside of MatX. The last parameter of each such `make_tensor` call is a boolean named `owning` that tells MatX to
take ownership; it defaults to *false*. When `owning` is set to *true*, MatX frees the memory when the tensor goes out of scope.
By default MatX uses its own allocator, but users can pass in their own PMR-compatible allocator if they wish. For more information
see :ref:`creating`.
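
The two ownership modes can be sketched as follows (a minimal sketch, assuming a device allocation made outside of MatX; `owning` is the boolean parameter described above):

.. code-block:: cpp

   float *ptr;
   cudaMalloc((void**)&ptr, 100 * sizeof(float));

   {
     // Default: non-owning. MatX never frees ptr
     auto t_view = matx::make_tensor<float>(ptr, {10, 10});
   } // ptr is still valid here

   cudaFree(ptr); // The user frees the non-owning allocation

   float *ptr2;
   cudaMalloc((void**)&ptr2, 100 * sizeof(float));

   {
     // owning = true: MatX frees ptr2 when the tensor goes out of scope
     auto t_owned = matx::make_tensor<float>(ptr2, {10, 10}, true);
   } // ptr2 must not be used or freed here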

Passing MatX Operators to External Code/Libraries
-------------------------------------------------

MatX operators can be passed to external code or libraries in two ways: by object or by pointer. Passing MatX operators by object is
the preferred way when possible, since doing so maintains all of the internal information and state contained in the operator and
reduces the chance of errors.

Sometimes code cannot be modified to allow passing by object. This is common when working with libraries that have APIs that
cannot be changed easily, or when the overhead of passing by value is too large. MatX also allows developers to extract the raw pointer
from a MatX operator and pass it to external code by using the `Data()` method of a tensor. Note that unlike the "pass-by-object" method,
this only works for tensors, since general operators do not have a data pointer.

Care must be taken when passing either operators or pointers to existing code to avoid bugs:

* The data is only valid for the lifetime of the tensor. If the tensor goes out of scope, the data backing the tensor is invalid. For
  example, if a CUDA kernel is launched asynchronously with a tensor as a parameter and the tensor goes out of scope while the kernel
  runs, the results are undefined.
* The *kind* of the pointer must be known to the external code. For example, if the tensor was created in device memory, the external
  code must access it only where device memory is accessible.
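
The first pitfall can be illustrated with a short sketch (`consume_kernel` is a hypothetical user kernel that reads the data asynchronously):

.. code-block:: cpp

   __global__ void consume_kernel(float *data); // Hypothetical asynchronous consumer

   {
     auto t = matx::make_tensor<float>({10, 10}); // MatX owns this allocation
     consume_kernel<<<1, 1>>>(t.Data());          // Launch is asynchronous
     // BUG if we leave the scope now: t's memory is freed while the kernel may still be running
     cudaStreamSynchronize(0);                    // Safe: wait before t goes out of scope
   }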

If the external code supports the *DLPack* standard, the tensor's `ToDLPack()` method can be used instead to get a `DLManagedTensor` object.
This method is much safer, since all shape and ownership information can be transferred.
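
A sketch of the DLPack path (assuming `ToDLPack()` yields a `DLManagedTensor*`; the struct fields below come from the DLPack headers, and `external_dlpack_consumer` is a hypothetical consumer):

.. code-block:: cpp

   #include <dlpack/dlpack.h>

   // Hypothetical external function that accepts a DLPack tensor
   void external_dlpack_consumer(DLManagedTensor *dl);

   auto t = matx::make_tensor<float>({10, 10});
   DLManagedTensor *dl = t.ToDLPack();

   // The consumer can read dl->dl_tensor.shape, .ndim, .device, etc.,
   // and calls dl->deleter(dl) when it is done with the data
   external_dlpack_consumer(dl);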


Passing By Object
=================

Passing by object makes all of the object's metadata available inside the external function. Since operator types can be very complex, it's
recommended to always pass the operator as a template parameter rather than spelling out the type of the operator:

.. code-block:: cpp

   template <typename Op>
   void foo(Op &op)
   {
     // Do something with the operator
     auto val = op(10, 1);
   }

   template <typename Op>
   __global__ void foo_kernel(Op op)
   {
     // Do something with the operator
     auto val = op(10, 1);
   }

   // Create a MatX operator
   auto t1 = matx::make_tensor<float>({10, 10});
   auto t2 = matx::make_tensor<float>({10, 10});
   auto o1 = (t1 + t2) * 2.0f;

   foo(o1);

   typename matx::detail::base_type_t<decltype(o1)> o1_base = o1;
   foo_kernel<<<1,1>>>(o1_base);

The first function `foo` is a host function that takes a MatX operator by reference as a template parameter, while `foo_kernel` is
a CUDA kernel that takes the operator by value. When passing an operator to a CUDA kernel it should always be passed by value
unless the operator's memory is accessible on the device. The template parameter allows the user to pass any operator that
adheres to the operator interface. This is a powerful concept that reduces the need for code changes if the type
of the operator changes. For example, changing the `o1` statement to `t1 - t2` would change the type of the operator, but using
templates allows the same code in `foo` to work without changing the type.

For more information about the *operator interface*, see :ref:`concepts`.

Inside both `foo` and `foo_kernel`, all functions in the *operator interface* are available. `op(10, 1)` returns the value
at the 11th row and 2nd column of the operator (0-based). Using `operator()` on the operator handles all of the indexing
logic for the shape and strides of the operator.

The last part to mention in the code is the declaration of `o1_base`. Some operator types in MatX, such as `tensor_t`, cannot
be passed directly to a CUDA kernel due to internal types that cannot be used on the device. The `base_type_t` type trait will
convert the operator to a type that can be used on the device if needed, or it will return the same type if it's already usable
on the device.

Passing By Pointer
==================

In the code above, `t1` and `t2` could have their pointers extracted, but `o1` could not. For that reason, passing raw pointers
works only for tensors and not other operators.

.. code-block:: cpp

   #include <matx.h>

   // Existing function
   void foo(float *data);

   // Create a MatX tensor in managed memory
   auto t1 = matx::make_tensor<float>({10, 10});

   // MatX processing code

   // Existing code
   foo(t1.Data());

The above example shows an existing function `foo` taking a pointer from the MatX tensor `t1`. Since only a pointer is available, the
metadata in the operator (shape, strides, etc.) is not available inside the function, and the user must ensure the pointer is used
correctly.

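Since the raw pointer carries no metadata, one mitigating pattern is to pass the tensor's shape alongside it (a sketch; `foo2` is a hypothetical variant of `foo`, and `Size()` is the tensor method returning the size of a dimension):

.. code-block:: cpp

   // Hypothetical variant of foo that takes explicit dimensions
   void foo2(float *data, int rows, int cols);

   auto t1 = matx::make_tensor<float>({10, 10});

   // Pass the shape explicitly so the callee doesn't have to assume it
   foo2(t1.Data(), static_cast<int>(t1.Size(0)), static_cast<int>(t1.Size(1)));
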