
Runtime Error When Mixing Runtime and Driver APIs Due to Context Management #306

Open
NiclasEsser1 opened this issue Nov 18, 2024 · 0 comments
Labels
enhancement New feature or request

NiclasEsser1 commented Nov 18, 2024

Description

I am encountering an issue when mixing CUDA's Runtime API and Driver API in my project. My project itself uses only the Runtime API, but it depends on an external library that uses cudawrappers and therefore the Driver API. The issue arises when a Driver API context is created after the Runtime API has already initialized the primary context.

When the Driver API context goes out of scope, subsequent Runtime API calls result in runtime errors. Specifically, I encounter the following error:

terminate called after throwing an instance of 'thrust::THRUST_200500_890_NS::system::system_error'
  what():  CUDA free failed: cudaErrorInvalidValue: invalid argument
Aborted (core dumped)

Steps to Reproduce
The following minimal reproducible example demonstrates the issue:

#include <cuda.h>
#include <cudawrappers/cu.hpp>
#include <memory> // std::make_unique
#include <thrust/device_vector.h>


int main() {
    cu::init();
    auto vec = std::make_unique<thrust::device_vector<int>>(); // First call to the Runtime API -> creates the primary context
    auto device = std::make_unique<cu::Device>(0);
    {
        // auto context = device->primaryCtxRetain();
        cu::Context context(0, *device);
        context.setCurrent();
        vec->resize(100);
    } // context goes out of scope here
    device.reset();
    vec.reset(); // Fails with runtime error
}

Observations

Currently, there appears to be no way to make cudawrappers use the primary context created by the Runtime API. I noticed that cu::Device declares a primaryCtxRetain() method, which looks like the right solution, but the current version of cudawrappers provides no implementation for it.

Solution
To solve the issue, I've implemented primaryCtxRetain as follows:

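Context Device::primaryCtxRetain()
{
#if !defined(__HIP__)
    CUcontext primary;
    checkCudaCall(cuDevicePrimaryCtxRetain(&primary, _obj));
    return {primary, *this}; // Calls the private Context constructor (Context befriends cu::Device)
#else
    // No HIP path here: fail loudly instead of falling off the end of a non-void function
    throw std::runtime_error("primaryCtxRetain() is not implemented for HIP");
#endif
}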

Note: This definition must come after the cu::Context class, because it constructs a Context. In addition, the primaryCtxRetain() declaration inside the cu::Device class must be marked inline, since the function is defined in a header.

With this implementation, the following adjusted example works without any errors:

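#include <cuda.h>
#include <cudawrappers/cu.hpp>
#include <memory> // std::make_unique
#include <thrust/device_vector.h>

int main() {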
    cu::init();
    auto vec = std::make_unique<thrust::device_vector<int>>();
    auto device = std::make_unique<cu::Device>(0);
    {
        auto context = device->primaryCtxRetain();
        vec->resize(100);
    } // context goes out of scope here
    device.reset();
    vec.reset(); // Does not fail anymore
}

Questions

  1. Why is the primaryCtxRetain() method not implemented in cudawrappers?
  2. Is there an alternative solution already available within cudawrappers that I might have missed?
  3. If there is no existing solution, would it be possible to integrate the primaryCtxRetain() method into the library? (I can open a pull request from my fork)
NiclasEsser1 added the enhancement (New feature or request) label on Nov 18, 2024.
NiclasEsser1 changed the title from "Incompatibility with Runtime API Primary Context" to "Runtime Error When Mixing Runtime and Driver APIs Due to Context Management" on Nov 19, 2024.