[BUG]: mismatch creates an object on a device after #3591

Open
1 task done
zkhatami opened this issue Jan 29, 2025 · 3 comments
Labels
bug Something isn't working right.

Comments

@zkhatami

Is this a duplicate?

Type of Bug

Compile-time Error

Component

Thrust

Describe the bug

For this example:

```cpp
#include <execution>
#include <algorithm>

template <typename T>
class Wrapper
{
  public:
    Wrapper()
    {
        ++my_count;
    }

    bool
    operator==(const Wrapper& input) const
    {
        return true;
    }

    ~Wrapper()
    {
        --my_count;
    }

  private:
    static std::atomic<size_t> my_count;
    T dummy;
};


template <typename T>
void
test()
{
  std::vector<T> input1;
  std::vector<T> input2;
  const auto res1 = std::mismatch(std::execution::par, input1.begin(), input1.end(), input2.begin(), input2.end(), std::equal_to<T>());
}

int32_t
main()
{
    test<Wrapper<int32_t>>();
}
```

Compiling with NVC++ stdpar fails with:

```
"test.cpp", line 22: error: global or namespace scope variables such as "Wrapper<T>::my_count [with T=int32_t]" (declared at line 26) cannot be accessed from device code
            function "Wrapper<T>::~Wrapper [with T=int32_t]" is implicitly a device function because it is called from device function "cuda::std::__4::__tuple_leaf<_Ip, _Hp, cuda::std::__4::__tuple_leaf_specialization::__default>::~__tuple_leaf [with _Ip=0UL, _Hp=Wrapper<int32_t>]" (declared implicitly)
          --my_count;
            ^
```

This failure is new: it first appears with commit 35df3a9.
The mismatch algorithm should only inspect the existing elements of a container, not create new ones. In this test, Wrapper has a constructor and destructor that access an object with static storage duration; after this commit, a Wrapper object gets created on the device, which causes NVC++ stdpar to fail.
The change to the header mismatch.h in commit 35df3a9 appears to create a cuda::std::tuple object, one of whose element types is Wrapper<int32_t> in this reduced test case.

How to Reproduce

```
nvc++ -stdpar -Ofast --c++17 -c test.cpp
```

Expected behavior

```
"test.cpp", line 22: error: global or namespace scope variables such as "Wrapper<T>::my_count [with T=int32_t]" (declared at line 26) cannot be accessed from device code
            function "Wrapper<T>::Wrapper [with T=int32_t]" is implicitly a device function because it is called from device function "cuda::std::__4::__tuple_leaf<_Ip, _Hp, cuda::std::__4::__tuple_leaf_specialization::__default>::__tuple_leaf [with _Ip=0UL, _Hp=Wrapper<int32_t>]" (declared implicitly)
          --my_count;
            ^
```

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

@zkhatami added the bug label Jan 29, 2025
@github-project-automation bot moved this to Todo in CCCL Jan 29, 2025
@bernhardmgruber self-assigned this Jan 30, 2025
@bernhardmgruber
Contributor

bernhardmgruber commented Jan 30, 2025

I can reproduce the issue in a slightly different form (calling thrust::mismatch and compiling with nvcc), but it persists even if I revert 35df3a9 :/ Solving it will require a larger re-engineering of Thrust's iterators so that they never make a copy or construct a temporary anywhere.

@bernhardmgruber
Contributor

Using nvc++ 25.1 and your test.cpp from above inside a CCCL git clone of the current main (a1a73a8):

```
bgruber@concorde:~/dev/cccl$ /opt/nvidia/hpc_sdk/Linux_x86_64/25.1/compilers/bin/nvc++ -stdpar -Ofast --c++17 -c test.cpp -Ithrust -Ilibcudacxx/include -Icub
bgruber@concorde:~/dev/cccl$ 
```

shows successful compilation.

Using nvc++ 24.5, I got:

```
bgruber@concorde:~/dev/cccl$ /opt/nvidia/hpc_sdk/Linux_x86_64/2024/compilers/bin/nvc++ -stdpar -Ofast --c++17 -c test.cpp -Ithrust -Ilibcudacxx/include -Icub
NVC++-F-0000-Internal compiler error. size of unknown type       0  (bug.cpp)
NVC++/x86-64 Linux 24.5-1: compilation aborted
```

Can you please give me the exact CCCL git commit that you used to encounter this error?

@zkhatami
Author

zkhatami commented Jan 30, 2025

I'm using a development version of nvc++, which tracks the CCCL main branch. In contrast, nvc++ release versions use CCCL release branches, so they don't include this commit.

The exact CCCL commit causing the failure is:

35df3a9f79a21c65f3e317a0ab6d0fa8c491793f

Before this commit, everything worked fine with:

cebb54c232216f260d63db00352ae90b28fdfc72

Projects
Status: In Review