Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does "local_accessor" really have common reference semantics? #552

Open
gmlueck opened this issue May 8, 2024 · 1 comment
Open

Does "local_accessor" really have common reference semantics? #552

gmlueck opened this issue May 8, 2024 · 1 comment

Comments

@gmlueck
Copy link
Contributor

gmlueck commented May 8, 2024

We state in section 4.5.2 that local_accessor has common reference semantics, which means that a copy of a local_accessor object:

must behave as-if it were the original instance and as-if any action performed on it were also performed on the original instance

If that were true, I would expect two local_accessor objects that are copies to reference the same underlying local memory. This is not how DPC++ behaves currently, and I wonder if we really intended this to be the case. Consider the following test:

#include <iostream>
#include <sycl/sycl.hpp>

static constexpr size_t SIZE = 16;
static constexpr size_t WGSIZE = 4;

int main() {
  sycl::queue q;
  sycl::buffer<size_t> buf{{SIZE}};

  q.submit([&](sycl::handler &cgh) {
    sycl::accessor acc{buf, cgh};
    sycl::local_accessor<size_t> lacc1{{WGSIZE}, cgh};
#ifdef COPY_IN_HOST
    sycl::local_accessor<size_t> lacc2{lacc1};
#endif

    sycl::nd_range ndr{{SIZE}, {WGSIZE}};
    cgh.parallel_for(ndr, [=](sycl::nd_item<> it) {
#ifndef COPY_IN_HOST
      sycl::local_accessor<size_t> lacc2{lacc1};
#endif
      size_t lid = it.get_local_linear_id();
      size_t gid = it.get_global_linear_id();
      lacc1[lid] = gid;

      // Should this work?  If "lacc1" and "lacc2" have common reference
      // semantics, then they should both point to the same memory, and you
      // would expect this to read the same memory as "lacc1[lid]".
      acc[gid] = lacc2[lid];
    });
  });

  bool ok = true;
  sycl::host_accessor ha{buf};
  for (int i = 0;  i < SIZE;  i++) {
    if (ha[i] != i) {
      std::cout << "Error on element " << i << ": " << ha[i] << "\n";
      ok = false;
    }
  }
  if (ok) {
    std::cout << "Calculation OK\n";
  }
  return (ok) ? 0 : 1;
}

When the local_accessor object is copied in host code (command group scope), the test fails:

$ clang++ -fsycl -DCOPY_IN_HOST -o local-accessor-copy local-accessor-copy.cpp
$ ./local-accessor-copy
Error on element 1: 0
Error on element 2: 0
Error on element 3: 0
Error on element 4: 0
Error on element 5: 0
Error on element 6: 0
Error on element 7: 0
Error on element 8: 0
Error on element 9: 0
Error on element 10: 0
Error on element 11: 0
Error on element 12: 0
Error on element 13: 0
Error on element 14: 0
Error on element 15: 0

However, when the local_accessor object is copied inside the kernel, the test passes:

$ clang++ -fsycl -UCOPY_IN_HOST -o local-accessor-copy local-accessor-copy.cpp
$ ./local-accessor-copy
Calculation OK

When you understand the implementation, this behavior makes sense. Each local accessor kernel argument creates its own unique SLM pointer inside the kernel, thus the test fails when the copy happens on the host. However, inside the kernel, the copy just copies the underlying SLM pointer, so the copies alias the same memory. Thus, the test passes when the copy happens inside the kernel.

However, the spec doesn't say this is the expected behavior. Furthermore, this behavior likely seems bizarre to someone who does not understand the implementation.

It would be tempting to sidestep the issue and just delete the copy constructor and copy assignment operator from local_accessor. However, I fear that existing applications may pass local_accessor by value to function calls, and this wouldn't work if we delete these.

Note that there is no similar issue with regular accessor. When you copy an accessor object on the host, the copy points to the same buffer, and the common reference semantics are upheld.

@gmlueck gmlueck added the Agenda To be discussed during a SYCL committee meeting label May 22, 2024
@illuhad
Copy link
Contributor

illuhad commented Jul 11, 2024

If that were true, I would expect two local_accessor objects that are copies to reference the same underlying local memory. This is not how DPC++ behaves currently, and I wonder if we really intended this to be the case. Consider the following test:

I did not yet run your test case, but my expectation that AdaptiveCpp honors the reference semantics in that case. So I believe that common reference semantics are implementable here. What AdaptiveCpp does is that it manages a single local memory allocation for all local accessors, and the local accessor only stores offsets into that memory. So when you copy an accessor, only the offset is copied, but the offset is still applied to the same local memory allocation.

EDIT: Yes, works with AdaptiveCpp

$ acpp -o la -DCOPY_IN_HOST la.cpp
acpp warning: No optimization flag was given, optimizations are disabled by default. Performance may be degraded. Compile with e.g. -O2/-O3 to enable optimizations.
$ ACPP_VISIBILITY_MASK="omp;cuda" ./la
[AdaptiveCpp Warning] dag_direct_scheduler: Detected a requirement that is neither of discard access mode (SYCL 1.2.1) nor no_init property (SYCL 2020) that accesses uninitialized data. Consider changing to discard/no_init. Optimizing potential data transfers away.
'+ptx86' is not a recognized feature for this target (ignoring feature)
'+ptx86' is not a recognized feature for this target (ignoring feature)
'+ptx86' is not a recognized feature for this target (ignoring feature)
[AdaptiveCpp Warning] kernel_cache: This application run has resulted in new binaries being JIT-compiled. This indicates that the runtime optimization process has not yet reached peak performance. You may want to run the application again until this warning no longer appears to achieve optimal performance.
Calculation OK
$ acpp -o la -UCOPY_IN_HOST la.cpp
acpp warning: No optimization flag was given, optimizations are disabled by default. Performance may be degraded. Compile with e.g. -O2/-O3 to enable optimizations.
$ ACPP_VISIBILITY_MASK="omp;cuda" ./la
[AdaptiveCpp Warning] dag_direct_scheduler: Detected a requirement that is neither of discard access mode (SYCL 1.2.1) nor no_init property (SYCL 2020) that accesses uninitialized data. Consider changing to discard/no_init. Optimizing potential data transfers away.
'+ptx86' is not a recognized feature for this target (ignoring feature)
'+ptx86' is not a recognized feature for this target (ignoring feature)
'+ptx86' is not a recognized feature for this target (ignoring feature)
[AdaptiveCpp Warning] kernel_cache: This application run has resulted in new binaries being JIT-compiled. This indicates that the runtime optimization process has not yet reached peak performance. You may want to run the application again until this warning no longer appears to achieve optimal performance.
Calculation OK

@gmlueck gmlueck removed the Agenda To be discussed during a SYCL committee meeting label Jul 31, 2024
sarnex pushed a commit to intel/llvm that referenced this issue Oct 22, 2024
Implement work group memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_work_group_memory.asciidoc
Two notes:

- Free function kernel support for work group memory argument will be
added in a future PR.
- When the assignment operator is called in host code, the assigned to
work group memory object does not actually correspond to the same
underlying memory as the one that was assigned from contradicting the
spec. See KhronosGroup/SYCL-Docs#552 for a
similar problem with `local_accessor`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants