Asynch Kernel #13

Open
crtrott opened this issue Apr 26, 2024 · 0 comments

crtrott commented Apr 26, 2024

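// Run I/O on a separate host thread so it overlaps with the kernels below.
std::thread io_thread([=]() { do_my_io(); });
Kokkos::parallel_for(N, functor1);
// deep_copy(dst, src): copy device results into the host buffer before MPI.
Kokkos::deep_copy(host, device);
MPI(host);  // placeholder for the actual MPI communication on the host buffer
Kokkos::parallel_for(N, functor2);
io_thread.join();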
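To make the overlap pattern above concrete without a Kokkos build, here is a minimal stdlib-only sketch. Everything in it is an assumption for illustration: `run_kernel` stands in for the two `parallel_for` launches, this `do_my_io` takes a snapshot argument that the original does not show, and the deep_copy/MPI step is only marked by a comment.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for do_my_io(): here it only sums a buffer.
long do_my_io(const std::vector<int>& snapshot) {
  return std::accumulate(snapshot.begin(), snapshot.end(), 0L);
}

// Hypothetical stand-in for a compute kernel (functor1 / functor2 above).
void run_kernel(std::vector<int>& data, int increment) {
  for (int& x : data) x += increment;
}

// Runs the I/O on a private snapshot while two compute steps modify the
// live data. Returns the value produced by the I/O thread.
long overlap_io_with_compute(std::vector<int>& data) {
  const std::vector<int> snapshot = data;  // I/O thread never touches `data`
  long io_result = 0;
  std::thread io_thread(
      [&io_result, &snapshot]() { io_result = do_my_io(snapshot); });

  run_kernel(data, 1);  // plays the role of parallel_for(N, functor1)
  // The deep_copy to the host and the MPI exchange would go here.
  run_kernel(data, 2);  // plays the role of parallel_for(N, functor2)

  io_thread.join();     // io_result may only be read after join()
  return io_result;
}
```

The point of the snapshot is that the I/O thread and the kernels never share mutable state, so the only synchronization needed is the final `join()`.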
  • You need more concurrency -> parallelize within a point
  • You should probably launch elements with the same number of points together in one kernel
  • Look into templating on the number of points -> reduces the cost of accessing an element
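One possible reading of the templating suggestion, as a plain C++ sketch: make the number of points per dimension a template parameter so loop bounds are compile-time constants, and pay for the runtime-to-compile-time switch once per group of elements rather than once per point. The names `sum_element` and `sum_element_dispatch` are hypothetical, and the summation is just a placeholder for real per-element work.

```cpp
#include <cassert>
#include <vector>

// Sums all P*P*P point values of one element. P is a compile-time
// constant, so the loop trip count is known to the compiler.
template <int P>
double sum_element(const double* points) {
  double total = 0.0;
  for (int i = 0; i < P * P * P; ++i) total += points[i];
  return total;
}

// Runtime-to-compile-time dispatch on the number of points per dimension.
// In a real code this switch would sit outside the kernel launch.
double sum_element_dispatch(int points_per_dim, const double* points) {
  switch (points_per_dim) {
    case 1: return sum_element<1>(points);
    case 2: return sum_element<2>(points);
    case 3: return sum_element<3>(points);
    case 4: return sum_element<4>(points);
    case 5: return sum_element<5>(points);
    default:
      assert(false && "unsupported points_per_dim");
      return 0.0;
  }
}
```

Combined with launching same-sized elements together, the switch runs once per size class and each instantiated kernel sees fixed extents.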
std::array<int,5> num_elements{n1,n2,n3,n4,n5};
std::array<int,5> team_size{1,8,27,64,125};

for(int size = 0; size<5; size++) {
  int vector_size = // depends on kernel - how much concurrency per point  
  // maybe not do this for team_size 1? or group multiple team_size 1 things together
  // potentially use multiple Kokkos instances (partition_instance) i.e. CUDA streams, one per size
  parallel_for(TeamPolicy(num_elements[size], team_size[size], vector_size), KOKKOS_LAMBDA(const team_handle_type& team) {
    int element = element_map(size, team.league_rank());
    parallel_for(TeamThreadMDRange(team, size+1, size+1, size+1), [&](int i0, int i1, int i2) {
      parallel_for(ThreadVectorRange(team, ConcurrencyPerPoint), [&](int k) {
        elements(element).data(i0,i1,i2,k) = ...
      });
      parallel_for(ThreadVectorRange(team, ConcurrencyPerPoint), [&](int k) {
        elements(element).data(i0,i1,i2,k) = ...
      });
      parallel_for(ThreadVectorRange(team, ConcurrencyPerPoint), [&](int k) {
        elements(element).data(i0,i1,i2,k) = ...
      });
    });
  });
}
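The loop above relies on `num_elements` and `element_map(size, league_rank)` already being set up. A host-side sketch of how they might be built, assuming each element has 1 to 5 points per dimension (matching `team_size{1,8,27,64,125}`): the struct `ElementGroups`, the function `group_elements`, and the input `points_per_dim` are hypothetical names, and on a device the map would have to live in something like a Kokkos View rather than a `std::vector`.

```cpp
#include <array>
#include <cassert>
#include <vector>

constexpr int kNumSizes = 5;  // size classes 0..4 <-> 1..5 points per dimension

struct ElementGroups {
  std::array<int, kNumSizes> num_elements{};            // elements per size class
  std::array<std::vector<int>, kNumSizes> element_map;  // [size][league_rank] -> element
};

// points_per_dim[e] is the number of points per dimension (1..5) of element e.
ElementGroups group_elements(const std::vector<int>& points_per_dim) {
  ElementGroups groups;
  for (int e = 0; e < static_cast<int>(points_per_dim.size()); ++e) {
    const int size = points_per_dim[e] - 1;  // same index as the loop variable `size`
    assert(size >= 0 && size < kNumSizes);
    groups.element_map[size].push_back(e);   // position in the list = league rank
    ++groups.num_elements[size];
  }
  return groups;
}
```

With this layout, `groups.num_elements[size]` is the league size for that launch and `groups.element_map[size][team.league_rank()]` recovers the original element index.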