13 December 2024
- We are very glad that smaller PRs will be the preferred method for contributing upstream. To make sure we do this correctly, can we clarify the requirement that "They should include a comparison of timings before and after, and the hardware specification that the comparison was made on."? Can we agree on specific test cases, problem sizes, and run parameters so that the comparison is meaningful? Do we need both CPU and GPU cases? MPI? Single-node and multi-node, single-GPU and multi-GPU? A set of standard cases that makes PRs easier to accept would be very helpful for everyone.
- GPU optimisation of `update_field`: following up on the Slack discussion and the comment on issue #12. Let's discuss whether this is a sensible approach, what the end goal should look like, etc.
- There are a lot of data movements in the pipeline that are not clear to me. They are not yet documented, but I can show them in the latest NVTX profiling attempts (to be updated by the meeting).
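
As a starting point for the timing-comparison question in the first item, here is a minimal sketch of what a before/after report could look like. It is an illustration only, not an agreed format: `hardware_summary`, `time_case`, `compare`, and the two toy workloads are hypothetical names invented here, and the real test cases, problem sizes, and run parameters are still to be decided.

```python
import json
import os
import platform
import statistics
import time


def hardware_summary():
    """Collect a basic host description using only the standard library."""
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


def time_case(func, repeats=5, warmup=1):
    """Return the median wall-clock time (seconds) of func() over `repeats` runs."""
    for _ in range(warmup):
        func()  # warm-up runs are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(before, after, label, repeats=5):
    """Time two callables on the same host and bundle the result with the hardware info."""
    t_before = time_case(before, repeats)
    t_after = time_case(after, repeats)
    return {
        "case": label,
        "repeats": repeats,
        "before_s": t_before,
        "after_s": t_after,
        "speedup": t_before / t_after if t_after > 0 else float("inf"),
        "hardware": hardware_summary(),
    }


# Hypothetical stand-ins for the code path before and after a PR.
def baseline_case(n=200_000):
    total = 0
    for i in range(n):
        total += i * i
    return total


def optimised_case(n=200_000):
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(json.dumps(compare(baseline_case, optimised_case, "toy-sum"), indent=2))
```

This sketch measures host wall-clock time only. GPU kernel launches are asynchronous, so a GPU case would need a device synchronisation before each timer is read. The GPU model and MPI layout would also have to be recorded separately, since the standard library does not report them.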