13 December 2024


Mashy Green edited this page Dec 11, 2024 · 2 revisions

To ask:

We are very glad that smaller PRs will be the preferred method for contributing upstream. To make sure we do this correctly, can we clarify the requirement that "They should include a comparison of timings before and after, and the hardware specification that the comparison was made on."? Can we decide on specific test cases, problem sizes, and run parameters so that the comparison is meaningful? Do we need both CPU and GPU cases? MPI? Multi-node or single-node, multi-GPU or single-GPU? A set of standard cases that makes PRs easier to accept would be very helpful for everyone.
GPU optimisation of update_field: following up on the Slack discussion and the comment on issue #12. Let's discuss whether this is a sensible approach, what the end goal should look like, etc.
There are a lot of data movements in the pipeline that are not clear to me. These are not yet documented, but I can show them in the latest profiling attempts with NVTX (will be updated by the meeting).