forked from paboyle/Grid
25 October 2024
Ilektra Christidi edited this page Oct 25, 2024 · 9 revisions
To ask:
- Why does the documentation say it uses only managed memory, when the code actually uses both managed and non-managed?
  - Docs are out of date, don't worry about it. Non-managed is/should be used where needed.
- From profile:
  - To profile only the kernels of interest, out of the gazillion `ApplyKernel`s: first run with `nsys`, note after which kernel invocation it starts being interesting, and only profile those with `ncu`.
  - What are the small CUDA kernels with lots of small data transfers called between the big kernel calls? Are those the ones that we want to optimise, or just the big kernel?
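
The two-step profiling workflow above could be scripted roughly as follows. This is only a sketch: the binary name `./Benchmark_binary`, the report names, and the skip/count values are hypothetical placeholders, and it assumes the `nsys profile` and `ncu` command lines with the `--kernel-name`, `--launch-skip` and `--launch-count` options of the Nsight Compute CLI. The number of launches to skip has to be read off the `nsys` timeline by hand.

```python
import shlex


def nsys_command(binary, app_args, report="grid_timeline"):
    """Step 1: whole-run timeline, used to find where the interesting kernels start."""
    return ["nsys", "profile", "--stats=true", "-o", report, binary, *app_args]


def ncu_command(binary, app_args, kernel="ApplyKernel", skip=0, count=1,
                report="grid_kernels"):
    """Step 2: detailed profile of a window of launches of one kernel only."""
    return [
        "ncu",
        "--kernel-name", kernel,        # only consider kernels with this name
        "--launch-skip", str(skip),     # launches to skip before profiling starts
        "--launch-count", str(count),   # launches to profile after that
        "-o", report,
        binary, *app_args,
    ]


# Hypothetical binary and arguments, for illustration only.
app = "./Benchmark_binary"
app_args = ["--grid", "16.16.16.16", "--mpi", "1.1.1.1"]

print(shlex.join(nsys_command(app, app_args)))
print(shlex.join(ncu_command(app, app_args, skip=200, count=10)))
```

The printed lines can be pasted into a job script; keeping the skip/count window small is what keeps the `ncu` run affordable.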
- What to do with the tests that don't run: modify the build to not build them with the wrong options, merge it to upstream...?
  - If you re-enable mobius, the tests build, so do that (still use Ed's patch). Also pull the latest `develop` from the GRID repo.
  - Some tests fail, differently on MG's machine and on tursa - on CPU only. Look into this in more detail on tursa once we get some CPU nodes (see below), and report upstream any tests that fail but are not relevant to us.
  - Also look at the TeamCity CI. Log in as guest.
  - TODO: MG to run `make check` on GPU as well.
- What are the correct config + execution options for benchmarking (MPI partitioning, number of threads... `--grid` and `--mpi`)?
  - `--grid` = number of sites in each dimension. `--mpi` = number of MPI ranks per dimension.
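
To make the relation between the two options concrete, here is a small sketch of the arithmetic. It assumes that `--grid` gives the global lattice size, that both options are written as dot-separated integers (e.g. `32.32.32.64`), and that each dimension must divide evenly among the ranks; the helper names are hypothetical and not part of Grid.

```python
from math import prod


def parse_dims(text):
    """Parse a dot-separated dimension string such as '32.32.32.64'."""
    return [int(x) for x in text.split(".")]


def decompose(grid, mpi):
    """Return total rank count and per-rank local lattice for a --grid/--mpi pair."""
    g, m = parse_dims(grid), parse_dims(mpi)
    if len(g) != len(m):
        raise ValueError("--grid and --mpi must have the same number of dimensions")
    for sites, ranks in zip(g, m):
        if sites % ranks != 0:
            raise ValueError(f"{sites} sites cannot be split evenly over {ranks} ranks")
    local = [sites // ranks for sites, ranks in zip(g, m)]
    return {"ranks": prod(m), "local": local, "local_volume": prod(local)}


print(decompose("32.32.32.64", "1.2.2.2"))
```

With these inputs the job needs 8 MPI ranks, each holding a 32 x 16 x 16 x 32 local lattice.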
- How does the threading work?
  - It's in a macro in threads.h. Can't have a `#pragma` in a macro, so cannot grep for it... How to use: specify `OMP_NUM_THREADS` before the run.
  - We definitely need threading on. On a CPU node: 1 MPI rank/chiplet = 8 ranks/node, with 16 threads/rank = 128 cores/node.
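
The rank/thread layout quoted above is simple arithmetic, sketched below. The defaults (128 cores and 8 chiplets per node) are the numbers from these notes; the helper names are hypothetical, and the thread count is passed through the standard OpenMP environment variable `OMP_NUM_THREADS`.

```python
def node_layout(cores_per_node=128, chiplets_per_node=8):
    """One MPI rank per chiplet, with the chiplet's cores used as OpenMP threads."""
    if cores_per_node % chiplets_per_node != 0:
        raise ValueError("cores per node must be a multiple of chiplets per node")
    return {
        "ranks_per_node": chiplets_per_node,
        "threads_per_rank": cores_per_node // chiplets_per_node,
    }


def openmp_env(threads_per_rank):
    """Environment to export before launching the run."""
    return {"OMP_NUM_THREADS": str(threads_per_rank)}


layout = node_layout()
print(layout)
print(openmp_env(layout["threads_per_rank"]))
```

How ranks and threads are then pinned to cores depends on the MPI launcher and scheduler, which this sketch deliberately leaves out.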
- tursa is GPU only - we're burning GPU time to look at CPUs!
  - TODO: IC to ask for a separate project code for CPU hours for this RSE project (as many as they'd give us, at least 10k CPU hours; we'd burn them quickly, since you get a whole node at a time, which has 128 cores).
  - If that fails, do CPU studies/development on CSD3 or dial3.
- What's the feeling about introducing some linting over the code (clang-format)?
  - NOOOO! Peter will never speak to you again!
- Automated testing: it looks like there are some automated unit tests, but the end-to-end tests are more like examples: they create the output and nothing else, so the user would have to check it manually. We'd like to have some automated regression tests. Do we want to do this locally, only in the Sp2n folder/tests that we care about, or make something more general to contribute to upstream Grid?
  - Look at `Grid/util/FlightRecorder.h/cc` for regression test facilities (used in unit tests) - most likely not appropriate...
  - TODO: EB will ask Ryan (RSE/postdoc working on GRID at Edinburgh) to talk to Peter about how acceptable a contribution would be. But most likely we will have our own script (or other regression testing system that doesn't affect the rest of GRID) inside the test folder(s) of interest.
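
As a starting point for the "own script" option, a regression check could compare the numbers a test prints against a stored reference, within a tolerance. The sketch below is not based on any existing Grid facility: it assumes, hypothetically, that the quantities of interest appear on lines of the form `name = value` or `name: value`, and the observable name and tolerance are made up for illustration.

```python
import math
import re

# Matches lines like "plaquette = 0.5886" or "residual: 1.0e-8" (assumed format).
LINE_RE = re.compile(
    r"^\s*(\w+)\s*[:=]\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*$"
)


def parse_observables(text):
    """Collect name -> value for every line that matches the assumed format."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


def compare(reference, candidate, rel_tol=1e-10):
    """Return a list of human-readable failures; an empty list means a pass."""
    ref = parse_observables(reference)
    new = parse_observables(candidate)
    failures = []
    for name, expected in ref.items():
        if name not in new:
            failures.append(f"{name}: missing")
        elif not math.isclose(new[name], expected, rel_tol=rel_tol):
            failures.append(f"{name}: expected {expected!r}, got {new[name]!r}")
    return failures


reference_output = "some log line\nplaquette = 0.588612345\n"
print(compare(reference_output, "plaquette = 0.588612345\n"))
print(compare(reference_output, "plaquette = 0.59\n"))
```

A wrapper that runs each test binary, captures its output, and calls `compare` against a checked-in reference file would live entirely inside the test folder(s) of interest, so it would not touch the rest of GRID.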