use mixin class for osu #222
base: main
Conversation
```python
from eessi.testsuite.utils import find_modules, log


def filter_scales_pt2pt():
    """
    Filtering function for filtering scales for the pt2pt OSU test
    returns all scales with either 2 cores, 1 full node, or 2 full nodes
    """
```
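A sketch of how such a filter could be implemented against the `SCALES` constant from `eessi.testsuite.constants`; the exact keys used below (`num_nodes`, `num_cpus_per_node`, `node_part`) are assumptions about that constant's layout, and this is not necessarily the code in this PR.

```python
from eessi.testsuite.constants import SCALES


def filter_scales_pt2pt():
    """Return the names of all scales that use either exactly 2 cores,
    1 full node, or 2 full nodes (assumed SCALES layout)."""
    return [
        name for name, scale in SCALES.items()
        # exactly 2 cores in total, spread over one or two nodes ...
        if scale['num_nodes'] * scale.get('num_cpus_per_node', 0) == 2
        # ... or 1 or 2 full nodes (node_part == 1 meaning a whole node)
        or (scale['num_nodes'] in (1, 2) and scale.get('node_part', 0) == 1)
    ]
```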
I was a bit surprised that it runs on 2 cores, 1 full node, and 2 full nodes, but not e.g. on 2 GPUs (which on Snellius happens to be `1_2_node`). This was already the case before your changes here, so it's not introduced in this PR, but maybe @satishskamath can comment on why this is the case? I do seem to remember him saying he thought exclusive nodes would be good for reproducibility for these tests, but by that argument the 2-cores size would also have been filtered; I'd say having 2 GPUs is the equivalent.
Now, I do think it is difficult to filter the GPUs down to 2-GPU setups, as you don't know this until after the setup phase. I.e. we'd basically have to accept everything single-node in the init phase, and then skip the ones that would provide more than 2 GPUs.
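As a small illustration of the "accept everything single-node in the init phase" idea, assuming each `SCALES` entry carries a `num_nodes` key (the list name is made up for this example):

```python
from eessi.testsuite.constants import SCALES

# Hypothetical helper: every scale that fits on a single node. A test could
# parameterize over these at init time and postpone the "exactly 2 GPUs"
# check to a post-setup skip hook.
SINGLE_NODE_SCALES = [name for name, scale in SCALES.items() if scale['num_nodes'] == 1]
```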
will fix this in a follow-up PR, see #224
edit: i decided to add it to this PR anyway.
My initial attempt on this, and the reason I kept the `2_cores` and `1cpn_2nodes` cases for GPUs for point-to-point tests, was that these cases were valid and one needs at least one host CPU to assign to a device. But these scales are mainly for CPUs and can therefore be a bit misleading, and I do lean towards the view that network-based tests are best performed using two full exclusive nodes or 1 full node, which is the objective of this test as well.
Tested, runs fine. My main question would be the `required_mem_per_node` remark; otherwise, I think this is ready to be merged.
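For context, a hedged example of the kind of declaration that remark likely refers to: with the mixin approach a test typically states how much memory per node it needs. The method form, the MiB unit, and the 1 GiB-per-task figure below are illustrative assumptions, not values from this PR.

```python
def required_mem_per_node(self):
    # Illustrative only: roughly 1 GiB of memory per task on each node (MiB).
    return self.num_tasks_per_node * 1024
```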
```diff
@@ -157,3 +184,9 @@ def set_compute_unit(self):
             DEVICE_TYPES[GPU]: COMPUTE_UNIT[GPU],
         }
         self.compute_unit = device_to_compute_unit.get(self.device_type)
+
+    @run_after('setup')
+    def skip_test_1gpu(self):
```
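The diff cuts off before the body of `skip_test_1gpu`; the following is only a sketch of what a post-setup skip hook along these lines could look like (continuing the class above, and assuming the usual ReFrame attributes `num_tasks`, `num_tasks_per_node`, and `num_gpus_per_node` are already set during setup), not necessarily the PR's exact code.

```python
    @run_after('setup')
    def skip_test_1gpu(self):
        # Only relevant for the GPU variant of the test.
        if self.device_type != DEVICE_TYPES[GPU]:
            return
        # Task geometry is known after setup, so the total GPU count can be
        # derived here; skip instances that would not use exactly 2 GPUs.
        num_nodes = self.num_tasks // self.num_tasks_per_node
        num_gpus = num_nodes * self.num_gpus_per_node
        self.skip_if(num_gpus != 2, f'this test needs exactly 2 GPUs, found {num_gpus}')
```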
Didn't we agree that if it's a one-node test, we would check if we had exactly 2 GPUs and skip otherwise? I mean, the downside is that for nodes with very weird GPU counts (3, or 5), none of the standard scales would have this. The downside of the current situation is that I now essentially run the same test twice on a 4-GPU node: once at the `1_node` scale, and once at the `1_2_node` scale.
On the other hand, we also discussed the value of exclusively allocating, so that means we would also not skip `1_node` in any case. So I guess the only nodes where this would make a difference is if you have 8 GPUs per node. In that case, you might want to run `1_node` and `1_4_node`, but skip `1_2_node`.
Well, let me know what you think. It might also be too messy to implement, in which case I'm happy to keep the current setup.
(this comment is on the collective test, but you probably meant to add it to the pt2pt test ;))
yes, i had the same thoughts, and i agree neither choice is ideal.
i am now thinking that maybe we should not include the `1_node` scale, and instead leave it up to the user to request an exclusive node (with `sbatch --exclusive`) if they want a full node.
there is also the question of what we should do with the `2_nodes` scale. a solution could be to introduce extra scales `1_gpu`, `2_gpus`, and `1gpn_2nodes` (that we may want to keep out of the default `SCALES` list?), which automatically set the number of cores so that the default number of cores per gpu is respected, based on the info in the site config file and autodetection. this solution would also avoid having to add the `skip_if` statements, because we can then select only the scales with 2 gpus.
what do you think of this solution?
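A rough sketch of what such extra scale entries could look like; the dictionary keys, and the idea that the matching core count would be filled in elsewhere from the site config and autodetection, are assumptions rather than part of this PR.

```python
# Hypothetical additions, possibly kept out of the default SCALES list:
GPU_SCALES = {
    '1_gpu': {'num_nodes': 1, 'num_gpus_per_node': 1},
    '2_gpus': {'num_nodes': 1, 'num_gpus_per_node': 2},
    '1gpn_2nodes': {'num_nodes': 2, 'num_gpus_per_node': 1},
}
# The number of cores per GPU would then be derived from the site config file
# and CPU autodetection, rather than hard-coded per scale.
```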
I'm wondering what the impact will be on existing tests. I.e. suppose you have a GPU test that just runs at any scale (say: GROMACS). Having both the scales `1_4_node` and `1_gpu` will generate two essentially identical test instances on a partition with 4 GPUs per node. This means that as an end-user, you pretty much have to select a set of tags. Whereas right now, it is completely reasonable to run the test suite without any tags, if you just want to run anything. I would like to keep that behavior tbh: if you just want to run everything (once), just run ReFrame without tags. I mean, that's essentially why I was complaining about this running 2 essentially identical test instances in the first place.
Similarly, suppose you have CPU tests. You then have to filter out all of those GPU-related scales. That makes a straightforward case (a CPU test that runs at any scale) suddenly more complex, since every CPU test would need to implement some filtering (though we could probably define a filtered list in `constants.py` as well).
I don't know, maybe we should just leave it as you implemented it now; I feel the cure is getting worse than the disease :). Sure, the `1_2_node` and `1_node` scales are nearly identical on a 4-GPU node (with the exception that the second is, indeed, exclusive, which has some value at least). So I guess only on an 8-GPU node would you really get multiple tests that provide zero extra information (`1_4_node` and `1_2_node` would be 'identical' there: only using 2 GPUs, but no exclusive node in either case).
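A hedged illustration of the "filtered list in `constants.py`" idea; the names `GPU_ONLY_SCALES` and `SCALES_CPU` and the scale names they reference are assumptions, not part of this PR.

```python
from eessi.testsuite.constants import SCALES

# Hypothetical: a CPU-only view of SCALES that CPU tests could use directly,
# so each of them does not have to filter out GPU-specific scales itself.
GPU_ONLY_SCALES = ('1_gpu', '2_gpus', '1gpn_2nodes')
SCALES_CPU = {name: scale for name, scale in SCALES.items() if name not in GPU_ONLY_SCALES}
```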
I found an issue with the OSU test that I didn't realize was there before. It does NOT pop up if you have the […]
Now, it makes total sense to use […] and set […] in order to make sure the right amount of tasks is launched per node. So, I'm a bit at a loss where we messed this up. Should we somehow have this test set the […]
Note that it is not just important for the […] in order for those GPUs to even be requested by the job... otherwise, you'd get a CPU-only allocation on a GPU node. Even more accurate might be: […]
For collectives, there is no problem, because there we map the […]
Ok, this doesn't account for the fact that the node could have only 1 GPU per node. We should do: […]
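The snippet that originally followed is not preserved here; purely as an illustration of the point being made (never request more GPUs per node than the node actually has), a hedged sketch using ReFrame's partition device info, where the attribute names and the 'gpu' device type string are assumptions:

```python
@run_after('setup')
def set_gpu_request(self):
    # How many GPUs one node in this partition offers, according to the
    # ReFrame partition configuration; fall back to 1 if none is listed.
    gpus_per_node_avail = 1
    for dev in self.current_partition.devices:
        if dev.device_type == 'gpu':
            gpus_per_node_avail = dev.num_devices
    # Request at most 2 GPUs per node, but never more than the node has.
    self.num_gpus_per_node = min(2, gpus_per_node_avail)
```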
Note: I made a PR to your feature branch @smoors
@casparvl In the version in the main branch, I used to set this in the test itself.
EESSI_Mixin class:
- `@run_before('setup', always_last=True)` to avoid `perf_variables` being overwritten by the hpctestlib
- `num_tasks_per_compute_unit` […]

OSU test:
- `pt2pt` GPU test: […] and skip the ones that don't have exactly 2 GPUs

fixes #145
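For readers unfamiliar with the `always_last` mechanism mentioned in the first item, a minimal, hypothetical sketch (not the actual EESSI_Mixin code) of a mixin hook that runs after all other pre-setup hooks, so the performance variables it sets are not overwritten afterwards:

```python
import reframe as rfm
from reframe.core.builtins import performance_function, run_before
from reframe.core.pipeline import RegressionMixin


class ExamplePerfMixin(RegressionMixin):
    """Illustrative mixin only; names and values here are assumptions."""

    @performance_function('s')
    def example_time(self):
        return 0.0  # placeholder performance value for this sketch

    @run_before('setup', always_last=True)
    def set_perf_variables(self):
        # always_last=True makes this hook run after every other hook attached
        # to the pre-setup stage, so the perf_variables set here are not
        # overwritten by hooks defined elsewhere in the class hierarchy.
        self.perf_variables = {'example_time': self.example_time()}
```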