
use mixin class for osu #222

Open · wants to merge 7 commits into base: main

Conversation

@smoors (Collaborator) commented Dec 30, 2024

EESSI_Mixin class:

  • change measure_mem_usage() to @run_before('setup', always_last=True) to avoid perf_variables being overwritten by the hpctestlib (a sketch of this hook ordering follows below)
  • set default num_tasks_per_compute_unit

OSU test:

  • use mixin class
  • create base class to minimize code duplication
  • reduce required memory per node to 1GB per task
  • add partial node scales for the pt2pt GPU test and skip the ones that don't have exactly 2 GPUs

fixes #145
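As a minimal sketch of the always_last hook ordering mentioned in the first bullet (illustrative only: the class, helper and perf-variable names below are made up and are not the actual EESSI_Mixin code):

    import reframe as rfm
    import reframe.utility.sanity as sn
    from reframe.core.builtins import run_before


    class MeasureMemUsageSketch(rfm.RegressionMixin):
        """With always_last=True this hook runs as the last pre-setup hook,
        so a hpctestlib base class that also fills perf_variables in an
        earlier pre-setup hook can no longer overwrite the entry added here."""

        @run_before('setup', always_last=True)
        def measure_mem_usage(self):
            # hypothetical perf variable: parse a max-RSS value (in MiB) from a log file
            self.perf_variables['memory'] = sn.make_performance_function(
                sn.extractsingle(r'max_rss:\s*(?P<rss>\S+)', 'memory.log', 'rss', float),
                'MiB',
            )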

@smoors smoors added this to the v0.4.1 milestone Dec 30, 2024
from eessi.testsuite.utils import find_modules, log


def filter_scales_pt2pt():
    """
    Filtering function for filtering scales for the pt2pt OSU test
    returns all scales with either 2 cores, 1 full node, or 2 full nodes
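The diff context above is cut off inside the docstring; a sketch of what such a filter might look like, assuming the SCALES dictionary from eessi.testsuite.constants uses num_nodes, node_part and num_cpus_per_node keys (a reconstruction for illustration, not necessarily the code under review):

    from eessi.testsuite.constants import SCALES


    def filter_scales_pt2pt():
        """
        Filtering function for filtering scales for the pt2pt OSU test
        returns all scales with either 2 cores, 1 full node, or 2 full nodes
        """
        return [
            k for (k, v) in SCALES.items()
            # either 2 cores on a single node ...
            if (v['num_nodes'] == 1 and v.get('num_cpus_per_node') == 2)
            # ... or 1 or 2 full nodes (node_part == 1 means a whole node)
            or (v['num_nodes'] in (1, 2) and v.get('node_part') == 1)
        ]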
Collaborator

I was a bit surprised that it runs on 2 cores, a full node, and 2 full nodes, but not e.g. on 2 GPUs (which on Snellius happens to be 1_2_node). This was already the case before your changes here, so it's not introduced in this PR, but maybe @satishskamath can comment on why this is the case? I do seem to remember him saying he thought exclusive nodes would be good for reproducibility for these tests, but by that argument the 2-cores size would also have been filtered out - I'd say having 2 GPUs is the equivalent.

Now, I do think it is difficult to filter the GPUs down to 2-GPU setups, as you don't know this until after the setup phase. I.e. we'd basically have to accept everything single-node in the init phase, and then skip the ones that would provide more than 2 GPUs.

Collaborator Author

will fix this in a follow-up PR, see #224

Collaborator Author

edit: I decided to add it to this PR anyway.

Collaborator

My initial attempt at this kept the 2_cores and 1cpn_2nodes cases for the GPU point-to-point tests because these cases are valid: one needs at least one host CPU to assign to a device. But these scales are mainly meant for CPUs and can therefore be a bit misleading, and I do lean towards the view that network-based tests are best performed on two full exclusive nodes or 1 full node, which is the objective of this test as well.

@casparvl (Collaborator) left a comment

Tested, runs fine. My main question would be the required_mem_per_node remark; otherwise, I think this is ready to be merged.

@@ -157,3 +184,9 @@ def set_compute_unit(self):
            DEVICE_TYPES[GPU]: COMPUTE_UNIT[GPU],
        }
        self.compute_unit = device_to_compute_unit.get(self.device_type)

    @run_after('setup')
    def skip_test_1gpu(self):
Collaborator

Didn't we agree that if it's a one-node test, we would check if we had exactly 2 GPUs and skip otherwise? I mean, the downside is that for nodes with very weird GPU counts (3, or 5), none of the standard scales would have this. The downside of the current situation is that I now essentially run the same test twice on a 4-GPU node: once at the 1_node scale, and once at the 1_2_node scale.

On the other hand, we also discussed the value of exclusive allocation, so that means we would not skip 1_node in any case. So I guess the only nodes where this would make a difference are those with 8 GPUs per node. In that case, you might want to run 1_node and 1_4_node, but skip 1_2_node.

Well, let me know what you think. It might also be too messy to implement, in which case I'm happy to keep the current setup.

Collaborator Author

(this comment is on the collective test, but you probably meant to add it to the pt2pt test ;))

Yes, I had the same thoughts, and I agree neither choice is ideal.

I am now thinking that maybe we should not include the 1_node scale and instead leave it up to the user to request an exclusive node (with sbatch --exclusive) if they want a full node.

There is also the question of what we should do with the 2_nodes scale. A solution could be to introduce extra scales 1_gpu, 2_gpus, and 1gpn_2nodes (which we may want to keep out of the default SCALES list?), which would automatically set the number of cores so that the default number of cores per GPU is respected, based on the info in the site config file and autodetection. This solution would also avoid having to add the skip_if statements, because we can then select only the scales with 2 GPUs.

What do you think of this solution?
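To make the proposal above concrete, hypothetical additions to constants.py could look roughly like this (the scale names come from the comment; the keys and the derivation of the core count are assumptions, not an actual implementation):

    # Hypothetical GPU-oriented scales; whether they belong in the default
    # SCALES list is exactly the open question raised above.
    GPU_SCALES = {
        '1_gpu': {'num_nodes': 1, 'num_gpus_per_node': 1},
        '2_gpus': {'num_nodes': 1, 'num_gpus_per_node': 2},
        '1gpn_2nodes': {'num_nodes': 2, 'num_gpus_per_node': 1},
    }

    # The core count per task would then be derived per partition from the site
    # config and CPU autodetection, e.g. cores_per_gpu = cores_in_node // gpus_in_node,
    # so that the default cores-per-GPU ratio is respected and no skip_if is needed:
    # only scales that yield exactly 2 GPUs would be selected for the pt2pt test.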

@casparvl (Collaborator) commented Jan 14, 2025

I'm wondering what the impact will be on existing tests.

I.e. suppose you have a GPU test that just runs at any scale (say: GROMACS). Having both the scales 1_4_node and 1_gpu will generate two essentially identical test instances on a partition with 4 GPUs per node. This means that as an end-user, you pretty much have to select a set of tags. Whereas right now, it is completely reasonable to run the test suite without any tags, if you just want to run anything. I would like to keep that behavior tbh: if you just want to run everything (once), just run ReFrame without tags. I mean, that's essentially why I was complaining about the fact that this would run 2 essentially identical test instances in the first place.

Similarly, suppose you have CPU tests. You then have to filter out all of those GPU-related scales. That makes a straightforward case (a CPU test that runs at any scale) suddenly more complex, since every CPU test would need to implement some filtering (though we could probably define a filtered list in constants.py as well).

I don't know, maybe we should just leave it as you implemented it now; I feel the cure is getting worse than the disease :). Sure, the 1_2_node and 1_node scales are nearly identical on a 4-GPU node (with the exception that the second is, indeed, exclusive, which has some value at least). So I guess only on an 8-GPU node would you really get multiple tests that provide zero extra information (1_4_node and 1_2_node would be 'identical' there: only using 2 GPUs, but no exclusive node in either case).

@casparvl (Collaborator)

I found an issue with the OSU test that I didn't realize was there before. It does NOT pop up if you have the ALWAYS_REQUEST_GPUS feature set. But if you don't, then test.num_gpus_per_node doesn't get set anywhere, and that results in this error:

[2025-01-14T11:45:49] info: reframe:   * Reason: type error: ../test-suite/eessi/testsuite/tests/apps/osu.py:151: unsupported operand type(s) for *: 'NoneType' and 'int'
        num_gpus = self.num_gpus_per_node * self.num_nodes

[2025-01-14T11:45:49] verbose: reframe: Traceback (most recent call last):
  File "/gpfs/home4/casparl/EESSI/reframe/reframe/frontend/executors/__init__.py", line 318, in _safe_call
    return fn(*args, **kwargs)
  File "/gpfs/home4/casparl/EESSI/reframe/reframe/core/hooks.py", line 111, in _fn
    getattr(obj, h.__name__)()
  File "/gpfs/home4/casparl/EESSI/reframe/reframe/core/hooks.py", line 38, in _fn
    func(*args, **kwargs)
  File "/gpfs/home4/casparl/EESSI/test-suite/eessi/testsuite/tests/apps/osu.py", line 151, in skip_test_1gpu
    num_gpus = self.num_gpus_per_node * self.num_nodes
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

Now, it makes total sense to use

compute_unit = COMPUTE_UNIT[NODE]

and set

    @run_after('init')
    def set_num_tasks_per_compute_unit(self):
        """ Setting number of tasks per compute unit and cpus per task. This sets num_cpus_per_task
        for 1 node and 2 node options where the request is for full nodes."""
        if SCALES.get(self.scale).get('num_nodes') == 1:
            self.num_tasks_per_compute_unit = 2

in order to make sure the right number of tasks is launched per node. So, I'm a bit at a loss where we messed this up. Should we somehow have this test set test.num_gpus_per_node = test.default_num_gpus_per_node manually or something, now that the hook won't do it? Or should we consider this a 'bug' in the hook? I don't think it is a bug in the hook tbh: it had no specific reason to explicitly ask for a num_gpus_per_node, as we didn't define `ALWAYS_REQUEST_GPUS`. I have the feeling it's the test's responsibility to say 'yeah, I'm asking for NODE as a compute unit, BUT I still want to see how many GPUs per node I'm expecting to get, and skip if that's only 1'.

@casparvl (Collaborator)

Note that it is not just important for the skip_if: we need to set

    @run_after('setup')
    def set_gpus_per_node(self):
        self.num_gpus_per_node = self.default_num_gpus_per_node

in order for those GPUs to even be requested by the job... otherwise, you'd get a CPU-only allocation on a GPU node.

Even more accurate might be:

    @run_after('setup')
    def skip_test_1gpu(self):
        num_gpus = self.default_num_gpus_per_node * self.num_nodes
        self.skip_if(
            num_gpus != 2 and self.scale not in ['1_node', '2_nodes'],
            f"Skipping test : {num_gpus} GPU(s) available for this test case, need exactly 2"
        )

    @run_after('setup')
    def set_gpus_per_node(self):
        if self.scale not in ['1_node', '2_nodes']:
            self.num_gpus_per_node = 2  # make sure to allocate 2 GPUs
        elif self.scale in ['1_node']:
            self.num_gpus_per_node = self.default_num_gpus_per_node  # just allocate exclusively
        else:
            self.num_gpus_per_node = 1  # for multinode tests, we use 1 GPU per node

For collectives, there is no problem, because there we map the DEVICE_TYPES[GPU] to COMPUTE_UNIT[GPU], thus defining the self.num_gpus_per_node just fine.

@casparvl (Collaborator)

Ok, this doesn't account for the fact that the node could have only 1 GPU per node. We should do:

    @run_after('setup')
    def skip_test_1gpu(self):
        num_gpus = self.default_num_gpus_per_node * self.num_nodes
        # On a partial node allocation, run this test only if exactly 2 GPUs are allocated
        if self.scale not in ['1_node', '2_nodes']:
            self.skip_if(
                num_gpus != 2,
                f"Skipping test : {num_gpus} GPU(s) available for this test case, need exactly 2"
            )
        # If the scale is 1_node, make sure there are at least 2 GPUs
        elif self.scale == '1_node':
            self.skip_if(num_gpus < 2, "Skipping GPU test : only 1 GPU available for this test case")

    @run_after('setup')
    def set_gpus_per_node(self):
        if self.scale not in ['1_node', '2_nodes']:
            self.num_gpus_per_node = 2  # make sure to allocate 2 GPUs
        elif self.scale == '1_node':
            self.num_gpus_per_node = self.default_num_gpus_per_node  # just allocate exclusively
        else:
            self.num_gpus_per_node = 1  # for multinode tests, we use 1 GPU per node

@casparvl (Collaborator)

Note: I made a PR to your feature branch @smoors

@satishskamath (Collaborator) commented Jan 16, 2025

(quoting @casparvl's comment above about test.num_gpus_per_node not being set, and the resulting TypeError, when ALWAYS_REQUEST_GPUS is not enabled)

@casparvl In the version in the main branch, I used to set this in the test itself.
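For reference, a minimal sketch of what setting this in the test itself (rather than relying on the hook) could look like, along the lines of the set_gpus_per_node suggestion above, assuming default_num_gpus_per_node has been populated by the EESSI hooks by the end of the setup phase (hypothetical mixin, not the actual main-branch code):

    import reframe as rfm
    from reframe.core.builtins import run_after


    class RequestGPUsSketch(rfm.RegressionMixin):
        """Hypothetical sketch, not the main-branch implementation."""

        @run_after('setup')
        def set_gpus_per_node(self):
            # Without ALWAYS_REQUEST_GPUS, a NODE compute unit leaves
            # num_gpus_per_node unset; request the GPUs explicitly so the job
            # does not end up with a CPU-only allocation on a GPU node.
            if self.num_gpus_per_node is None:
                self.num_gpus_per_node = self.default_num_gpus_per_node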

Development

Successfully merging this pull request may close these issues.

OSU pt2pt test don't run on node-parts for GPUs.
3 participants