
Checks filtering using valid programs shows only the first partition defined in the configuration #3371

Closed
@victorusu

Description


TL;DR

I am using the %scheduler=... syntax for valid_systems described in the documentation (e.g., valid_systems = [r'%scheduler=slurm', r'%scheduler=squeue']) to select which tests to run. ReFrame only selects the tests whose valid_systems match the first partition listed in site_configuration.
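As I understand the documentation, a list of %key=value specs means "any of these", so a test should be selected if at least one spec matches at least one partition of the current system. The Python sketch below is a hypothetical model of that expectation, not ReFrame's implementation: the helpers partition_matches and is_selected are invented names, and only the %key=value form is handled.

```python
def partition_matches(partition, spec):
    """Check one '%key=value' spec against a partition dict (sketch only)."""
    if not spec.startswith('%') or '=' not in spec:
        raise ValueError(f'unsupported spec in this sketch: {spec!r}')

    key, value = spec[1:].split('=', 1)
    return key in partition and str(partition[key]) == value


def is_selected(valid_systems, partitions):
    """Select a test if any spec matches any partition, whatever the order."""
    return any(partition_matches(part, spec)
               for part in partitions
               for spec in valid_systems)


# Partitions from the configuration files used in this report (abridged)
partitions = [
    {'name': 'login', 'scheduler': 'local'},
    {'name': 'normal', 'scheduler': 'slurm'},
]

print(is_selected([r'%scheduler=slurm', r'%scheduler=squeue'], partitions))  # True
print(is_selected([r'%scheduler=local'], partitions))                        # True
print(is_selected([r'%scheduler=squeue'], partitions))                       # False
```

Because any() scans every partition, reversing the partitions list does not change any of these results.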

ReFrame version

4.8.0-dev.3+cf670fe1 (latest as of today)

Steps to reproduce the error

1. Create system configuration files

Create a configuration file for a given cluster that defines two partitions with different schedulers. In my case, I used local and slurm.
Copy the file to a second one and swap the order of the two partitions under the partitions key of the system entry.
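For a two-partition system, the second file can also be derived from the first programmatically. The sketch below uses a hypothetical helper, swap_partition_order (not part of ReFrame), that deep-copies a site_configuration dict and reverses each system's partition list, leaving everything else untouched. With exactly two partitions, reversing is the same as swapping. The configuration is abridged from the files in this report.

```python
import copy


def swap_partition_order(config):
    """Return a deep copy of a site configuration in which every system's
    partitions are listed in reverse order; nothing else is changed."""
    swapped = copy.deepcopy(config)
    for system in swapped['systems']:
        system['partitions'].reverse()
    return swapped


# Abridged version of the configuration used in this report
site_configuration = {
    'systems': [
        {
            'name': 'daint',
            'hostnames': ['daint'],
            'partitions': [
                {'name': 'login', 'scheduler': 'local'},
                {'name': 'normal', 'scheduler': 'slurm'},
            ],
        }
    ],
}

swapped = swap_partition_order(site_configuration)
print([p['name'] for p in site_configuration['systems'][0]['partitions']])
print([p['name'] for p in swapped['systems'][0]['partitions']])
```

The deep copy keeps the original dict intact, so both orderings can be compared side by side.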

I have used the two configuration files below.

File daint-local-partition-first-config.py:

site_configuration = {
    'systems': [
        {
            'name': 'daint',
            'descr': 'Piz Daint vCluster',
            'hostnames': ['daint'],
            'partitions': [
                {
                    'name': 'login',
                    'scheduler': 'local',
                    'time_limit': '10m',
                    'environs': [
                        'builtin',
                    ],
                    'descr': 'Login nodes',
                    'max_jobs': 4,
                    'launcher': 'local'
                },
                {
                    'name': 'normal',
                    'descr': 'GH200',
                    'scheduler': 'slurm',
                    'time_limit': '10m',
                    'environs': [
                        'builtin',
                    ],
                    'max_jobs': 100,
                    'launcher': 'srun',
                },
            ]
        }
    ],
}

File daint-slurm-partition-first-conf.py:

site_configuration = {
    'systems': [
        {
            'name': 'daint',
            'descr': 'Piz Daint vCluster',
            'hostnames': ['daint'],
            'partitions': [
                {
                    'name': 'normal',
                    'descr': 'GH200',
                    'scheduler': 'slurm',
                    'time_limit': '10m',
                    'environs': [
                        'builtin',
                    ],
                    'max_jobs': 100,
                    'launcher': 'srun',
                },
                {
                    'name': 'login',
                    'scheduler': 'local',
                    'time_limit': '10m',
                    'environs': [
                        'builtin',
                    ],
                    'descr': 'Login nodes',
                    'max_jobs': 4,
                    'launcher': 'local'
                },
            ]
        }
    ],
}

2. Define two tests

One test should set valid_systems = [r'%scheduler=slurm'] and the other valid_systems = [r'%scheduler=local'].

I am using the following two tests, saved as mini-reproducer.py.

import os

import reframe as rfm
import reframe.utility.sanity as sn

SLEEPCMD = '/bin/sleep'


@rfm.simple_test
class sleep_submit_job_check(rfm.RunOnlyRegressionTest):
    executable = SLEEPCMD
    # run only when slurm is the workload manager
    valid_systems = [r'%scheduler=slurm']
    valid_prog_environs = ['builtin']
    executable_opts = ['1']

    @sanity_function
    def assert_sanity(self):
        return True


@rfm.simple_test
class sleep_local_job_check(rfm.RunOnlyRegressionTest):
    executable = SLEEPCMD
    # run in the local scheduler
    valid_systems = [r'%scheduler=local']
    valid_prog_environs = ['builtin']
    executable_opts = ['1']

    @sanity_function
    def assert_sanity(self):
        return sn.all([
            sn.assert_eq(os.stat(sn.evaluate(self.stdout)).st_size, 0,
                         msg=f'file {self.stdout} is not empty'),
            sn.assert_eq(os.stat(sn.evaluate(self.stderr)).st_size, 0,
                         msg=f'file {self.stderr} is not empty'),
            ])

3. Actual output

When the login (local) partition is listed first, ReFrame selects only the test that sets valid_systems = [r'%scheduler=local'].

$ reframe -C daint-local-partition-first-config.py -c mini-reproducer.py -l
[ReFrame Setup]
  version:           4.8.0-dev.3+cf670fe1
...
[List of matched checks]
- sleep_local_job_check /7370cc85
Found 1 check(s)
...

When the normal (slurm) partition is listed first, ReFrame selects only the test that sets valid_systems = [r'%scheduler=slurm'].

$ reframe -C daint-slurm-partition-first-conf.py -c mini-reproducer.py -l
[ReFrame Setup]
  version:           4.8.0-dev.3+cf670fe1
...
[List of matched checks]
- sleep_submit_job_check /4d2777d3
Found 1 check(s)
...

4. Expected output

ReFrame should select both tests regardless of the order in which the partitions appear in site_configuration.

Thus, this

$ reframe -C daint-local-partition-first-config.py -c mini-reproducer.py -l
[ReFrame Setup]
  version:           4.8.0-dev.3+cf670fe1
...
[List of matched checks]
- sleep_local_job_check /7370cc85
- sleep_submit_job_check /4d2777d3
Found 2 check(s)
...

should produce the same output as this:

$ reframe -C daint-slurm-partition-first-conf.py -c mini-reproducer.py -l
[ReFrame Setup]
  version:           4.8.0-dev.3+cf670fe1
...
[List of matched checks]
- sleep_local_job_check /7370cc85
- sleep_submit_job_check /4d2777d3
Found 2 check(s)
...
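The expectation can be stated as an order-invariance property: the set of selected tests must not change when the partitions are permuted. The toy model below is hypothetical and only mimics the reported symptom; it does not claim that ReFrame's filter is written this way. select_first_partition_only reproduces the observed listings, while select_any_partition gives the expected one for both partition orders. All function names are invented for illustration.

```python
from itertools import permutations

# Tests from mini-reproducer.py, reduced to their valid_systems
TESTS = {
    'sleep_submit_job_check': [r'%scheduler=slurm'],
    'sleep_local_job_check': [r'%scheduler=local'],
}

# Partitions from the configuration files (abridged)
PARTITIONS = [
    {'name': 'login', 'scheduler': 'local'},
    {'name': 'normal', 'scheduler': 'slurm'},
]


def matches(partition, spec):
    """Match a single '%key=value' spec against a partition dict."""
    key, value = spec.lstrip('%').split('=', 1)
    return key in partition and str(partition[key]) == value


def select_any_partition(tests, partitions):
    """Expected behaviour: consider every partition of the system."""
    return sorted(name for name, specs in tests.items()
                  if any(matches(p, s) for p in partitions for s in specs))


def select_first_partition_only(tests, partitions):
    """Toy model of the observed symptom: only the first partition counts."""
    return select_any_partition(tests, partitions[:1])


for order in permutations(PARTITIONS):
    print(order[0]['name'], 'first ->',
          'observed:', select_first_partition_only(TESTS, list(order)),
          'expected:', select_any_partition(TESTS, list(order)))
```

For each ordering, the observed column lists only the single test reported above, while the expected column lists both tests.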
