Added AWS config file #53

Merged: 19 commits, Aug 24, 2023
cb10f9d
Added AWS config file
Jun 13, 2023
8da9abb
Rename environment, and add mgmt as hostname for the system to recogn…
Jun 13, 2023
bc7ab43
Remove slash from partition names, to avoid error 'ERROR: failed to i…
Jun 13, 2023
fa747a9
Removed spaces from partition names, since these are used to create d…
Jun 14, 2023
cffe271
Make sure vars don't get exported, we need to source the EESSI enviro…
Jun 15, 2023
27c9c90
Changed launcher to srun, to work with CPU autodetect. Also, changed …
Jun 19, 2023
3a531a6
Need to set SLURM_EXPORT_ENV in job to make sure environment gets exp…
Jun 20, 2023
7684290
Changed back to mpirun
Jun 20, 2023
bb6e8ef
somehow this only works if --constraint is followed by an equal sign,…
Jun 20, 2023
191a313
Add explanation on how to make autodetect work
Jun 22, 2023
511ab6e
Add explanation on how to make autodetect work
Jun 22, 2023
9bd26a4
Changed append to True, even if we also already use date stamps for …
Jun 28, 2023
2316d90
Hardcode the processor configs for the graviton nodes for now, since …
Jul 18, 2023
9d4f3dc
Merge branch 'main' into config_aws
Jul 18, 2023
fb02996
Bring logging in line with Vega config. Also, use FEATURES constant
Jul 18, 2023
36a79a0
Corrected two small mistakes in the syntax
Jul 18, 2023
1f450e2
Merge branch 'main' into config_aws
Jul 18, 2023
67e8cbe
Removed graviton CPU description, as autodetect seems to work now
Aug 4, 2023
a4f777f
Clarified comments on what to do to make CPU autodetection work
Aug 23, 2023
172 changes: 172 additions & 0 deletions config/aws_citc.py
@@ -0,0 +1,172 @@
# This is an example configuration file

# Note that CPU autodetection currently does not work with this configuration file on AWS.
# To enable CPU autodetection, two changes are needed:
# 1. Remove all occurrences of '--export=NONE'
# 2. Set 'launcher = srun'
# You can then trigger CPU autodetection by listing all tests (reframe -l ...).
# Once all CPUs have been autodetected, revert the config for a 'real' run (reframe -r ...).

site_configuration = {
'systems': [
{
'name': 'citc',
'descr': 'Cluster in the Cloud build and test environment on AWS',
'modules_system': 'lmod',
'hostnames': ['mgmt', 'login', 'fair-mastodon*'],
'prefix': 'reframe_runs/',
'partitions': [
{
'name': 'x86_64-haswell-8c-15gb',
'access': ['--constraint=shape=c4.2xlarge', '--export=NONE'],
'descr': 'Haswell, 8 cores, 15 GiB',
},
{
'name': 'x86_64-haswell-16c-30gb',
'access': ['--constraint=shape=c4.4xlarge', '--export=NONE'],
'descr': 'Haswell, 16 cores, 30 GiB',
},
{
'name': 'x86_64-zen2-8c-16gb',
'access': ['--constraint=shape=c5a.2xlarge', '--export=NONE'],
'descr': 'Zen2, 8 cores, 16 GiB',
},
{
'name': 'x86_64-zen2-16c-32gb',
'access': ['--constraint=shape=c5a.4xlarge', '--export=NONE'],
'descr': 'Zen2, 16 cores, 32 GiB',
},
{
'name': 'x86_64-zen3-8c-16gb',
'access': ['--constraint=shape=c6a.2xlarge', '--export=NONE'],
'descr': 'Zen3, 8 cores, 16 GiB',
},
{
'name': 'x86_64-zen3-16c-32gb',
'access': ['--constraint=shape=c6a.4xlarge', '--export=NONE'],
'descr': 'Zen3, 16 cores, 32 GiB',
},
{
'name': 'x86_64-skylake-cascadelake-8c-16gb',
'access': ['--constraint=shape=c5.2xlarge', '--export=NONE'],
'descr': 'Skylake/Cascade Lake, 8 cores, 16 GiB',
},
{
'name': 'x86_64-skylake-cascadelake-16c-32gb',
'access': ['--constraint=shape=c5.4xlarge', '--export=NONE'],
'descr': 'Skylake/Cascade Lake, 16 cores, 32 GiB',
},
{
'name': 'x86_64-skylake-cascadelake-8c-16gb-nvme',
'access': ['--constraint=shape=c5d.2xlarge', '--export=NONE'],
'descr': 'Skylake/Cascade Lake, 8 cores, 16 GiB, 200 GB NVMe',
},
{
'name': 'x86_64-icelake-8c-16gb',
'access': ['--constraint=shape=c6i.2xlarge', '--export=NONE'],
'descr': 'Ice Lake, 8 cores, 16 GiB',
},
{
'name': 'aarch64-graviton2-8c-16gb',
'access': ['--constraint=shape=c6g.2xlarge', '--export=NONE'],
'descr': 'Graviton2, 8 cores, 16 GiB',
},
{
'name': 'aarch64-graviton2-16c-32gb',
'access': ['--constraint=shape=c6g.4xlarge', '--export=NONE'],
'descr': 'Graviton2, 16 cores, 32 GiB',
},
{
'name': 'aarch64-graviton2-32c-64gb',
'access': ['--constraint=shape=c6g.8xlarge', '--export=NONE'],
'descr': 'Graviton2, 32 cores, 64 GiB',
},
{
'name': 'aarch64-graviton3-8c-16gb',
'access': ['--constraint=shape=c7g.2xlarge', '--export=NONE'],
'descr': 'Graviton3, 8 cores, 16 GiB',
},
{
'name': 'aarch64-graviton3-16c-32gb',
'access': ['--constraint=shape=c7g.4xlarge', '--export=NONE'],
'descr': 'Graviton3, 16 cores, 32 GiB',
},
]
},
],
'environments': [
{
'name': 'default',
'cc': 'cc',
'cxx': '',
'ftn': '',
},
],
'logging': [
{
'level': 'debug',
'handlers': [
{
'type': 'stream',
'name': 'stdout',
'level': 'info',
'format': '%(message)s'
},
{
'type': 'file',
'prefix': 'reframe_runs',
'name': 'reframe.log',
'level': 'debug',
'format': '[%(asctime)s] %(levelname)s: %(check_info)s: %(message)s', # noqa: E501
'append': False,
'timestamp': "%Y%m%d_%H%M%S",
},
],
'handlers_perflog': [
{
'type': 'filelog',
'prefix': '%(check_system)s/%(check_partition)s',
'level': 'info',
'format': (
'%(check_job_completion_time)s|reframe %(version)s|'
'%(check_info)s|jobid=%(check_jobid)s|'
'%(check_perf_var)s=%(check_perf_value)s|'
'ref=%(check_perf_ref)s '
'(l=%(check_perf_lower_thres)s, '
'u=%(check_perf_upper_thres)s)|'
'%(check_perf_unit)s'
),
'append': True
}
]
}
],
'general': [
{
'remote_detect': True,
}
],
}

# Apply shared defaults to each partition:
partition_defaults = {
'scheduler': 'squeue',
# mpirun causes problems with CPU autodetection, since there is no system-level mpirun.
# See https://github.com/EESSI/test-suite/pull/53#issuecomment-1590849226
# and this feature request https://github.com/reframe-hpc/reframe/issues/2926
# However, using srun requires either using pmix or proper pmi2 integration in the MPI library
# See https://github.com/EESSI/test-suite/pull/53#issuecomment-1598753968
# Thus, we use mpirun for now, and manually swap to srun if we want to autodetect CPUs...
'launcher': 'mpirun',
'environs': ['default'],
'features': ['cpu'],
'prepare_cmds': [
'source /cvmfs/pilot.eessi-hpc.org/latest/init/bash',
# Required when using srun as launcher with --export=NONE in partition access, in order to ensure job
# steps inherit the environment. It doesn't hurt to define this even if srun is not used
'export SLURM_EXPORT_ENV=ALL'
],
}
for system in site_configuration['systems']:
    for partition in system['partitions']:
        partition.update(partition_defaults)
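Per the header comment of the config file, switching into CPU-autodetection mode means stripping `--export=NONE` from every partition's `access` list and changing the launcher to `srun`. The sketch below applies those two edits programmatically, following the same nested loop the PR uses to apply `partition_defaults`. The helper name `enable_cpu_autodetect` and the trimmed-down `config` literal are illustrative, not part of the PR:

```python
# Sketch: toggle a ReFrame site configuration into CPU-autodetection mode.
# The two edits mirror the steps in the config file's header comment:
#   1. remove all '--export=NONE' flags from each partition's 'access' list
#   2. set the launcher to 'srun'

def enable_cpu_autodetect(site_configuration):
    for system in site_configuration['systems']:
        for partition in system['partitions']:
            # Drop the flag that prevents the autodetection job from
            # seeing the submitting environment
            partition['access'] = [
                flag for flag in partition.get('access', [])
                if flag != '--export=NONE'
            ]
            partition['launcher'] = 'srun'
    return site_configuration


# Minimal usage example with a single partition (a cut-down version of
# the real config in this PR):
config = {
    'systems': [
        {
            'name': 'citc',
            'partitions': [
                {
                    'name': 'x86_64-haswell-8c-15gb',
                    'access': ['--constraint=shape=c4.2xlarge', '--export=NONE'],
                    'launcher': 'mpirun',
                },
            ],
        },
    ],
}

enable_cpu_autodetect(config)
part = config['systems'][0]['partitions'][0]
print(part['access'])    # ['--constraint=shape=c4.2xlarge']
print(part['launcher'])  # srun
```

After applying these edits (or making them by hand, as the header comment suggests), one would list the tests with `reframe -l ...` to trigger remote autodetection, then revert the config before a real `reframe -r ...` run.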