job submissions to wrong partition on Hortense #68
Comments
@vkarak Thoughts on this? Is it intentional that having |
No, it's not intentional, we haven't thought about this type of set up, actually :-) But since this is really related to how the target system is set up, I think that setting the |
@vkarak Aren't the We just tried something else together with Lara: we unloaded one of their sticky environment modules (the one that sets the The only thing is, I don't know what unintended side effects the unload of that sticky module might have, but @boegel probably knows... :) |
Indeed, you are right: Currently, in reframe, there is no way to pass options directly to the
Have you also tried the |
@vkarak Not being able to add options directly to the Is there a way to do that? Unloading the module (with force) that sets (it's actually the same question twice, sort of...) |
I had a look again into it and indeed you can't cleanly modify the environment where |
reframe-hpc/reframe#2970 was closed just now, so what we need is coming in a ReFrame release soon? |
Yes, it is already merged in |
Just tested it and it is now submitting to the right partition on hortense. Thanks. |
@vkarak I'm testing with version 4.7.2. The jobs are now submitted to the right partition but with autodetection (when
@laraPPr Was this regression in 4.7.2 only? Does it work in 4.7.0? |
Could you also try 4.7.1? Because the only chance to have broken is by the fixes introduced in 4.7.2. Also, could you describe the exact scenario you are trying and what would be the expected behaviour? |
as Caspar found out, you have a typo, should be |
The problem was indeed the typo; tested with 4.7.0, 4.7.1 and 4.7.2.
Incoming PR to fix it |
@casparvl can this one be closed (since it is resolved when using ReFrame 4.7.0) or should we wait until the CI is updated to use ReFrame 4.7.0 or older? |
ReFrame submits all the tests to the same partition. So if ReFrame is started from the `cpu_milan` partition, all the tests that are found for other partitions will also be submitted to `cpu_milan`. This goes especially wrong when starting from a GPU partition, since all the tests meant for the CPU partitions fail immediately.

We have narrowed down the problem to the following parts of the Hortense system and the `vsc_hortense.py` config file:
- `#SBATCH --partition=cpu_milan` in the job script (`rfm_job.sh`)
- `SBATCH_PARTITION=cpu_rome`, set by the Hortense cluster module
- `sbatch rfm_job.sh`
- the `SBATCH_PARTITION` variable wins

Could it be possible that ReFrame submits the job with `sbatch --partition=cpu_milan rfm.job`?

A possible workaround might be to use `prepare_cmds` to set the environment variable `SBATCH_PARTITION` for every partition in `config/vsc_hortense.py`.
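To make the precedence described above concrete, here is a minimal Python sketch of how `sbatch` is documented to resolve the partition: command-line options override `SBATCH_*` environment variables, which in turn override `#SBATCH` directives in the job script. This is only a model for illustration, not Slurm or ReFrame code. The function name `effective_partition` and the `srun hostname` line in the sample script are hypothetical, and the option parsing is deliberately simplified.

```python
import re


def effective_partition(script_text, env, cli_args=()):
    """Return the partition a job would land on, following sbatch's
    documented precedence: command line > environment > #SBATCH directive.

    Simplified model (an assumption for illustration): only the
    --partition=<name>, --partition <name> and -p <name> forms are handled.
    """
    # 1. Options given on the sbatch command line win over everything else.
    for i, arg in enumerate(cli_args):
        if arg.startswith("--partition="):
            return arg.split("=", 1)[1]
        if arg in ("-p", "--partition") and i + 1 < len(cli_args):
            return cli_args[i + 1]

    # 2. SBATCH_PARTITION from the environment overrides the job script.
    if env.get("SBATCH_PARTITION"):
        return env["SBATCH_PARTITION"]

    # 3. Otherwise fall back to the #SBATCH directive in the script, if any.
    match = re.search(
        r"^#SBATCH\s+(?:--partition[=\s]\s*|-p\s+)(\S+)",
        script_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Sample job script in the spirit of rfm_job.sh (contents are hypothetical).
job_script = "#!/bin/bash\n#SBATCH --partition=cpu_milan\nsrun hostname\n"
# Environment as set by the cluster module in the scenario above.
module_env = {"SBATCH_PARTITION": "cpu_rome"}

# Plain `sbatch rfm_job.sh`: the environment variable wins.
print(effective_partition(job_script, module_env))
# `sbatch --partition=cpu_milan rfm_job.sh`: the command line wins.
print(effective_partition(job_script, module_env, ["--partition=cpu_milan"]))
# With the variable unset, the #SBATCH directive is used.
print(effective_partition(job_script, {}))
```

Under this model the first call yields `cpu_rome` and the other two yield `cpu_milan`, which matches the behaviour reported here: the directive in the job script is ignored as long as the module exports `SBATCH_PARTITION`, while passing the partition on the submission command line, or unsetting the variable, restores the intended target.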