fix: flexible mem behavior #6

Merged (4 commits, Mar 22, 2024)

45 changes: 34 additions & 11 deletions README.md
@@ -30,6 +30,16 @@ Usually, it is advisable to persist such settings via a
[configuration profile](https://snakemake.readthedocs.io/en/latest/executing/cli.html#profiles), which
can be provided system-wide, per user, and in addition per workflow.

+This is an example of the relevant profile settings:
+
+```yaml
+jobs: '<max concurrent jobs>'
+executor: lsf
+default-resources:
+- 'lsf_project=<your LSF project>'
+- 'lsf_queue=<your LSF queue>'
+```

## Ordinary SMP jobs

Most jobs will be carried out by programs which are either single core
@@ -52,7 +62,7 @@ to specify them for every rule. Snakemake already has reasonable
defaults built in, which are automatically activated when using any non-local executor
(hence also with lsf). Use `mem_mb_per_cpu` to request the standard LSF-style memory per CPU.
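
As an illustrative sketch (rule name, files, and command are hypothetical), a rule requesting the LSF-style per-CPU memory could look like:

```python
rule align:
    input: "reads.fq"
    output: "aligned.bam"
    threads: 8
    resources:
        mem_mb_per_cpu=2048  # 2048 MB per reserved core, passed to LSF as a per-core rusage request
    shell:
        "some_aligner --threads {threads} {input} > {output}"  # hypothetical command
```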

-## MPI jobs {#cluster-slurm-mpi}
+## MPI jobs {#cluster-lsf-mpi}

Snakemake's LSF backend also supports MPI jobs, see
`snakefiles-mpi`{.interpreted-text role="ref"} for details.
@@ -79,18 +89,21 @@ $ snakemake --set-resources calc_pi:mpi="mpiexec" ...

A workflow rule may support a number of
[resource specifications](https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources).
-For a LSF cluster, a mapping between Snakemake and SLURM needs to be performed.
+For an LSF cluster, a mapping between Snakemake and LSF needs to be performed.

You can use the following specifications:

-| LSF | Snakemake | Description |
-|----------------|------------|---------------------------------------|
-| `-q` | `lsf_queue` | the queue a rule/job is to use |
-| `--W` | `walltime` | the walltime per job in minutes |
-| `--constraint` | `constraint` | may hold features on some clusters |
-| `-R "rusage[mem=<memory_amount>]"` | `mem`, `mem_mb` | memory a cluster node must |
-| | | provide (`mem`: string with unit), `mem_mb`: i |
-| `-R "rusage[mem=<memory_amount>]"` | `mem_mb_per_cpu` | memory per reserved CPU |
+| LSF                                | Snakemake        | Description                              |
+|------------------------------------|------------------|------------------------------------------|
+| `-q`                               | `lsf_queue`      | the queue a rule/job is to use           |
+| `-W`                               | `walltime`       | the walltime per job in minutes          |
+| `--constraint`                     | `constraint`     | may hold features on some clusters       |
+| `-R "rusage[mem=<memory_amount>]"` | `mem`, `mem_mb`  | memory a cluster node must provide       |
+|                                    |                  | (`mem`: string with unit, `mem_mb`: int) |
+| `-R "rusage[mem=<memory_amount>]"` | `mem_mb_per_cpu` | memory per reserved CPU                  |
+| omit `-R span[hosts=1]`            | `mpi`            | Allow splitting across nodes for MPI     |
+| `-R span[ptile=<ptile>]`           | `ptile`          | Processors per host; requires `mpi`      |
+| Other `bsub` arguments             | `lsf_extra`      | Other args to pass to `bsub` (str)       |


Each of these can be part of a rule, e.g.:
@@ -128,4 +141,14 @@ rule myrule:
lsf_extra="-R a100 -gpu num=2"
```

Again, rather use a [profile](https://snakemake.readthedocs.io/en/latest/executing/cli.html#profiles) to specify such resources.

+## Clusters that use per-job memory requests instead of per-core

+By default, this plugin converts the specified memory request into the per-core request expected by most LSF clusters.
+So `threads: 4` and `mem_mb=128` will result in `-R rusage[mem=32]`. If the request should be per-job on your cluster
+(i.e. `-R rusage[mem=<mem_mb>]`), then set the environment variable `SNAKEMAKE_LSF_MEMFMT` to `perjob`.

+The executor automatically detects the request unit from the cluster configuration, so if your cluster does not use MB,
+you do not need to do anything.
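
As a minimal sketch of the default per-core conversion described above (the rule body is hypothetical):

```python
rule myrule:
    threads: 4
    resources:
        mem_mb=128  # job total; submitted as -R "rusage[mem=32]" (128 MB / 4 cores) by default
    shell:
        "..."
```

With `SNAKEMAKE_LSF_MEMFMT=perjob`, the same request would instead be submitted as `-R "rusage[mem=128]"`.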

45 changes: 34 additions & 11 deletions docs/further.md
@@ -25,6 +25,16 @@ Usually, it is advisable to persist such settings via a
[configuration profile](https://snakemake.readthedocs.io/en/latest/executing/cli.html#profiles), which
can be provided system-wide, per user, and in addition per workflow.

+This is an example of the relevant profile settings:
+
+```yaml
+jobs: '<max concurrent jobs>'
+executor: lsf
+default-resources:
+- 'lsf_project=<your LSF project>'
+- 'lsf_queue=<your LSF queue>'
+```

## Ordinary SMP jobs

Most jobs will be carried out by programs which are either single core
@@ -47,7 +57,7 @@ to specify them for every rule. Snakemake already has reasonable
defaults built in, which are automatically activated when using any non-local executor
(hence also with lsf). Use `mem_mb_per_cpu` to request the standard LSF-style memory per CPU.

-## MPI jobs {#cluster-slurm-mpi}
+## MPI jobs {#cluster-lsf-mpi}

Snakemake's LSF backend also supports MPI jobs, see
`snakefiles-mpi`{.interpreted-text role="ref"} for details.
@@ -74,18 +84,21 @@ $ snakemake --set-resources calc_pi:mpi="mpiexec" ...

A workflow rule may support a number of
[resource specifications](https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html#resources).
-For a LSF cluster, a mapping between Snakemake and SLURM needs to be performed.
+For an LSF cluster, a mapping between Snakemake and LSF needs to be performed.

You can use the following specifications:

-| LSF | Snakemake | Description |
-|----------------|------------|---------------------------------------|
-| `-q` | `lsf_queue` | the queue a rule/job is to use |
-| `--W` | `walltime` | the walltime per job in minutes |
-| `--constraint` | `constraint` | may hold features on some clusters |
-| `-R "rusage[mem=<memory_amount>]"` | `mem`, `mem_mb` | memory a cluster node must |
-| | | provide (`mem`: string with unit), `mem_mb`: i |
-| `-R "rusage[mem=<memory_amount>]"` | `mem_mb_per_cpu` | memory per reserved CPU |
+| LSF                                | Snakemake        | Description                              |
+|------------------------------------|------------------|------------------------------------------|
+| `-q`                               | `lsf_queue`      | the queue a rule/job is to use           |
+| `-W`                               | `walltime`       | the walltime per job in minutes          |
+| `--constraint`                     | `constraint`     | may hold features on some clusters       |
+| `-R "rusage[mem=<memory_amount>]"` | `mem`, `mem_mb`  | memory a cluster node must provide       |
+|                                    |                  | (`mem`: string with unit, `mem_mb`: int) |
+| `-R "rusage[mem=<memory_amount>]"` | `mem_mb_per_cpu` | memory per reserved CPU                  |
+| omit `-R span[hosts=1]`            | `mpi`            | Allow splitting across nodes for MPI     |
+| `-R span[ptile=<ptile>]`           | `ptile`          | Processors per host; requires `mpi`      |
+| Other `bsub` arguments             | `lsf_extra`      | Other args to pass to `bsub` (str)       |


Each of these can be part of a rule, e.g.:
@@ -123,4 +136,14 @@ rule myrule:
lsf_extra="-R a100 -gpu num=2"
```

Again, rather use a [profile](https://snakemake.readthedocs.io/en/latest/executing/cli.html#profiles) to specify such resources.

+## Clusters that use per-job memory requests instead of per-core

+By default, this plugin converts the specified memory request into the per-core request expected by most LSF clusters.
+So `threads: 4` and `mem_mb=128` will result in `-R rusage[mem=32]`. If the request should be per-job on your cluster
+(i.e. `-R rusage[mem=<mem_mb>]`), then set the environment variable `SNAKEMAKE_LSF_MEMFMT` to `perjob`.

+The executor automatically detects the request unit from the cluster configuration, so if your cluster does not use MB,
+you do not need to do anything.
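
The format switch itself is read from the environment when the executor collects the cluster configuration; a minimal Python sketch of that detection (variable name and default as used by this plugin):

```python
import os

# Defaults to "percpu"; `export SNAKEMAKE_LSF_MEMFMT=perjob` before invoking
# Snakemake switches the executor to per-job rusage requests.
memfmt = os.environ.get("SNAKEMAKE_LSF_MEMFMT", "percpu").lower()
```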

16 changes: 11 additions & 5 deletions snakemake_executor_plugin_lsf/__init__.py
@@ -144,15 +144,17 @@ def run_job(self, job: JobExecutorInterface):
        mem_unit = self.lsf_config.get("LSF_UNIT_FOR_LIMITS", "MB")
        conv_fct = conv_fcts[mem_unit[0]]
        if job.resources.get("mem_mb_per_cpu"):
-            mem_ = job.resources.mem_mb_per_cpu * conv_fct * cpus_per_task
+            mem_ = job.resources.mem_mb_per_cpu * conv_fct
        elif job.resources.get("mem_mb"):
-            mem_ = job.resources.mem_mb * conv_fct
+            mem_ = job.resources.mem_mb * conv_fct / cpus_per_task
        else:
            self.logger.warning(
                "No job memory information ('mem_mb' or 'mem_mb_per_cpu') is given "
                "- submitting without. This might or might not work on your cluster."
            )
-        call += f" -R rusage[mem={mem_}/job]"
+        if self.lsf_config["LSF_MEMFMT"] == "perjob":
+            mem_ *= cpus_per_task
+        call += f" -R rusage[mem={mem_}]"

        # MPI job
        if job.resources.get("mpi", False):
@@ -528,8 +530,8 @@ def get_lsf_config():
+ "/logdir/lsb.events"
)
lsb_params_file = (
f"{lsf_config['LSF_CONFDIR']}/lsbatch/",
f"{lsf_config['LSF_CLUSTER']}/configdir/lsb.params",
f"{lsf_config['LSF_CONFDIR']}/lsbatch/"
f"{lsf_config['LSF_CLUSTER']}/configdir/lsb.params"
)
with open(lsb_params_file, "r") as file:
for line in file:
@@ -539,6 +541,10 @@
lsf_config["DEFAULT_QUEUE"] = value.split("#")[0].strip()
break

lsf_config["LSF_MEMFMT"] = os.environ.get(
"SNAKEMAKE_LSF_MEMFMT", "percpu"
).lower()

    return lsf_config
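
Taken together, the memory handling in this PR can be summarized in a self-contained sketch (function name and defaults are illustrative, not the plugin's API):

```python
def lsf_mem_request(mem_mb=None, mem_mb_per_cpu=None, cpus_per_task=1,
                    conv_fct=1.0, memfmt="percpu"):
    """Build the -R rusage[...] fragment; conv_fct converts MB to the cluster's unit."""
    if mem_mb_per_cpu:
        mem_ = mem_mb_per_cpu * conv_fct  # already a per-core value
    elif mem_mb:
        mem_ = mem_mb * conv_fct / cpus_per_task  # split the job total across the cores
    else:
        return ""  # no memory request given; submit without one
    if memfmt == "perjob":
        mem_ *= cpus_per_task  # this cluster expects the job total, not a per-core value
    return f"-R rusage[mem={mem_}]"

# The README example: threads=4 and mem_mb=128 give 32 per core by default ...
assert lsf_mem_request(mem_mb=128, cpus_per_task=4) == "-R rusage[mem=32.0]"
# ... and the full 128 back when SNAKEMAKE_LSF_MEMFMT=perjob is set.
assert lsf_mem_request(mem_mb=128, cpus_per_task=4, memfmt="perjob") == "-R rusage[mem=128.0]"
```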

