Commit

Fix srun invocation on Great Lakes. (#779)
* Use mpirun on Great Lakes.

* Update test reference data.

* Update reference data again.

* Use srun, but fix environment export and buffered output.

* Update template reference data.

* Temporary fix to Great Lakes scheduling.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test reference data.

* fix: Raise error on non-homogeneous MPI requests

* refactor: Move necessary changes to template.

Due to planned refactoring of directives this makes more sense.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Brandon Butler <[email protected]>
3 people authored Nov 8, 2023
1 parent 9ba71de commit 8477337
Showing 3 changed files with 10 additions and 4 deletions.
6 changes: 5 additions & 1 deletion flow/environments/umich.py
@@ -17,7 +17,11 @@ class GreatLakesEnvironment(DefaultSlurmEnvironment):
     _gpus_per_node = {"default": 2}
     _shared_partitions = {"standard", "gpu"}
 
-    mpi_cmd = "srun"
+    # For unknown reasons, srun fails to export environment variables such as
+    # PATH on Great Lakes unless explicitly requested to with --export=ALL.
+    # On Great Lakes, srun also fails to flush the buffer until the end of
+    # the job without explicitly setting -u.
+    mpi_cmd = "srun -u --export=ALL"
 
     @classmethod
     def add_args(cls, parser):
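
To illustrate what the new `mpi_cmd` value means for a launched operation, here is a minimal sketch of how a launcher prefix and a rank count could be combined into one command line. The helper `build_launch_command` and the example operation command are hypothetical and are not part of signac-flow; only the `srun` options `-u`, `--export=ALL`, and `--ntasks` are real Slurm flags.

```python
import shlex

# Value assigned to mpi_cmd in this commit.
MPI_CMD = "srun -u --export=ALL"


def build_launch_command(mpi_cmd, nranks, command):
    """Hypothetical helper (not signac-flow API): prefix a command with
    the MPI launcher, its extra flags, and the number of ranks."""
    if nranks < 1:
        raise ValueError("nranks must be at least 1")
    return [*shlex.split(mpi_cmd), f"--ntasks={nranks}", *shlex.split(command)]


# Example operation command; the script name and arguments are placeholders.
argv = build_launch_command(MPI_CMD, 4, "python project.py exec my_op")
print(shlex.join(argv))
```

Splitting the prefix with `shlex.split` keeps `-u` and `--export=ALL` as separate arguments, so the extra flags survive whether the command is run through a shell or passed as an argument list.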
8 changes: 5 additions & 3 deletions flow/templates/umich-greatlakes.sh
@@ -1,16 +1,18 @@
 {% extends "slurm.sh" %}
 {% set partition = partition|default('standard', true) %}
+{% set nranks = (operations|calc_tasks("nranks", parallel, force), 1) | max %}
 {% block tasks %}
 {% if resources.ngpu_tasks and 'gpu' not in partition and not force %}
 {% raise "Requesting GPUs requires a gpu partition!" %}
 {% endif %}
 {% if 'gpu' in partition and resources.ngpu_tasks == 0 and not force %}
 {% raise "Requesting gpu partition without GPUs!" %}
 {% endif %}
-#SBATCH --nodes={{ resources.num_nodes }}
-#SBATCH --ntasks={{ resources.ncpu_tasks }}
+#SBATCH --nodes={{ resources.num_nodes }}-{{ resources.num_nodes }}
+#SBATCH --ntasks={{ nranks }}
+#SBATCH --cpus-per-task={{ resources.ncpu_tasks // nranks}}
 {% if partition == 'gpu' %}
-#SBATCH --gpus={{ resources.ngpu_tasks }}
+#SBATCH --gpus-per-task={{ resources.ngpu_tasks // nranks }}
 {% endif %}
 {% endblock tasks %}
 {% block header %}
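
The per-task arithmetic in the template can be sketched in Python as follows. This is an illustration only: `slurm_task_layout` is a hypothetical function, not signac-flow code. The divisibility check is an assumption added here because floor division silently drops any remainder; the hunk above does not show such a check in the template.

```python
def slurm_task_layout(ncpu_tasks, ngpu_tasks, nranks_requested):
    """Hypothetical mirror of the template's resource arithmetic."""
    # The template clamps the rank count to at least one task.
    nranks = max(nranks_requested, 1)
    # Assumed guard (not in the hunk above): refuse uneven splits instead of
    # letting floor division under-request CPUs or GPUs.
    if ncpu_tasks % nranks or ngpu_tasks % nranks:
        raise ValueError("CPUs and GPUs must divide evenly among ranks")
    return {
        "ntasks": nranks,
        "cpus_per_task": ncpu_tasks // nranks,
        "gpus_per_task": ngpu_tasks // nranks,
    }


# Two ranks sharing 8 CPUs and 2 GPUs.
print(slurm_task_layout(8, 2, 2))
# No MPI ranks requested: a single task receives all 4 CPUs.
print(slurm_task_layout(4, 0, 0))
```

Without the guard, a request such as 5 CPUs across 2 ranks would yield `--cpus-per-task=2` and allocate only 4 of the 5 requested CPUs.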
Binary file modified tests/template_reference_data.tar.gz
Binary file not shown.
