Pip Runtime Dependencies & CRAY AMD Support #314

Draft
wants to merge 44 commits into
base: develop
27ada9e
Using Python environments and pip to manage python dependencies for t…
mdavis36 Nov 8, 2024
7b0c7a4
Passing VIRTUAL_ENV as an argument to PYB11Generator.
mdavis36 Nov 8, 2024
cce4461
Removing commented code; Adding build-requirements.txt; Edit gitignor…
mdavis36 Nov 8, 2024
a6758a3
Removing python packages; Removing netlib-lapack requirement.
mdavis36 Nov 8, 2024
ffad34f
Spheral_python_env takes multiple args for requirements files; Separa…
mdavis36 Nov 8, 2024
7786494
REQUIRE python; Stop pybind11 changing the python interpreter.
mdavis36 Nov 11, 2024
0ea7fde
Environments will utilize system installed packages if available.
mdavis36 Nov 11, 2024
72800a7
Remove hard version constraints; Pip will try to use the latest avail…
mdavis36 Nov 11, 2024
eb30005
Use system pythons on blueos + toss.
mdavis36 Nov 11, 2024
0ab37f3
Cleaning up spheral package.py
mdavis36 Nov 11, 2024
9edfe44
PYB11Generator, fixing broken target dependency.
mdavis36 Nov 11, 2024
d8f21a2
Install pip requirements in spheral-build-env.
mdavis36 Nov 12, 2024
50afffd
Two stage download & install process for pip into SPHERAL_PIP_CACHE_D…
mdavis36 Nov 13, 2024
789987e
blueos pip fixes.
mdavis36 Nov 13, 2024
50289d6
Logical check when SYS_TYPE is not defined in environment.
mdavis36 Nov 13, 2024
4cd4016
$ ENV var
mdavis36 Nov 14, 2024
48893dc
Merge branch 'develop' into feature/pip-runtime-deps
mdavis36 Nov 14, 2024
502cf51
Need quotes around possible env var contents.
mdavis36 Nov 15, 2024
6c5514f
Make build & runtime venv targets to perform first time install of pi…
mdavis36 Nov 18, 2024
411c9c1
Using a network test to possibly skip pip download step on air-gapped…
mdavis36 Nov 18, 2024
500bd61
Locking pip version; Updating Dockerfile for pip changes;
mdavis36 Nov 20, 2024
fd8f8e6
Adding ATS submodule; Full req file paths.
mdavis36 Nov 20, 2024
70f56cd
Assume network connectivity, unless defined by SPHERAL_NETWORK_CONNEC…
mdavis36 Nov 20, 2024
ee54e3a
Use the ATS submodule to dictate the ATS pip build w/o git control
mdavis36 Nov 22, 2024
a0c50e3
Merge branch 'develop' into feature/pip-runtime-deps
mdavis36 Nov 22, 2024
9c1d3bb
Merge branch 'develop' into feature/pip-runtime-deps
mdavis36 Dec 2, 2024
6b19cda
Use stamp files to stop pip from re-running every single build.
mdavis36 Dec 12, 2024
e98186a
Fixing bad merge...
mdavis36 Dec 12, 2024
1bb48e0
Building on cray systems w/ HIP.
mdavis36 Dec 12, 2024
004872e
spack package fixes for required CRAY / ROCM CMake flags.
mdavis36 Dec 16, 2024
a652fe4
Getting HIP device code running on GPU for spheral_cuda_test; revert …
mdavis36 Dec 17, 2024
6b78495
Merge branch 'develop' into feature/pip-runtime-deps
mdavis36 Dec 17, 2024
4775041
Getting CRAY HIP builds working and passing all tests.
mdavis36 Dec 19, 2024
be8b126
clang 18 warning suppression w/ old boost.
mdavis36 Dec 28, 2024
07ecaee
treat raja and umpire as system includes.
mdavis36 Dec 28, 2024
d544ab0
CRAY pre allocated ats runs.
mdavis36 Jan 2, 2025
aaeadf9
Updating gitlab scripts to run cray & hip jobs.
mdavis36 Jan 3, 2025
e76b591
Adding tioga machine to gitlab ci
mdavis36 Jan 3, 2025
ebb979e
adams -> tioga
mdavis36 Jan 3, 2025
099ea7c
Rename Spheral_CUDA_Test to spheral_offload_test.
mdavis36 Jan 13, 2025
4235970
Bugfix for atomic weight in ANEOS
jmikeowen Jan 15, 2025
5d9324c
Remove debug print
jmikeowen Jan 15, 2025
52eded2
Adding tioga to spheral-ats script
mdavis36 Jan 17, 2025
dbdbd39
Squashing clang18 WasErr issues.
mdavis36 Jan 17, 2025
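
Several commit titles above describe the pip workflow only in passing: a two-stage download and install, stamp files so pip does not re-run on every build, and skipping the download step when the network is unavailable. The Python sketch below illustrates that idea under stated assumptions. It is not the PR's CMake implementation: the function names, the `cache_dir` argument, and the use of a content hash for the stamp are all hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path


def pip_commands(python, requirements, cache_dir, network_connected=True):
    """Build the pip command lines for a two-stage install (hypothetical helper).

    Stage 1 downloads packages into cache_dir and is skipped when offline.
    Stage 2 installs from cache_dir only and never contacts the package index.
    """
    commands = []
    if network_connected:
        commands.append([python, "-m", "pip", "download",
                         "-r", str(requirements), "-d", str(cache_dir)])
    commands.append([python, "-m", "pip", "install", "--no-index",
                     "--find-links", str(cache_dir), "-r", str(requirements)])
    return commands


def requirements_digest(requirements):
    """Hash the requirements file so that any edit invalidates the stamp."""
    return hashlib.sha256(Path(requirements).read_bytes()).hexdigest()


def needs_install(requirements, stamp):
    """Return True when the stamp is missing or records a different digest."""
    stamp = Path(stamp)
    return not stamp.exists() or stamp.read_text() != requirements_digest(requirements)


def write_stamp(requirements, stamp):
    """Record a successful install; call only after the pip commands succeed."""
    Path(stamp).write_text(requirements_digest(requirements))
```

A build step would check `needs_install`, run each command from `pip_commands` (for example with `subprocess.run(cmd, check=True)`), and then call `write_stamp`. Passing `network_connected=False` plays roughly the role that the commit titles suggest for configuring with `-DSPHERAL_NETWORK_CONNECTED=Off`.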
2 changes: 1 addition & 1 deletion .gitignore
@@ -89,7 +89,7 @@ src/PBGWraps/SpheralModules_Silo.C
src/PBGWraps/SpheralModules_Utilities.C
src/PBGWraps/SpheralModules_WildMagic.C

build-*
/build-*
src/*/*cc.2.cc
src/*/*cc.3.cc
src/*/*/*cc.2.cc
44 changes: 22 additions & 22 deletions .gitlab/jobs-mpi.yml
@@ -53,37 +53,37 @@ toss_clang_mvapich2_cleanup:



blueos_gcc_spectrum_tpls:
extends: [.blueos_resource1, .gcc_spectrum, .tpls]
cray_rocm_mpich_tpls:
extends: [.cray_resource1, .rocm_mpich, .tpls]

blueos_gcc_spectrum_build:
extends: [.blueos_resource1, .gcc_spectrum, .build_and_test]
needs: [blueos_gcc_spectrum_tpls]
cray_rocm_mpich_build:
extends: [.cray_resource1, .rocm_mpich, .build_and_test]
needs: [cray_rocm_mpich_tpls]

blueos_gcc_spectrum_test:
extends: [.blueos_resource1, .gcc_spectrum, .run_ats]
needs: [blueos_gcc_spectrum_build]
cray_rocm_mpich_test:
extends: [.cray_resource1, .rocm_mpich, .run_ats]
needs: [cray_rocm_mpich_build]

blueos_gcc_spectrum_cleanup:
extends: [.blueos_resource1, .gcc_spectrum, .cleanup_dir]
needs: [blueos_gcc_spectrum_test]
cray_rocm_mpich_cleanup:
extends: [.cray_resource1, .rocm_mpich, .cleanup_dir]
needs: [cray_rocm_mpich_test]



blueos_cuda_11_gcc_spectrum_tpls:
extends: [.blueos_resource2, .cuda_11_gcc_spectrum, .tpls]
cray_hip_rocm_mpich_tpls:
extends: [.cray_resource2, .hip_rocm_mpich, .tpls]

blueos_cuda_11_gcc_spectrum_build:
extends: [.blueos_resource2, .cuda_11_gcc_spectrum, .build_and_test]
needs: [blueos_cuda_11_gcc_spectrum_tpls]
cray_hip_rocm_mpich_build:
extends: [.cray_resource2, .hip_rocm_mpich, .build_and_test]
needs: [cray_hip_rocm_mpich_tpls]

blueos_cuda_11_gcc_spectrum_test:
extends: [.blueos_resource2, .cuda_11_gcc_spectrum, .run_ats]
needs: [blueos_cuda_11_gcc_spectrum_build]
cray_hip_rocm_mpich_test:
extends: [.cray_resource2, .hip_rocm_mpich, .run_ats]
needs: [cray_hip_rocm_mpich_build]

blueos_cuda_11_gcc_spectrum_cleanup:
extends: [.blueos_resource2, .cuda_11_gcc_spectrum, .cleanup_dir]
needs: [blueos_cuda_11_gcc_spectrum_test]
cray_hip_rocm_mpich_cleanup:
extends: [.cray_resource2, .hip_rocm_mpich, .cleanup_dir]
needs: [cray_hip_rocm_mpich_test]



8 changes: 4 additions & 4 deletions .gitlab/jobs-prod.yml
@@ -4,8 +4,8 @@
toss_update_tpls:
extends: [.toss_resource2, .update_tpls, .merge_pr_rule]

blueos_update_tpls:
extends: [.blueos_resource2, .update_tpls, .merge_pr_rule]
cray_update_tpls:
extends: [.cray_resource2, .update_tpls, .merge_pr_rule]
needs: [toss_update_tpls]

# ------------------------------------------------------------------------------
@@ -37,6 +37,6 @@ toss_release_permissions:
cleanup_old_dirs_toss:
extends: [.toss_resource_general, .clean_old_dirs]

cleanup_old_dirs_blueos:
extends: [.blueos_resource_general, .clean_old_dirs]
cleanup_old_dirs_cray:
extends: [.cray_resource_general, .clean_old_dirs]

44 changes: 22 additions & 22 deletions .gitlab/jobs-seq.yml
@@ -17,33 +17,33 @@ toss_gcc_~mpi_cleanup:
needs: [toss_gcc_~mpi_test]


blueos_cuda_11_gcc_~mpi_tpls:
extends: [.blueos_resource2, .cuda_11_gcc_~mpi, .tpls]
cray_hip_rocm_~mpi_tpls:
extends: [.cray_resource2, .hip_rocm_~mpi, .tpls]

blueos_cuda_11_gcc_~mpi_build:
extends: [.blueos_resource2, .cuda_11_gcc_~mpi, .build_and_test]
needs: [blueos_cuda_11_gcc_~mpi_tpls]
cray_hip_rocm_~mpi_build:
extends: [.cray_resource2, .hip_rocm_~mpi, .build_and_test]
needs: [cray_hip_rocm_~mpi_tpls]

blueos_cuda_11_gcc_~mpi_test:
extends: [.blueos_resource2, .cuda_11_gcc_~mpi, .run_ats]
needs: [blueos_cuda_11_gcc_~mpi_build]
cray_hip_rocm_~mpi_test:
extends: [.cray_resource2, .hip_rocm_~mpi, .run_ats]
needs: [cray_hip_rocm_~mpi_build]

blueos_cuda_11_gcc_~mpi_cleanup:
extends: [.blueos_resource2, .cuda_11_gcc_~mpi, .cleanup_dir]
needs: [blueos_cuda_11_gcc_~mpi_test]
cray_hip_rocm_~mpi_cleanup:
extends: [.cray_resource2, .hip_rocm_~mpi, .cleanup_dir]
needs: [cray_hip_rocm_~mpi_test]


blueos_gcc_~mpi_Debug_tpls:
extends: [.blueos_resource1, .gcc_~mpi_Debug, .tpls]
cray_rocm_~mpi_Debug_tpls:
extends: [.cray_resource1, .rocm_~mpi_Debug, .tpls]

blueos_gcc_~mpi_Debug_build:
extends: [.blueos_resource1, .gcc_~mpi_Debug, .build_and_test]
needs: [blueos_gcc_~mpi_Debug_tpls]
cray_rocm_~mpi_Debug_build:
extends: [.cray_resource1, .rocm_~mpi_Debug, .build_and_test]
needs: [cray_rocm_~mpi_Debug_tpls]

blueos_gcc_~mpi_Debug_test:
extends: [.blueos_resource1, .gcc_~mpi_Debug, .run_ats]
needs: [blueos_gcc_~mpi_Debug_build]
cray_rocm_~mpi_Debug_test:
extends: [.cray_resource1, .rocm_~mpi_Debug, .run_ats]
needs: [cray_rocm_~mpi_Debug_build]

blueos_gcc_~mpi_Debug_cleanup:
extends: [.blueos_resource1, .gcc_~mpi_Debug, .cleanup_dir]
needs: [blueos_gcc_~mpi_Debug_test]
cray_rocm_~mpi_Debug_cleanup:
extends: [.cray_resource1, .rocm_~mpi_Debug, .cleanup_dir]
needs: [cray_rocm_~mpi_Debug_test]
22 changes: 22 additions & 0 deletions .gitlab/machines.yml
@@ -1,6 +1,18 @@
# ------------------------------------------------------------------------------
# MACHINE TEMPLATES

.on_tioga:
tags:
- tioga
- flux
variables:
SCHEDULER_ACTION: alloc
SCHEDULER_PARAMETERS: "--exclusive -N 2 -t 120"
NPROC: 112
HOSTNAME: 'tioga'
timeout: 120 minutes
extends: [.on_toss_4_x86_cray]

.on_ruby:
tags:
- ruby
@@ -35,6 +47,9 @@
.blueos_resource_general:
extends: [.on_lassen]

.cray_resource_general:
extends: [.on_tioga]

# ------------------------------------------------------------------------------
#
.toss_resource1:
@@ -53,3 +68,10 @@
#resource_group: blueos2
extends: [.blueos_resource_general]

.cray_resource1:
#resource_group: cray1
extends: [.cray_resource_general]

.cray_resource2:
#resource_group: cray2
extends: [.cray_resource_general]
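
The new cray resource templates carry no variables of their own; everything arrives through the `extends` chain `.cray_resource2` → `.cray_resource_general` → `.on_tioga` → `.on_toss_4_x86_cray` (the last one is added in `.gitlab/os.yml` by this PR). The Python sketch below shows how such a chain could be flattened. It is a simplified model, not GitLab's implementation: it merges only `variables`, `.sys_config` is stubbed as empty because its contents are not part of this diff, and `resolve_variables` is a hypothetical helper.

```python
def resolve_variables(name, templates):
    """Merge `variables` along a template's `extends` chain.

    Parents are merged in list order, so later parents override earlier ones,
    and the template's own variables override everything it inherits.
    """
    template = templates[name]
    merged = {}
    for parent in template.get("extends", []):
        merged.update(resolve_variables(parent, templates))
    merged.update(template.get("variables", {}))
    return merged


# Values copied from the diff. `.sys_config` is defined elsewhere in the
# repository and is stubbed here as an empty template (an assumption).
TEMPLATES = {
    ".sys_config": {},
    ".on_toss_4_x86_cray": {
        "variables": {
            "ARCH": "toss_4_x86_64_ib_cray",
            "ROCMCC_VERSION": "6.2.0",
            "SPHERAL_BUILDS_DIR": "/p/lustre1/sphapp/spheral-ci-builds",
            "RUN_CMD": "flux run",
        },
        "extends": [".sys_config"],
    },
    ".on_tioga": {
        "variables": {
            "SCHEDULER_ACTION": "alloc",
            "SCHEDULER_PARAMETERS": "--exclusive -N 2 -t 120",
            "NPROC": 112,
            "HOSTNAME": "tioga",
        },
        "extends": [".on_toss_4_x86_cray"],
    },
    ".cray_resource_general": {"extends": [".on_tioga"]},
    ".cray_resource2": {"extends": [".cray_resource_general"]},
}

variables = resolve_variables(".cray_resource2", TEMPLATES)
print(variables["RUN_CMD"], variables["NPROC"])
```

With this model, any job that extends `.cray_resource2` sees both the machine-level settings from `.on_tioga` and the OS-level settings from `.on_toss_4_x86_cray`.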
9 changes: 9 additions & 0 deletions .gitlab/os.yml
@@ -13,6 +13,15 @@
GCC_VERSION: '10.3.1'
CLANG_VERSION: '14.0.6'
SPHERAL_BUILDS_DIR: /p/lustre1/sphapp/spheral-ci-builds
RUN_CMD: 'srun'
extends: [.sys_config]

.on_toss_4_x86_cray:
variables:
ARCH: 'toss_4_x86_64_ib_cray'
ROCMCC_VERSION: '6.2.0'
SPHERAL_BUILDS_DIR: /p/lustre1/sphapp/spheral-ci-builds
RUN_CMD: 'flux run'
extends: [.sys_config]

.on_blueos_3_ppc64:
4 changes: 2 additions & 2 deletions .gitlab/scripts.yml
@@ -76,7 +76,7 @@

- ml load mpifileutils
- cd $SPHERAL_BUILDS_DIR
- drm $CI_BUILD_DIR/..
- $RUN_CMD -n 20 drm $CI_BUILD_DIR/..

# ------------------------------------------------------------------------------
# Shared TPL scripts.
@@ -183,7 +183,7 @@
- echo $DIR_LIST

- ml load mpifileutils
- if [[ $DIR_LIST ]]; then drm $DIR_LIST; else echo "No directories to remove at this time."; fi
- if [[ $DIR_LIST ]]; then $RUN_CMD -n 20 drm $DIR_LIST; else echo "No directories to remove at this time."; fi
when: always

.merge_pr_rule:
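
Both cleanup steps in this file now launch `drm` through `$RUN_CMD -n 20`, and `RUN_CMD` is `'srun'` on the TOSS machines but the two-word `'flux run'` on the cray machines. The Python sketch below shows how the same launcher prefix could be assembled outside the shell. The helper name `drm_command` and the directory names are hypothetical; only the `-n 20 drm` shape comes from the diff.

```python
import shlex


def drm_command(run_cmd, targets, ntasks=20):
    """Prefix mpifileutils' drm with the machine's launcher (hypothetical helper).

    run_cmd may hold several words (e.g. 'flux run'), so it is split with
    shlex instead of being passed as a single argv entry.
    """
    return shlex.split(run_cmd) + ["-n", str(ntasks), "drm", *targets]


# Placeholder directory names, for illustration only.
print(shlex.join(drm_command("srun", ["old_build_1", "old_build_2"])))
print(shlex.join(drm_command("flux run", ["old_build_1"])))
```

Splitting the launcher matters once the command is run without a shell, since `'flux run'` as one argv entry would be looked up as a single executable name.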
26 changes: 24 additions & 2 deletions .gitlab/specs.yml
@@ -25,11 +25,15 @@
variables:
SPEC: 'gcc@$GCC_VERSION^spectrum-mpi'



.clang_mvapich2:
variables:
SPEC: 'clang@$CLANG_VERSION^mvapich2'
EXTRA_CMAKE_ARGS: '-DENABLE_WARNINGS_AS_ERRORS=On -DENABLE_DEV_BUILD=On'



.cuda_11_gcc_~mpi:
variables:
SPEC: 'gcc@$GCC_VERSION+cuda~mpi cuda_arch=70'
@@ -39,7 +43,25 @@
SPEC: 'gcc@$GCC_VERSION+cuda cuda_arch=70'
EXTRA_CMAKE_ARGS: '-DENABLE_TIMER=On'

.oneapi_2022_1_mvapich2:


.rocm_mpich:
variables:
SPEC: 'rocmcc@$ROCMCC_VERSION'

.rocm_~mpi:
variables:
SPEC: 'rocmcc@$ROCMCC_VERSION~mpi'

.rocm_~mpi_Debug:
variables:
SPEC: 'oneapi@2022.1^mvapich2'
SPEC: 'rocmcc@$ROCMCC_VERSION~mpi'
EXTRA_CMAKE_ARGS: '-DCMAKE_BUILD_TYPE=Debug -DENABLE_WARNINGS_AS_ERRORS=On'

.hip_rocm_mpich:
variables:
SPEC: 'rocmcc@$ROCMCC_VERSION+rocm amdgpu_target=gfx942 ^hip@$ROCMCC_VERSION'

.hip_rocm_~mpi:
variables:
SPEC: 'rocmcc@$ROCMCC_VERSION~mpi+rocm amdgpu_target=gfx942 ^hip@$ROCMCC_VERSION'
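
The new `SPEC` strings depend on `$ROCMCC_VERSION`, which `.gitlab/os.yml` sets to `'6.2.0'` for the cray machines. The Python sketch below expands the four ROCm specs to show the concrete Spack specs the jobs would request. GitLab CI performs this substitution itself; `string.Template` is used here only because it follows the same `$NAME` convention, and `expand_spec` is a hypothetical helper.

```python
from string import Template

# ROCMCC_VERSION as set for .on_toss_4_x86_cray in .gitlab/os.yml.
CI_VARIABLES = {"ROCMCC_VERSION": "6.2.0"}

# SPEC strings copied from the templates added to .gitlab/specs.yml.
SPECS = {
    ".rocm_mpich": "rocmcc@$ROCMCC_VERSION",
    ".rocm_~mpi": "rocmcc@$ROCMCC_VERSION~mpi",
    ".hip_rocm_mpich": "rocmcc@$ROCMCC_VERSION+rocm amdgpu_target=gfx942 ^hip@$ROCMCC_VERSION",
    ".hip_rocm_~mpi": "rocmcc@$ROCMCC_VERSION~mpi+rocm amdgpu_target=gfx942 ^hip@$ROCMCC_VERSION",
}


def expand_spec(spec, variables):
    """Substitute $NAME references; raises KeyError if a variable is undefined."""
    return Template(spec).substitute(variables)


for name, spec in SPECS.items():
    print(f"{name}: {expand_spec(spec, CI_VARIABLES)}")
```

Because the compiler and the `^hip` dependency share one variable, bumping `ROCMCC_VERSION` in `os.yml` keeps the two versions in step.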
4 changes: 3 additions & 1 deletion .gitmodules
@@ -16,4 +16,6 @@
path = extern/chai
url = https://github.com/llnl/chai
branch = feature/ManagedSharedPtr
ignore = all
[submodule "extern/ATS"]
path = extern/ATS
url = https://github.com/LLNL/ATS
23 changes: 21 additions & 2 deletions Dockerfile
@@ -32,6 +32,8 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update -y
RUN apt-get upgrade -y
RUN apt-get install -y build-essential git gfortran mpich autotools-dev autoconf sqlite pkg-config uuid gettext cmake libncurses-dev libgdbm-dev libffi-dev libssl-dev libexpat-dev libreadline-dev libbz2-dev locales python python3 unzip libtool wget curl tk-dev
RUN apt-get install -y python3-dev python3-venv python3-pip
RUN apt-get install -y iputils-ping

# Setup system locale for pip package encoding/decoding
RUN locale-gen en_US.UTF-8
@@ -40,9 +42,23 @@ RUN locale-gen en_US.UTF-8
WORKDIR /home/spheral/workspace/
COPY scripts scripts
COPY .uberenv_config.json .

RUN python3 scripts/devtools/tpl-manager.py --spec $SPEC --spheral-spack-dir /home

COPY . .

# Configure Spheral with SPEC TPLs.
RUN mv *.cmake $HOST_CONFIG.cmake
RUN python3 scripts/devtools/host-config-build.py --host-config $HOST_CONFIG.cmake

# First time install of Spheral pip dependencies
WORKDIR build_$HOST_CONFIG/build
RUN make python_build_env
RUN make python_runtime_env

# Clean workspace once dependencies are installed
WORKDIR /home/spheral/workspace/

RUN rm -rf /home/spheral/workspace/*
# -----------------------------------------------------------------------------

@@ -58,23 +74,26 @@ ARG HOST_CONFIG=docker-$SPEC
ARG JCXX=8
ARG JPY=1

WORKDIR /home/spheral/workspace/

# Copy Spheral source and generate host config from tpl-manager (all dependencies should already be installed).
COPY . .
RUN python3 scripts/devtools/tpl-manager.py --spec $SPEC --upstream-dir /home/spack/opt/spack/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder__/__spack_path_placeholder_ --spack-url /home/spack

# Configure Spheral with SPEC TPLs.
RUN mv *.cmake $HOST_CONFIG.cmake
RUN python3 scripts/devtools/host-config-build.py --host-config $HOST_CONFIG.cmake
RUN python3 scripts/devtools/host-config-build.py --host-config $HOST_CONFIG.cmake -DSPHERAL_NETWORK_CONNECTED=Off
Review comment (Collaborator, Author): Add documentation on SPHERAL_NETWORK_CONNECTED


# Build Spheral
WORKDIR build_$HOST_CONFIG/build
RUN make python_build_env
RUN make python_runtime_env
RUN make -j $JCXX Spheral_CXX
RUN make -j $JPY
RUN make install

# Run ATS testing suite.
WORKDIR ../install
ENV MPLBACKEND=agg

# ATS currently does not allow us to run in parallel for regular linux machines
# If it did, we would need some of the following commands
1 change: 1 addition & 0 deletions RELEASE_NOTES.md
@@ -57,6 +57,7 @@ Notable changes include:
* Bugfix for RZ solid CRKSPH with compatible energy.
* Parsing of None string now always becomes None python type. Tests have been updated accordingly.
* IO for checkpoints and visuzalization can now be properly turned off through SpheralController input options.
* Bugfix for atomicWeight in ANEOS

Review comment (Collaborator, Author): Add RELEASE_NOTES Documentation

Version v2024.06.1 -- Release date 2024-07-09
==============================================
3 changes: 3 additions & 0 deletions cmake/Compilers.cmake
@@ -15,6 +15,9 @@ set(CXX_WARNING_FLAGS "")
if (ENABLE_WARNINGS)
if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
list(APPEND CXX_WARNING_FLAGS -fdiagnostics-show-option -Wno-unused-command-line-argument -Wno-c++17-extensions)
if(CMAKE_CXX_COMPILER_VERSION GREATER_EQUAL 18.0.0)
list(APPEND CXX_WARNING_FLAGS -Wno-enum-constexpr-conversion -Wno-deprecated-declarations)
endif()
endif()
else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -w")