From fa4342bfebe7632f359d21d4ffe507403d1b52f7 Mon Sep 17 00:00:00 2001 From: gspetro-NOAA Date: Thu, 25 Jul 2024 19:01:01 -0400 Subject: [PATCH] update container chapter --- .../BuildingRunningTesting/Container.rst | 113 ++++++++++++++++-- .../BuildingRunningTesting/TestingLandDA.rst | 6 +- 2 files changed, 105 insertions(+), 14 deletions(-) diff --git a/doc/source/BuildingRunningTesting/Container.rst b/doc/source/BuildingRunningTesting/Container.rst index 160c2d47..a7428073 100644 --- a/doc/source/BuildingRunningTesting/Container.rst +++ b/doc/source/BuildingRunningTesting/Container.rst @@ -4,9 +4,9 @@ Containerized Land DA Workflow ********************************** -These instructions will help users build and run a basic case for the Unified Forecast System (:term:`UFS`) Land Data Assimilation (DA) System using a `Singularity/Apptainer `_ container. The Land DA :term:`container` packages together the Land DA System with its dependencies (e.g., :term:`spack-stack`, :term:`JEDI`) and provides a uniform environment in which to build and run the Land DA System. Normally, the details of building and running Earth systems models will vary based on the computing platform because there are many possible combinations of operating systems, compilers, :term:`MPIs `, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported :ref:`Level 1 ` system (i.e., Hera, Orion). +These instructions will help users build and run a basic case for the Unified Forecast System (:term:`UFS`) Land Data Assimilation (DA) System using a `Singularity/Apptainer `_ container. The Land DA :term:`container` packages together the Land DA System with its dependencies (e.g., :term:`spack-stack`, :term:`JEDI`) and provides a uniform environment in which to build and run the Land DA System. 
Normally, the details of building and running Earth systems models will vary based on the computing platform because there are many possible combinations of operating systems, compilers, :term:`MPIs `, and package versions available. Installation via Singularity/Apptainer container reduces this variability and allows for a smoother experience building and running Land DA. This approach is recommended for users not running Land DA on a supported :ref:`Level 1 ` system (i.e., Hera, Orion). -This chapter provides instructions for building and running basic Land DA cases for the Unified Forecast System (:term:`UFS`) Land DA System. Users can choose between two options: +This chapter provides instructions for building and running basic Land DA cases in a container. Users can choose between two options: * A Jan. 3-4, 2000 00z sample case using :term:`GSWP3` data with the UFS Noah-MP land component * A Dec. 21-22, 2019 00z sample case using :term:`ERA5` data with the UFS Land Driver @@ -22,8 +22,8 @@ Prerequisites The containerized version of Land DA requires: - * `Installation of Apptainer `__ - * At least 7 CPU cores + * `Installation of Apptainer `_ + * At least 6 CPU cores * An **Intel** compiler and :term:`MPI` (available for `free here `_) @@ -219,7 +219,7 @@ Users may convert a container ``.img`` file to a writable sandbox. This step is singularity build --sandbox ubuntu20.04-intel-landda-release-public-v1.2.0 $img -When making a writable sandbox on NOAA RDHPCS systems, the following warnings commonly appear and can be ignored: +When making a writable sandbox on NOAA :term:`RDHPCS`, the following warnings commonly appear and can be ignored: .. code-block:: console @@ -240,7 +240,7 @@ There should now be a ``land-DA_workflow`` directory in the ``$LANDDAROOT`` dire singularity exec -B /:/ $img cp -r /opt/land-DA_workflow . -where ```` and ```` are replaced with a top-level directory on the local system and in the container, respectively. 
Additional directories can be bound by adding another ``-B /:/`` argument before the container location (``$img``). +where ```` and ```` are replaced with a top-level directory on the local system and in the container, respectively. Additional directories can be bound by adding another ``-B /:/`` argument before the container location (``$img``). Note that if previous steps included a ``sudo`` command, ``sudo`` may be required in front of this command. .. attention:: @@ -288,11 +288,13 @@ The remaining Level 1 systems that do not have Intel MPI available will need to | Hercules | module load intel-oneapi-compilers/2022.2.1 intel-oneapi-mpi/2021.7.1 | +-----------------+-------------------------------------------------------------------------+ -For Derecho and Gaea, an additional script is needed to help set up the land-DA workflow scripts so that the container can run there. +For Derecho and Gaea, an additional script is needed to help set up the ``land-DA_workflow`` scripts so that the container can run there. .. code-block:: console - ./setup_container.sh -p= + ./setup_container.sh -p= + +where ```` is ``derecho`` or ``gaea``. .. _ConfigureExptC: @@ -323,7 +325,7 @@ The Land DA System uses a script-based workflow that is launched using the ``do_ .. attention:: - Note that the GSWP3 option will only run as-is on Hera and Orion. Users on other systems may need to make significant changes to configuration files, which is not a supported option for the |latestr| release. It is recommended that users on these systems use the UFS land driver ERA5 sample experiment set in ``settings_DA_cycle_era5``. + Note that the GSWP3 option will only run as-is on Hera and Orion. Users on other systems may need to make significant changes to configuration files, which is not a supported option for the |latestr| release. It is recommended that users on other systems use the UFS land driver ERA5 sample experiment set in ``settings_DA_cycle_era5``. 
First, update the ``$BASELINE`` environment variable in the selected ``settings_DA_*`` file to say ``singularity.internal`` instead of ``hera.internal``: @@ -344,7 +346,49 @@ To start the experiment, run: ./do_submit_cycle.sh settings_DA_cycle_era5 -The ``do_submit_cycle.sh`` script will read the ``settings_DA_cycle_*`` file and the ``release.environment`` file, which contain sensible experiment default values to simplify the process of running the workflow for the first time. Advanced users will wish to modify the parameters in ``do_submit_cycle.sh`` to fit their particular needs. After reading the defaults and other variables from the settings files, ``do_submit_cycle.sh`` creates a working directory (named ``workdir`` by default) and an output directory called ``landda_expts`` in the parent directory of ``land-DA_workflow`` and then submits a job (``submit_cycle.sh``) to the queue that will run through the workflow. If all succeeds, users will see ``log`` and ``err`` files created in ``land-DA_workflow`` along with a ``cycle.log`` file, which will show where the cycle has ended. The ``landda_expts`` directory will also be populated with data in the following directories: +The ``do_submit_cycle.sh`` script will read the ``settings_DA_cycle_*`` file and the ``release.environment`` file, which contain sensible experiment default values to simplify the process of running the workflow for the first time. Advanced users will wish to modify the parameters in ``do_submit_cycle.sh`` to fit their particular needs. After reading the defaults and other variables from the settings files, ``do_submit_cycle.sh`` creates a working directory (named ``workdir`` by default) and an output directory called ``landda_expts`` in the parent directory of ``land-DA_workflow`` and then submits a job (``submit_cycle.sh``) to the queue that will run through the workflow. 
If all succeeds, users will see ``log`` and ``err`` files created in ``land-DA_workflow`` along with a ``cycle.log`` file, which will show where the cycle has ended. + + + +Check Progress +---------------- + +To check on the experiment status, users on a system with a Slurm job scheduler may run: + +.. code-block:: console + + squeue -u $USER + +To view progress, users can open the ``log*`` and ``err*`` files once they have been generated: + +.. code-block:: console + + tail -f log* err* + +Users will need to type ``Ctrl+C`` to exit the files. For examples of what the log and error files should look like in a successful experiment, reference :ref:`ERA5 Experiment Logs ` or :ref:`GSWP3 Experiment Logs ` below. + +.. attention:: + + If the log file contains a NetCDF error (e.g., ``ModuleNotFoundError: No module named 'netCDF4'``), run: + + .. code-block:: console + + python -m pip install netCDF4 + + Then, resubmit the job (``sbatch submit_cycle.sh``). + +Next, check for the background and analysis files in the test directory. + +.. code-block:: console + + ls -l ../landda_expts/DA__test/mem000/restarts/`` + +where: + + * ```` is either ``era5`` or ``gswp3``, and + * ```` is either ``vector`` or ``tile`` depending on whether ERA5 or GSWP3 forcing data was used, respectively. + +The experiment should populate the ``landda_expts`` directory with data in the following locations: .. code-block:: console @@ -354,4 +398,51 @@ The ``do_submit_cycle.sh`` script will read the ``settings_DA_cycle_*`` file and Depending on the experiment, either the ``vector`` or the ``tile`` directory will have data, but not both. -Users can check experiment progress/success according to the instructions in :numref:`Section %s `, which apply to both containerized and non-containerized versions of the Land DA System. + +.. _era5-log-output: + +ERA5 Experiment Logs +===================== + +For the ERA5 experiment, the ``log*`` file for a successful experiment will contain a message like: + +.. 
code-block:: console + + Creating: .//ufs_land_restart.2019-12-22_00-00-00.nc + Searching for forcing at time: 2019-12-22 01:00:00 + +The ``err*`` file for a successful experiment will end with something similar to: + +.. code-block:: console + + + THISDATE=2019122200 + + date_count=1 + + '[' 1 -lt 1 ']' + + '[' 2019122200 -lt 2019122200 ']' + +.. _gswp3-log-output: + +GSWP3 Experiment Logs +======================= + +For the GSWP3 experiment, the ``log*`` file for a successful experiment will end with a list of resource statistics. For example: + +.. code-block:: console + + Number of times filesystem performed OUTPUT = 250544 + Number of Voluntary Context Switches = 3252 + Number of InVoluntary Context Switches = 183 + *****************END OF RESOURCE STATISTICS************************* + +The ``err*`` file for a successful experiment will end with something similar to: + +.. code-block:: console + + + echo 'do_landDA: calling apply snow increment' + + [[ '' =~ hera\.internal ]] + + /apps/intel-2022.1.2/intel-2022.1.2/mpi/2021.5.1/bin/mpiexec -n 6 /path/to/land-DA_workflow/build/bin/apply_incr.exe /path/to/landda_expts/DA_GSWP3_test/DA/logs//apply_incr.log + + [[ 0 != 0 ]] + + '[' YES == YES ']' + + '[' YES == YES ']' + + cp /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile1.nc /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile2.nc /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile3.nc /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile4.nc /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile5.nc /path/to/workdir/mem000/jedi/20000103.000000.xainc.sfc_data.tile6.nc /path/to/landda_expts/DA_GSWP3_test/DA/jedi_incr/ + + [[ YES == \N\O ]] diff --git a/doc/source/BuildingRunningTesting/TestingLandDA.rst b/doc/source/BuildingRunningTesting/TestingLandDA.rst index eda3b143..d1df8fc0 100644 --- a/doc/source/BuildingRunningTesting/TestingLandDA.rst +++ b/doc/source/BuildingRunningTesting/TestingLandDA.rst 
@@ -53,7 +53,7 @@ If the tests are successful, a message will be printed to the console. For examp Tests ******* -The ERA5 CTests test the operability of six major elements of the Land DA System: ``vector2tile``, ``create_ens``, ``letkfoi_snowda``, ``apply_jediincr``, ``tile2vector``, and ``ufs_datm_land``. The tests and their dependencies are listed in the ``land-DA_workflow/test/CMakeLists.txt`` file. Currently, the CTests are only run on Hera and Orion; they cannot yet be run via container. +The CTests test the operability of six major elements of the Land DA System: ``vector2tile``, ``create_ens``, ``letkfoi_snowda``, ``apply_jediincr``, ``tile2vector``, and ``ufs_datm_land``. The tests and their dependencies are listed in the ``land-DA_workflow/test/CMakeLists.txt`` file. Currently, the CTests are only run on Hera and Orion; they cannot yet be run via container. .. list-table:: *Land DA CTests* :widths: 20 50 @@ -62,11 +62,11 @@ The ERA5 CTests test the operability of six major elements of the Land DA System * - Test - Description * - ``test_vector2tile`` - - Tests the vector-to-tile function for use in JEDI + - Tests the vector-to-tile function for use in JEDI. * - ``test_create_ens`` - Tests creation of a pseudo-ensemble for use in LETKF-OI. * - ``test_letkfoi_snowda`` - - Tests the use of LETKF-OI to assimilate snow DA. + - Tests the use of LETKF-OI to assimilate snow data. * - ``test_apply_jediincr`` - Tests the ability to add a JEDI increment. * - ``test_tile2vector``