Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/href MPMD and other fixes #647

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from

Conversation

BinbinZhou-NOAA
Copy link
Contributor

@BinbinZhou-NOAA BinbinZhou-NOAA commented Jan 8, 2025

Note to developers: You must use this PR template!

Description of Changes

  1. The MPMD is applied in both stat and plot generation jobs.
  2. Most of the issues for cam/ens (href) listed in the EVSv2 Fixes and Additions document have been addressed
  3. Some other issues that are not listed in the EVSv2 Fixes and Additions document

Please include a summary of the changes and the related GitHub issue(s). Please also include relevant motivation and context.
The detailed changes are listed as below:
1: Run all jobs in MET/METplus 6.0.0-rc1

  1. Applied MPMD in both stats and plots jobs ( EVSv2 Fixes and Additions No.3)
  2. Changed subprocess.Popen to subprocess.run in EVS v2.0 (general in EVSv2 Fixes and Additions)
    This change also requires ps.communicated()[0] to be changed to ps.stdout to work correctly
  3. Defined $NET prior to $HOMEevs; apply this change across all J-jobs (general in EVSv2 Fixes and Additions)
  4. Removed old concatenating log file in stats job, but keep it in plotting job (general)
  5. Removed LOOP_ORDER, LOG_LINE_FORMAT, etc from conf file (general)
  6. Updated CAPE setting in the conf files (No. 14)
  7. Changed "past" to "last" for the plot file names (in dev, and ecf) (general)
  8. Followed the EVSv2 file naming standards (No. 13)
  9. Separated valid times for snowfall plots job (old one combined all validation times together)
  10. Required actual cpu resources used for number of the jobs. (general)
  11. Reduced the subscribed memory in plotting jobs to match it is actually used. (general)
  12. Fixed unnecessary gather processes in StatAnalysis (No. 6)
  13. Fixed the "empty CAPE data" issue in SPCoutlook job in case of extreme cold weather and extreme small risk spcoutlook areas (listed in EVSv2 Fixes and Additions No. 7)
  14. Updated lead_average.py file to address the empty data set in cape plotting ( No. 2)
  15. Reduced the walltime for the grid2obs stats jobs, from 4hr 20min to 2hr
  16. Replaced mater_metplus.py with run_metplus.py in ush/cam/evs_href_prepare.sh ( No. 8)
  17. Removed all *.txt file in the href's parm directory (No.12)
  18. Deleted file evs_href_spcoutlook_cape.sh from/ush/cam directory ( script is not used)
  19. Added checking input forecast files before running metplus in the precip and grid2obs stat scripts (No. 4)
  20. All scripts have "set -x" (EVSv2 Fixes and Additions No.18)
  21. Changed [email protected] to [email protected] in all jobs ( EVSv2 Fixes and Additions No.11)
  22. Increased memory used in 90days plot jobs for precip, cape, ctc and snowfall (EVSv2 Fixes and Additions No.16)
  23. Added checking file existence before copy netCDF files to href.YYYYMMDD/precip_mean24 (No.17)

Developer Questions and Checklist

  • Is this a high priorty PR? If so, why and is there a date it needs to be merged by? No
  • Do you have any planned upcoming annual leave/PTO? No
  • Are there any changes needed for when the jobs are supposed to run? No
  • [y ] The code changes follow [NCO's EE2 Standards] (https://www.nco.ncep.noaa.gov/idsb/implementation_standards/ImplementationStandards.v11.0.0.pdf).
  • [ y] Developer's name is removed throughout the code and have used ${USER} where necessary throughout the code.
  • [ y] References the feature branch for HOMEevs are removed from the code.
  • [ y] J-Job environment variables, COMIN and COMOUT directories, and output follow what has been defined for EVS.
  • [ y] Jobs over 15 minutes in runtime have restart capability.
  • [ y] If applicable, changes in the dev/drivers/scripts or dev/modulefiles have been made in the corresponding ecf/scripts and ecf/defs/evs-nco.def?
  • [ y] Jobs contain the approriate file checking and don't run METplus for any missing data.
  • [ y] Code is using METplus wrappers structure and not calling MET executables directly.
  • [ y] Log is free of any ERRORs or WARNINGs.

Testing Instructions

Please include testing instructions for the PR assignee. Include all relevant input datasets needed to run the tests.
Since many scripts in bot stats and plots jobs have been updated, bot stats and plots jobs needed to test:\

  1. For the stats, in the dev script directory, run following 3 jobs
    jevs_cam_href_grid2obs_stats.sh
    jevs_cam_href_precip_stats.sh
    jevs_cam_href_spcoutlook_stats.sh

  2. For the plots, run following 31day and 90 day jobs
    Before testing, set COMIN=/lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0

    jevs_cam_href_grid2obs_cape_last31days_plots.sh
    jevs_cam_href_grid2obs_cape_last90days_plots.sh
    jevs_cam_href_grid2obs_ctc_last31days_plots.sh
    jevs_cam_href_grid2obs_ctc_last90days_plots.sh
    jevs_cam_href_grid2obs_ecnt_last31days_plots.sh
    jevs_cam_href_grid2obs_ecnt_last90days_plots.sh
    jevs_cam_href_precip_last31days_plots.sh
    jevs_cam_href_precip_last90days_plots.sh
    jevs_cam_href_precip_spatial_plots.sh
    jevs_cam_href_profile_last31days_plots.sh
    jevs_cam_href_profile_last90days_plots.sh
    jevs_cam_href_snowfall_last31days_plots.sh
    jevs_cam_href_snowfall_last90days_plots.sh
    jevs_cam_href_spcoutlook_last31days_plots.sh
    jevs_cam_href_spcoutlook_last90days_plots.sh

@malloryprow
Copy link
Contributor

Lots of great updates here @BinbinZhou-NOAA! I'll start reviewing and testing on Friday.

@malloryprow
Copy link
Contributor

@BinbinZhou-NOAA I noticed in the HREF METplus config files that there are settings that start with POINT_STAT_NC_PAIRS_FLAG_. This is not a real METplus setting (search METplus Configuration Glossary). Can these be removed?

@malloryprow
Copy link
Contributor

malloryprow commented Jan 10, 2025

One thing I noticed in the plots ex-scripts is the to the job scripts is writing for run_all_poe.sh, it checks for the file existing before copying to restart, which is good. However, restart is in COMOUT and there is no check that SENDCOM = YES.

Example: scripts/plots/cam/exevs_href_spcoutlook_plots.sh
Line 18: restart=$COMOUT/restart/$last_days/href_spcoutlook_plots
Lines 175-178: Checking for images, copy to restart, but no check for SENDCOM = YES

I also see that the code does cd $restart and then renames files, so it is working from COMOUT to rename the files. Can the files be renamed in DATA and then copied to COMOUT?

Example: scripts/plots/cam/exevs_href_spcoutlook_plots.sh
Line 213: cd $restart
Line 227: Renaming files in COMOUT

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 10, 2025 via email

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 10, 2025 via email

@malloryprow
Copy link
Contributor

The suggestions are for the ecf scripts to match the dev driver scripts.

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 10, 2025 via email

@malloryprow
Copy link
Contributor

For some ush scripts:

  • ush/cam/evs_href_grid2obs_product.sh
  • ush/cam/evs_href_grid2obs_profile.sh
  • ush/cam/evs_href_grid2obs_system.sh
  • ush/cam/evs_href_precip.sh
  • ush/cam/evs_href_spcoutlook.sh
  • ush/cam/evs_href_prepare.sh
  • ush/cam/evs_href_snowfall.sh
  1. Change of previous to precious; change it back to previous
    Example: ush/cam/evs_href_grid2obs_product.sh, line 44

  2. Add checks for SENDCOM=YES before copying to COMOUT
    Example: ush/cam/evs_href_grid2obs_product.sh, line 114

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 10, 2025 via email

@malloryprow
Copy link
Contributor

@BinbinZhou-NOAA I finished my initial review. It may be easiest to see and review everything looking at PR conversionsation on a web browser. (#647)

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 10, 2025 via email

@@ -244,16 +254,16 @@ chmod +x run_all_poe.sh
# Run the POE script in parallel or in sequence order to generate png files
#**************************************************************************
if [ $run_mpi = yes ] ; then
mpiexec -np 60 -depth 1 --cpu-bind verbose,depth cfp ${DATA}/run_all_poe.sh
mpiexec -np 60 -depth 1 --cpu-bind verbose,depth cfp ${DATA}/scripts/run_all_poe.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be mpiexec np 60 -ppn 60

@malloryprow
Copy link
Contributor

The new changes are looking good! A few more things (a few were noted previously but not sure if they are still being worked on so sorry if that is the case!)...

  1. In the scripts being created to run in run_all_poe.sh, after echo "#!/bin/ksh" >> [script name], can there be a echo "set -x" >> [script name] be added?
  2. I believe the *.completed files for restart are being written directly to COMOUT. They need to be written to DATA first and then copied to COMOUT (check SENDCOM=YES).
  3. Change of previous to precious; change it back to previous
    Example: ush/cam/evs_href_grid2obs_product.sh, line 44
  4. Add checks for SENDCOM=YES before copying to COMOUT for stats
    Example: ush/cam/evs_href_grid2obs_product.sh, line 115; ush/cam/evs_href_grid2obs_profile.sh, line 478

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 13, 2025 via email

@BinbinZhou-NOAA
Copy link
Contributor Author

BinbinZhou-NOAA commented Jan 15, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cam - ens: Address Bugzilla 1547 - MPMD processes share the same working directory
2 participants