Iss25 #43

kasra-keshavarz · 2024-03-05T23:38:03Z

This Pull Request resolves the following issues: #40, #37 (partially), #36 (partially), #34, #27, #25.

The commits all have comprehensive messages describing the change and, more importantly, the reason.

I do NOT like that GitHub does not provide character limitations for each line here.

This script introduces new features to the tool, including the capability to process climate datasets, such as those consisting of multiple models, submodels (those with specific configuration sets), ensemble members, and multiple scenarios (SSPs). The parent calling script is in charge of the parallelization scheme, if needed. With this script, a few issues related to the current deficiencies of datatool could be resolved simultaneously. Signed-off-by: Kasra Keshavarz <[email protected]>

Multiple parallelization schemes are added, so the package not only submits array jobs based on the given date range and the chunk schemes, but also considers submitting jobs based on various models, ensemble members, and scenarios. These new parallelization schemes mostly apply to climate datasets, but not exclusively. This commit aims to save time for the user and speed up the processing of datasets. This commit resolves issue #25 on the remote GitHub repository. Furthermore, it adds the ESPO dataset to the list of datasets. Moreover, a new option is implemented to show the list of currently available datasets to users. Signed-off-by: Kasra Keshavarz <[email protected]>
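As a rough illustration of the idea of submitting one job per model, scenario, and ensemble combination, the sketch below enumerates every combination and assigns each a 0-based task index. It is not taken from datatool's source: the function name `list_tasks` and the example model, scenario, and ensemble names are assumptions made for illustration only.

```bash
#!/bin/bash
# Hypothetical inputs; real names depend on the dataset being processed.
models="modelA modelB"
scenarios="ssp245 ssp585"
ensembles="r1i1p1f1"

# Print one line per (model, scenario, ensemble) combination, prefixed by a
# 0-based task index that an array job could use to pick its own work.
# The list arguments are space-separated and are split on purpose.
list_tasks() {
  task=0
  for model in $1; do
    for scenario in $2; do
      for ensemble in $3; do
        printf '%d %s %s %s\n' "$task" "$model" "$scenario" "$ensemble"
        task=$((task + 1))
      done
    done
  done
}

list_tasks "$models" "$scenarios" "$ensembles"
```

With the inputs above, four combinations are listed, so an array job with four tasks would cover them; date-range chunks could be added as a further nested loop in the same way.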

This is meant to clearly organize the information provided inside the package. The new file lists all the available datasets and the keywords that users can provide to the `--dataset` option. Previously, this information was part of the main usage message (`--help`) of the main script. Signed-off-by: Kasra Keshavarz <[email protected]>

1. the "function" keywords are added to make the style compatible with Google's recommendations,
2. required arguments and options are revised alongside the relevant comments,
3. typos are fixed.

Signed-off-by: Kasra Keshavarz <[email protected]>

The script deals with the Climate Dataset produced by the Alberta Government. The dataset is not public yet, and is planned to be available soon. Signed-off-by: Kasra Keshavarz <[email protected]>

Since some hydrological models can use near-surface level or 40 m level data, the necessary lists of variables for both levels are added. Furthermore, a link to the official website for the dataset is added for further clarity. Signed-off-by: Kasra Keshavarz <[email protected]>

Since multiple HPCs are now used for the workflows, it is important to have consistent datasets synchronized regularly. Therefore, this commit attempts to reflect these efforts by creating consistent paths for various HPCs/allocations. Signed-off-by: Kasra Keshavarz <[email protected]>

In this commit, the following are addressed:

* Correcting paths for the local scripts,
* Renaming scripts to reflect the owner of the script for further clarification,
* Adding parallelization schemes based on model, ensemble, and scenario,
* Adding gcc/9.3.0 as the reference clib for the modules loaded to prevent mismatch between various environments defined on the HPCs,
* Ensuring EPSG:4326 is assumed for the input shapefile if there is no CRS defined,
* Getting rid of \t characters in the help messages,
* Correcting the short help message to be more informative,
* Adding function declarations to follow Google's shell scripting guidelines,
* Ensuring --account=STR is described in the help message.

Signed-off-by: Kasra Keshavarz <[email protected]>

Various files within this directory are categorized to be more informative for the users/devs. Signed-off-by: Kasra Keshavarz <[email protected]>

The README file for this dataset is added, offering necessary information for the users. Signed-off-by: Kasra Keshavarz <[email protected]>

This commit ensures all dataset scripts follow the convention of `<institute>-<dataset-name>` under the `scripts` path. Furthermore, necessary adjustments to the style of the scripts have been implemented, including:

* adding `--model`, `--scenario`, and `--ensemble` options, if missing, for compatibility with the main caller script, as these options are passed to the script by the `extract-dataset.sh` script,
* ensuring the scripting style follows Google's shell scripting guidelines,
* properly adjusting the paths to the externally called scripts, after modifications to the structure of datatool's `assets` directory, and
* minor changes to the source code to ensure compatibility with v0.5.0 of datatool.

Signed-off-by: Kasra Keshavarz <[email protected]>
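To make the role of the `--model`, `--scenario`, and `--ensemble` options more concrete, here is a minimal sketch of how a dataset script might accept them. It assumes the `--option=VALUE` form and a hypothetical function name `parse_args`; the actual scripts may parse their arguments differently, and the example values are invented.

```bash
#!/bin/bash
# Hypothetical parser: store the three options in plain variables and
# silently skip anything else, so a caller can pass extra options safely.
parse_args() {
  model=""
  scenario=""
  ensemble=""
  for arg in "$@"; do
    case "$arg" in
      --model=*)    model="${arg#--model=}" ;;
      --scenario=*) scenario="${arg#--scenario=}" ;;
      --ensemble=*) ensemble="${arg#--ensemble=}" ;;
      *) ;;  # other options are ignored in this sketch
    esac
  done
}

# Example call with invented values; a missing option stays empty.
parse_args --model=modelA --ensemble=r1i1p1f1
printf 'model=%s scenario=%s ensemble=%s\n' "$model" "$scenario" "$ensemble"
```

Accepting all three options even when a dataset has no models or scenarios keeps the calling convention uniform, which is the compatibility concern the commit message describes.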

This commit addresses issue #27 by describing NASA's NEX-GDDP-CMIP6 dataset and the relevant scripts for it. Furthermore, it provides the necessary information to enable users to use `datatool` for extracting subsets of the dataset for any temporal and spatial extents. Signed-off-by: Kasra Keshavarz <[email protected]>

This commit addresses issue #27 and provides scripts to extract subsets from NASA's NEX-GDDP-CMIP6 dataset. This script is capable of working with the various models, scenarios, ensemble members, and variables offered by this dataset. Signed-off-by: Kasra Keshavarz <[email protected]>

This commit addresses issue #34 and processes this dataset that contains multiple GCM model outputs, including various sub-models, scenarios, ensemble members, and variables. Signed-off-by: Kasra Keshavarz <[email protected]>

Necessary information to use `datatool` for this script is provided to the user via the README.md file. Signed-off-by: Kasra Keshavarz <[email protected]>

With the growing number of scripts, this commit tries to restructure this directory to provide more clarity and organization for the users. Signed-off-by: Kasra Keshavarz <[email protected]>

The help message has been trimmed to provide more information to the users. This includes the requirement that values provided to `--lon-lims` must be within the [-180, +180] limits. This had not been mentioned to users before and could have caused confusion, as there are multiple conventions for describing longitudes. Furthermore, the list of datasets on the main page of the repository has been updated to reflect the most up-to-date list. Signed-off-by: Kasra Keshavarz <[email protected]>
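Since longitudes are sometimes given in a [0, 360] convention, a user may need to convert values before passing them to `--lon-lims`. The sketch below shows one way to do that conversion and to check the [-180, +180] limits. Both helper names (`to_signed_lon`, `is_valid_lon`) are hypothetical and are not part of datatool.

```bash
#!/bin/bash
# Hypothetical helper: map a longitude given in the [0, 360] convention
# onto the [-180, +180] convention. Values already at or below 180 are
# returned unchanged.
to_signed_lon() {
  awk -v lon="$1" 'BEGIN {
    lon = lon + 0                 # force numeric interpretation
    if (lon > 180) lon -= 360
    printf "%g\n", lon
  }'
}

# Hypothetical check: succeed (exit status 0) only if the value lies
# within [-180, +180].
is_valid_lon() {
  awk -v lon="$1" 'BEGIN {
    lon = lon + 0
    exit !(lon >= -180 && lon <= 180)
  }'
}

to_signed_lon 250   # prints -110
```

For instance, a domain spanning 242 to 250 degrees east corresponds to -118 and -110 in the signed convention.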

Merging

kasra-keshavarz added 30 commits February 23, 2024 10:36

Fixing short usage and comments

d59dd2c

Adding GDDP-NEX-CMIP6 info

4f90faa

Fixing DOI value for ab-gov dataset

1c55810

Adding NASA GDDP-NEX-CMIP6 script address

ed03d80

ESPO-G6-R2 data processing example

58e262a

Multiple minor modifications

a9f3c70

1. the "function" keywords are added to make the style compatible with Google's recommendations,
2. required arguments and options are revised alongside the relevant comments,
3. typos are fixed.

Signed-off-by: Kasra Keshavarz <[email protected]>

AB Government Climate Dataset Script

0d35a69

The script deals with the Climate Dataset produced by the Alberta Government. The dataset is not public yet, and is planned to be available soon. Signed-off-by: Kasra Keshavarz <[email protected]>

Bumping version to v0.5.0

74ce455

Assuring compatibility of the style with Google's shell scripting guidelines

18f9652

Organizing the assets directory

c806428

Various files within this directory are categorized to be more informative for the users/devs. Signed-off-by: Kasra Keshavarz <[email protected]>

README file for ab-gov dataset

e03937f

The README file for this dataset is added, offering necessary information for the users. Signed-off-by: Kasra Keshavarz <[email protected]>

Tracking LICENSE of eccc-rdrs

7f0d71e

Tracking eccc-rdrs script

59bc434

Tracking GWF-NCAR CONUS-I script

d7a9a78

Adding Ouranos ESPO-G6-R2 Dataset Script

e983994

This commit addresses issue #34 and processes this dataset that contains multiple GCM model outputs, including various sub-models, scenarios, ensemble members, and variables. Signed-off-by: Kasra Keshavarz <[email protected]>

Documenting Ouranos ESPO-G6-R2 Dataset script

fd6d96d

Necessary information to use `datatool` for this script is provided to the user via the README.md file. Signed-off-by: Kasra Keshavarz <[email protected]>

Updating changelog for v0.5.0

a4c22fc

Adding a section for WIP directories

0946306

Restructuring script directory

fadeae9

With the growing number of scripts, this commit tries to restructure this directory to provide more clarity and organization for the users. Signed-off-by: Kasra Keshavarz <[email protected]>

Upgrading style of warning message

877b24a

kasra-keshavarz added 4 commits March 5, 2024 18:25

Upgrading style of warning message

69a80a3

Updating link addresses for CONUS I & II

b61717f

Updating link address to ERA5 dataset

618cc7d

Removing dead link for the Ouranos MRCC5 dataset for now

0d254fc

kasra-keshavarz added the bug, documentation, enhancement, added dataset, and new release labels Mar 5, 2024

kasra-keshavarz self-assigned this Mar 5, 2024

Merge branch 'main' into iss25

1a9b535

Merging

kasra-keshavarz merged commit 66140ec into main Mar 6, 2024
