Iss25 #43

Merged
merged 35 commits into from
Mar 6, 2024
8559ff7
Ouranos ESPO-G6-R2 script + new capablities
kasra-keshavarz Feb 23, 2024
d59dd2c
Fixing short usage and comments
kasra-keshavarz Feb 23, 2024
5a07e84
Adding new parallelization schemes
kasra-keshavarz Feb 23, 2024
22f9529
Separating dataset information from the main extract-dataset script
kasra-keshavarz Feb 23, 2024
4f90faa
Adding GDDP-NEX-CMIP6 info
kasra-keshavarz Feb 23, 2024
1c55810
Fixing DOI value for ab-gov dataset
kasra-keshavarz Feb 23, 2024
ed03d80
Adding NASA GDDP-NEX-CMIP6 script address
kasra-keshavarz Feb 23, 2024
58e262a
ESPO-G6-R2 data processing example
kasra-keshavarz Feb 23, 2024
a9f3c70
Multiple minor modifications
kasra-keshavarz Feb 29, 2024
0d35a69
AB Government Climate Dataset Script
kasra-keshavarz Feb 29, 2024
defdde2
Adding variable list for various elevation levels
kasra-keshavarz Mar 4, 2024
b3f9728
Path to the dataset for rpp-kshook allocation is updated
kasra-keshavarz Mar 4, 2024
74ce455
Bumping version to v0.5.0
kasra-keshavarz Mar 4, 2024
a1fafaf
Addressing issues #39, #37, #36, #35, #34, and #25
kasra-keshavarz Mar 4, 2024
18f9652
Assuring compatibility of the style with Google's shell scripting gui…
kasra-keshavarz Mar 4, 2024
c806428
Organizing the assets directory
kasra-keshavarz Mar 4, 2024
e03937f
README file for ab-gov dataset
kasra-keshavarz Mar 4, 2024
8a899b0
Minor structural changes
kasra-keshavarz Mar 4, 2024
7f0d71e
Tracking LICENSE of eccc-rdrs
kasra-keshavarz Mar 4, 2024
59bc434
Tracking eccc-rdrs script
kasra-keshavarz Mar 4, 2024
d7a9a78
Tracking GWF-NCAR CONUS-I script
kasra-keshavarz Mar 4, 2024
2e9815f
Documentation for NASA's NEX-GDDP-CMIP6 dataset
kasra-keshavarz Mar 5, 2024
79de983
Script for NASA's NEX-GDDP-CMIP6 dataset
kasra-keshavarz Mar 5, 2024
e983994
Adding Ouranos ESPO-G6-R2 Dataset Script
kasra-keshavarz Mar 5, 2024
fd6d96d
Documenting Ouranos ESPO-G6-R2 Dataset script
kasra-keshavarz Mar 5, 2024
a4c22fc
Updating changelog for v0.5.0
kasra-keshavarz Mar 5, 2024
0946306
Adding a section for WIP directories
kasra-keshavarz Mar 5, 2024
fadeae9
Restructuring script directory
kasra-keshavarz Mar 5, 2024
5ce933f
Updates to the documentations
kasra-keshavarz Mar 5, 2024
877b24a
Upgrading style of warning message
kasra-keshavarz Mar 5, 2024
69a80a3
Upgrading style of warning message
kasra-keshavarz Mar 5, 2024
b61717f
Updating link addresses for CONUS I & II
kasra-keshavarz Mar 5, 2024
618cc7d
Updating link address to ERA5 dataset
kasra-keshavarz Mar 5, 2024
0d254fc
Removing dead link for the Ouranos MRCC5 dataset for now
kasra-keshavarz Mar 5, 2024
1a9b535
Merge branch 'main' into iss25
kasra-keshavarz Mar 5, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -3,3 +3,6 @@
.ipynb_checkpoints
.DS_Store
*.swp

# WIP folders
scripts/ouranos-crcm5-cmip6/
15 changes: 15 additions & 0 deletions CHANGELOG
@@ -1,5 +1,20 @@
Changelog
=========
[v0.5.0] March 5th, 2024
# Improvements
* `DATASETS` file now describes all the datasets available in the script
* new parallelization schemes are introduced using models, scenarios,
and ensemble members
* the `assets` directory is now more organized separating common NCL
and bash scripts needed
* style of the script is updated (not completely) to be more compatible
with Google's shell scripting style guidelines
* Documentation has been updated
# Datasets
* Ouranos ESPO-G6-R2 CMIP6 script added (~9TBs)
* NASA NEX-GDDP-CMIP6 script added (~37TBs)
* Alberta Government's CMIP6 script added (~0.1TBs)

[v0.4.1] - September 21st, 2023
# Fixed
* minor bug fixes
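
The changelog entry on parallelization over models, scenarios, and ensemble members can be pictured as splitting one request into independent work units. The sketch below is only an illustration of that idea and is not taken from `extract-dataset.sh`; the function name `list_work_units` and the example values are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch: enumerate one work unit per model/scenario/ensemble
# combination. This is NOT the tool's implementation.

# list_work_units MODELS SCENARIOS ENSEMBLES
# Each argument is a comma-separated list; prints one
# "model/scenario/ensemble" line per combination.
list_work_units() {
  local -a models scenarios ensembles
  IFS=',' read -ra models <<< "$1"
  IFS=',' read -ra scenarios <<< "$2"
  IFS=',' read -ra ensembles <<< "$3"

  local model scenario ensemble
  for model in "${models[@]}"; do
    for scenario in "${scenarios[@]}"; do
      for ensemble in "${ensembles[@]}"; do
        printf '%s/%s/%s\n' "$model" "$scenario" "$ensemble"
      done
    done
  done
}

# Placeholder values, not a claim about any dataset's contents
list_work_units "modelA,modelB" "ssp245,ssp585" "r1i1p1f1"
```

Each printed line could then be handed to a separate job or array task; how the script actually distributes the work is not shown in this diff.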
80 changes: 80 additions & 0 deletions DATASETS
@@ -0,0 +1,80 @@
|--------------------------------|-----------|---------------------------|
| DATASET NAME | keyword | DOI |
|--------------------------------|-----------|---------------------------|
|1. NCAR-GWF WRF CONUS I | conus_i | 10.1007/s00382-016-3327-9 |
|2. NCAR-GWF WRF CONUS II | conus_ii | 10.5065/49SN-8E08 |
|3. ECMWF ERA5 | era5 | 10.24381/cds.adbb2d47 |
|4. ECCC RDRSv2.1 | rdrs | 10.5194/hess-25-4917-2021 |
|5. CCRN CanRCM4-WFDEI-GEM-CaPA | canrcm4_g | 10.5194/essd-12-629-2020 |
|6. WFDEI-GEM-CaPA | wfdei_g | 10.20383/101.0111 |
|7. ORNL Daymet | daymet | 10.3334/ORNLDAAC/2129 |
|8. Alberta Government | ab-gov | 10.5194/hess-23-5151-2019 |
| 8.1. BCC-CSM2-MR | | ditto |
| 8.2. CNRM-CM6-1 | | ditto |
| 8.3. EC-Earth3-Veg | | ditto |
| 8.4. GFDL-CM4 | | ditto |
| 8.5. GFDL-ESM4 | | ditto |
| 8.6. IPSL-CM6A-LR | | ditto |
| 8.7. MRI-ESM2-0 | | ditto |
| 8.8. Hybrid-observation | | ditto |
|9. Ouranos ESPO-G6-R2 |espo-r6-r2 |10.1038/s41597-023-02855-z |
| 9.1. AS-RCEC | | ditto |
| 9.2. BCC | | ditto |
| 9.3. CAS | | ditto |
| 9.4. CCCma | | ditto |
| 9.5. CMCC | | ditto |
| 9.6. CNRM-CERFACS | | ditto |
| 9.7. CSIRO | | ditto |
| 9.8. CSIRO-ARCCSS | | ditto |
| 9.9. DKRZ | | ditto |
| 9.10. EC-Earth-Con | | ditto |
| 9.11. INM | | ditto |
| 9.12. IPS | | ditto |
| 9.13. MIROC | | ditto |
| 9.14. MOHC | | ditto |
| 9.15. MPI-M | | ditto |
| 9.16. MRI | | ditto |
| 9.17. NCC | | ditto |
| 9.18. NIMS-KMA | | ditto |
| 9.19. NOAA-GFDL | | ditto |
| 9.20. NUIST | | ditto |
|10. Ouranos MRCC5-CMIP6 |crcm5-cmip6| TBD |
| 10.1. CanESM5 | | TBD |
| 10.2. MPI-ESM1-2-LR | | TBD |
|11. NASA GDDP-NEX-CMIP6 | gddp-nex |10.1038/s41597-022-01393-4 |
| 11.0. ACCESS-CM2 | | ditto |
| 11.1. ACCESS-ESM1-5 | | ditto |
| 11.2. BCC-CSM2-MR | | ditto |
| 11.3. CanESM5 | | ditto |
| 11.4. CESM2 | | ditto |
| 11.5. CESM2-WACCM | | ditto |
| 11.6. CMCC-CM2-SR5 | | ditto |
| 11.7. CMCC-ESM2 | | ditto |
| 11.8. CNRM-CM6-1 | | ditto |
| 11.9. CNRM-ESM2-1 | | ditto |
| 11.10. EC-Earth3 | | ditto |
| 11.11. EC-Earth3-Veg-LR | | ditto |
| 11.12. FGOALS-g3 | | ditto |
| 11.13. GFDL-CM4 | | ditto |
| 11.14. GFDL-CM4_gr2 | | ditto |
| 11.15. GFDL-ESM4 | | ditto |
| 11.16. GISS-E2-1-G | | ditto |
| 11.17. HadGEM3-GC31-LL | | ditto |
| 11.18. HadGEM3-GC31-MM | | ditto |
| 11.19. IITM-ESM | | ditto |
| 11.20. INM-CM4-8 | | ditto |
| 11.21. INM-CM5-0 | | ditto |
| 11.22. IPSL-CM6A-LR | | ditto |
| 11.23. KACE-1-0-G | | ditto |
| 11.24. KIOST-ESM | | ditto |
| 11.25. MIROC6 | | ditto |
| 11.26. MIROC-ES2L | | ditto |
| 11.27. MPI-ESM1-2-HR | | ditto |
| 11.28. MPI-ESM1-2-LR | | ditto |
| 11.29. MRI-ESM2-0 | | ditto |
| 11.30. NESM3 | | ditto |
| 11.31. NorESM2-LM | | ditto |
| 11.32. NorESM2-MM | | ditto |
| 11.33. TaiESM1 | | ditto |
| 11.34. UKESM1-0-LL | | ditto |
|--------------------------------|-----------|---------------------------|
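
Because `DATASETS` is a plain pipe-delimited table, the keyword column can be pulled out with standard tools. The following is a minimal sketch, assuming the three-column layout shown above; `list_keywords` is a hypothetical helper written for this note and is independent of the tool's own `--list-datasets` option.

```bash
#!/bin/bash
# list_keywords FILE
# Print "keyword<TAB>dataset name" for every top-level row of a
# pipe-delimited table laid out like DATASETS (hypothetical helper).
list_keywords() {
  awk -F'|' '
    NF >= 4 {
      name = $2; key = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      # Skip the header row, separator rows, and sub-entries without a keyword
      if (key == "" || key == "keyword" || key ~ /^-+$/) next
      # Drop the leading "N. " numbering from the dataset name
      sub(/^[0-9]+\.[ \t]*/, "", name)
      printf "%s\t%s\n", key, name
    }
  ' "$1"
}

# Demonstration on a few rows copied from the table above
sample="$(mktemp)"
cat > "$sample" <<'EOF'
|--------------------------------|-----------|---------------------------|
| DATASET NAME | keyword | DOI |
|--------------------------------|-----------|---------------------------|
|3. ECMWF ERA5 | era5 | 10.24381/cds.adbb2d47 |
|4. ECCC RDRSv2.1 | rdrs | 10.5194/hess-25-4917-2021 |
| 8.1. BCC-CSM2-MR | | ditto |
EOF
list_keywords "$sample"
rm -f "$sample"
```

Sub-entries such as the individual models are skipped because their keyword column is empty.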
118 changes: 65 additions & 53 deletions README.md
@@ -4,73 +4,79 @@ This repository contains scripts to process meteorological datasets in NetCDF fi
```console
Usage:
extract-dataset [options...]

Script options:
-d, --dataset Meteorological forcing dataset of interest
-i, --dataset-dir=DIR The source path of the dataset file(s)
-v, --variable=var1[,var2[...]] Variables to process
-o, --output-dir=DIR Writes processed files to DIR
-s, --start-date=DATE The start date of the data
-e, --end-date=DATE The end date of the data
-l, --lat-lims=REAL,REAL Latitude's upper and lower bounds
-n, --lon-lims=REAL,REAL Longitude's upper and lower bounds
-a, --shape-file=PATH Path to the ESRI shapefile; optional
-m, --ensemble=ens1,[ens2[...]] Ensemble members to process; optional
Leave empty to extract all ensemble members
-j, --submit-job Submit the data extraction process as a job
on the SLURM system; optional
-k, --no-chunk No parallelization, recommended for small domains
-p, --prefix=STR Prefix prepended to the output files
-b, --parsable Parsable SLURM message mainly used
for chained job submissions
-c, --cache=DIR Path of the cache directory; optional
-E, [email protected] E-mail user when job starts, ends, or
fails; optional
-u, --account Digital Research Alliance of Canada's sponsor's
account name; optional, defaults to 'rpp-kshook'
-V, --version Show version
-h, --help Show this screen and exit
-d, --dataset Meteorological forcing dataset of interest
-i, --dataset-dir=DIR The source path of the dataset file(s)
-v, --variable=var1[,var2[...]] Variables to process
-o, --output-dir=DIR Writes processed files to DIR
-s, --start-date=DATE The start date of the data
-e, --end-date=DATE The end date of the data
-l, --lat-lims=REAL,REAL Latitude's upper and lower bounds;
optional; within the [-90, +90] limits
-n, --lon-lims=REAL,REAL Longitude's upper and lower bounds;
optional; within the [-180, +180] limits
-a, --shape-file=PATH Path to the ESRI shapefile; optional
-m, --ensemble=ens1,[ens2,[...]] Ensemble members to process; optional
Leave empty to extract all ensemble members
-M, --model=model1,[model2,[...]] Models that are part of a dataset,
only applicable to climate datasets, optional
-S, --scenario=scn1,[scn2,[...]] Climate scenarios to process, only applicable
to climate datasets, optional
-j, --submit-job Submit the data extraction process as a job
on the SLURM system; optional
-k, --no-chunk No parallelization, recommended for small domains
-p, --prefix=STR Prefix prepended to the output files
-b, --parsable Parsable SLURM message mainly used
for chained job submissions
-c, --cache=DIR Path of the cache directory; optional
-E, [email protected] E-mail user when job starts, ends, or
fails; optional
-u, --account Digital Research Alliance of Canada's sponsor's
account name; optional, defaults to 'rpp-kshook'
-L, --list-datasets List all the available datasets and the
corresponding keywords for '--dataset' option
-V, --version Show version
-h, --help Show this screen and exit

```
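
The updated `--lat-lims` and `--lon-lims` descriptions state the accepted ranges, so a caller may want to check values before submitting a job. The sketch below is one possible pre-check, not the validation performed by `extract-dataset.sh`; the function name `check_lims` and its argument order are assumptions made for illustration.

```bash
#!/bin/bash
# check_lims "A,B" MIN MAX
# Returns 0 when both comma-separated values are real numbers within
# [MIN, MAX]; returns 1 otherwise. Hypothetical helper, not part of the tool.
check_lims() {
  awk -v lims="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    n = split(lims, v, ",")
    if (n != 2) exit 1
    for (i = 1; i <= 2; i++) {
      # Accept an optional sign followed by an integer or decimal number
      if (v[i] !~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)$/) exit 1
      if (v[i] + 0 < lo + 0 || v[i] + 0 > hi + 0) exit 1
    }
    exit 0
  }'
}

check_lims "49,51" -90 90 && echo "latitude limits OK"
check_lims "-117,-115" -180 180 && echo "longitude limits OK"
check_lims "49,95" -90 90 || echo "latitude limits out of range"
```

The check does not enforce which of the two values is the lower bound, since the usage message does not fix an order.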
# Available Datasets
|# |Dataset |Time Scale |DOI |Description |
|--|--------------------------|--------------------------------|-------------------------|--------------------------------------|
|1 |WRF-CONUS I (control) |Hourly (Oct 2000 - Dec 2013) |10.1007/s00382-016-3327-9|[link](./scripts/conus_i) |
|2 |WRF-CONUS II (control)[^1]|Hourly (Jan 1995 - Dec 2015) |10.5065/49SN-8E08 |[link](./scripts/conus_ii) |
|3 |ERA5[^2] |Hourly (Jan 1950 - Dec 2020) |10.24381/cds.adbb2d47 and [link](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview)|[link](./scripts/era5)|
|4 |RDRS v2.1 |Hourly (Jan 1980 - Dec 2018) |10.5194/hess-25-4917-2021|[link](./scripts/rdrs) |
|5 |CanRCM4-WFDEI-GEM-CaPA |3-Hourly (Jan 1951 - Dec 2100) |10.5194/essd-12-629-2020 |[link](./scripts/canrcm4_wfdei_gem_capa)|
|6 |WFDEI-GEM-CaPA |3-Hoursly (Jan 1979 - Dec 2016) |10.20383/101.0111 |[link](./scripts/wfdei_gem_capa) |
|7 |Daymet |Daily (Jan 1980 - Dec 2022)[^3] |10.3334/ORNLDAAC/2129 |[link](./scripts/daymet) |
|8 |BCC-CSM2-MR |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/bcc-csm2-mr) |
|9 |CNRM-CM6-1 |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/cnrm-cm6-1) |
|10|EC-Earth3-Veg |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/ec-earth3-veg) |
|11|GDFL-CM4 |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/gdfl-cm4) |
|12|GDFL-ESM4 |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/gdfl-esm4) |
|13|IPSL-CM6A-LR |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/ipsl-cm6a-lr) |
|14|MRI-ESM2-0 |Daily (Jan 1950 - Dec 2100)[^4] |*TBD* |[link](./scripts/mri-esm2-0) |
|15|Hybrid Observation(AB Gov)|Daily (Jan 1950 - Dec 2019)[^4] |10.5194/hess-23-5151-2019|[link](./scripts/hybrid_obs) |
|# |Dataset |Time Period |DOI |Description |
|--|---------------------------|--------------------------------|--------------------------|-------------------------------------|
|1 |GWF-NCAR WRF-CONUS I |Hourly (Oct 2000 - Dec 2013) |10.1007/s00382-016-3327-9 |[link](./scripts/gwf-ncar-conus_i) |
|2 |GWF-NCAR WRF-CONUS II[^1] |Hourly (Jan 1995 - Dec 2015) |10.5065/49SN-8E08 |[link](./scripts/gwf-ncar-conus_ii) |
|3 |ECMWF ERA5[^2] |Hourly (Jan 1950 - Dec 2020) |10.24381/cds.adbb2d47 and [link](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview)|[link](./scripts/ecmwf-era5)|
|4 |ECCC RDRSv2.1 |Hourly (Jan 1980 - Dec 2018) |10.5194/hess-25-4917-2021 |[link](./scripts/eccc-rdrs) |
|5 |CCRN CanRCM4-WFDEI-GEM-CaPA|3-Hourly (Jan 1951 - Dec 2100) |10.5194/essd-12-629-2020 |[link](./scripts/ccrn-canrcm4_wfdei_gem_capa)|
|6 |CCRN WFDEI-GEM-CaPA |3-Hourly (Jan 1979 - Dec 2016) |10.20383/101.0111 |[link](./scripts/ccrn-wfdei_gem_capa)|
|7 |ORNL Daymet |Daily (Jan 1980 - Dec 2022)[^3] |10.3334/ORNLDAAC/2129 |[link](./scripts/ornl-daymet) |
|8 |Alberta Gov Climate Dataset|Daily (Jan 1950 - Dec 2100) |10.5194/hess-23-5151-2019 |[link](./scripts/ab-gov) |
|9 |Ouranos ESPO-G6-R2 |Daily (Jan 1950 - Dec 2100) |10.1038/s41597-023-02855-z|[link](./scripts/ouranos-espo-g6-r2) |
|10|Ouranos MRCC5-CMIP6 |Hourly (Jan 1950 - Dec 2100) |TBD |link |
|11|NASA NEX-GDDP-CMIP6 |Daily (Jan 1950 - Dec 2100) |10.1038/s41597-022-01393-4|[link](./scripts/nasa-nex-gddp-cmip6)|

[^1]: For access to the files on Graham cluster, please contact [Stephen O'Hearn](mailto:[email protected]).
[^2]: ERA5 data from 1950-1979 are based on [ERA5 preliminary extension](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview) and 1979 onwards are based on [ERA5 1979-present](https://doi.org/10.24381/cds.adbb2d47).
[^3]: For the Puerto Rico domain of the dataset, data are available from January 1950 until December 2022.
[^4]: Data is not publicly available yet. DOI is to be determined once the relevant paper is published.

# General Example
As an example, follow the code block below. Please remember that you MUST have access to Graham cluster with Digital Research Alliance of Canada (DRA) and have access to `CONUS I` model outputs. Also, remember to generate a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) with GitHub in advance. Enter the following codes in your Graham shell as a test case:
As an example, follow the code block below. Please remember that you MUST have access to Digital Research Alliance of Canada (DRA) clusters (specifically `Graham`) and have access to `RDRSv2.1` model outputs. Also, remember to generate a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) with GitHub in advance. Enter the following codes in your Graham shell as a test case:

```console
foo@bar:~$ git clone https://github.com/kasra-keshavarz/datatool # clone the repository
foo@bar:~$ cd ./datatool/ # move to the repository's directory
foo@bar:~$ ./extract-dataset.sh -h # view the usage message
foo@bar:~$ ./extract-dataset.sh --dataset=CONUS1 \
--dataset-dir="/project/rpp-kshook/Model_Output/WRF/CONUS/CTRL" \
--output-dir="$HOME/scratch/conus_i_output/" \
--start-date="2001-01-01 00:00:00" \
--end-date="2001-12-31 23:00:00" \
--lat-lims=49,51 \
--lon-lims=-117,-115 \
--variable=T2,PREC_ACC_NC,Q2,ACSWDNB,ACLWDNB,U10,V10,PSFC \
--prefix="conus_i";
foo@bar:~$ ./extract-dataset.sh \
--dataset="rdrs" \
--dataset-dir="/project/rpp-kshook/Climate_Forcing_Data/meteorological-data/rdrsv2.1" \
--output-dir="$HOME/scratch/rdrs_outputs/" \
--start-date="2001-01-01 00:00:00" \
--end-date="2001-12-31 23:00:00" \
--lat-lims=49,51 \
--lon-lims=-117,-115 \
--variable="RDRS_v2.1_A_PR0_SFC,RDRS_v2.1_P_HU_09944" \
--prefix="testing_";
```
See the [examples](./examples) directory for real-world scripts for each meteorological dataset included in this repository.

@@ -80,10 +86,16 @@ only in cases where jobs are submitted to clusters' schedulers. If
processing is not submitted as a job, then the logs are printed on screen.

# New Datasets
If you are considering any new dataset to be added to the data repository, and subsequently the associated scripts added here, you can open a new ticket on the **Issues** tab of the current repository. Or, you can make a [Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) on this repository with your own script.
If you are considering any new dataset to be added to the data
repository, and subsequently the associated scripts added here,
you can open a new ticket on the **Issues** tab of the current
repository. Or, you can make a
[Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request)
on this repository with your own script.

# Support
Please open a new ticket on the **Issues** tab of the current repository in case of any issues.
Please open a new ticket on the **Issues** tab of this repository for
support.

# License
Meteorological Data Processing Workflow - datatool <br>
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.2-dev
0.5.0
28 changes: 28 additions & 0 deletions assets/bash_scripts/extract_subdir_level.sh
@@ -0,0 +1,28 @@
#!/bin/bash

# Arguments: root path, then a comma-separated list of directories under it
root_path="$1"
input_string="$2"

# Split the input string by comma
IFS=',' read -ra directories <<< "$input_string"

# Initialize an empty string to store results
result_string=""

# Iterate over each directory
for dir in "${directories[@]}"; do
  # Find immediate subdirectories (names only, space-separated)
  IFS=' ' read -ra subdirs <<< "$(find "$root_path/$dir" -mindepth 1 -maxdepth 1 -type d -printf "%f ")"

  # Prefix each subdirectory with its parent directory from input_string
  for subdir in "${subdirs[@]}"; do
    result_string+="$dir/${subdir##*/},"
  done
done

# Remove the trailing comma, if any
result_string=${result_string%,}

echo "$result_string"
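
The script above splits `find` output on spaces, so a subdirectory name containing a space would be broken into several entries. The sketch below shows one alternative that reads NUL-delimited paths instead. It assumes GNU `find` and `sort` (the original already relies on GNU `find` for `-printf`), and the function name `list_subdirs` is hypothetical.

```bash
#!/bin/bash
# list_subdirs ROOT "dir1,dir2,..."
# Print "dir/subdir" pairs, comma-separated, for every immediate
# subdirectory of each listed directory under ROOT.
list_subdirs() {
  local root_path="$1" input_string="$2"
  local -a directories results=()
  IFS=',' read -ra directories <<< "$input_string"

  local dir subdir
  for dir in "${directories[@]}"; do
    # Read NUL-delimited paths so names containing spaces stay intact
    while IFS= read -r -d '' subdir; do
      results+=("$dir/${subdir##*/}")
    done < <(find "$root_path/$dir" -mindepth 1 -maxdepth 1 -type d -print0 | sort -z)
  done

  # Join the collected entries with commas
  local IFS=','
  echo "${results[*]}"
}

# Demonstration on a throwaway directory tree
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/modelA/ssp245" "$demo_root/modelA/ssp585" \
         "$demo_root/modelB/historical run"
list_subdirs "$demo_root" "modelA,modelB"
rm -rf "$demo_root"
```

Names that themselves contain a comma would still be ambiguous in the comma-separated output, which is the same limitation as in the original script.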

File renamed without changes.
File renamed without changes.