Update to README and other small changes #15

Merged
merged 9 commits into from
Jun 6, 2022
113 changes: 81 additions & 32 deletions README.md
@@ -11,65 +11,114 @@ pip install -r requirements.txt
pip install -e .
```


## Usage

checksit has four key components: [check](#checksit-check), [describe](#checksit-describe), [show-specs](#checksit-show-specs), and [summary](#checksit-summary).


## checksit check

Check a file against a template.

### Basic Usage

```
checksit check /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks format of file.
* checksit searches its template cache for a similar file to compare against

checksit check /group_workspaces/jasmin2/ukcp18/UKcordex/UKCP18_UI_test/tas_rcp85_land-rcm_uk_12km_AD_mon_198012-208011.nc

# Verbose
checksit check --verbose /group_workspaces/jasmin2/ukcp18/UKcordex/UKCP18_UI_test/tas_rcp85_land-rcm_uk_12km_AD_mon_198012-208011.nc
### Main Features

checksit check --verbose /group_workspaces/jasmin2/ukcp18/UKcordex-laura/tasmax_rcp85_land-rcm_uk_12km_EC-EARTH_r12i1p1_HIRHAM5_day_19801201-19901130.nc
#### Define template
```
checksit check --template=template-cache/rls_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.cdl /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Use the `--template` flag to specify the template to check against
* The template can be in template-cache or any other file the user has access to
* Note: a cdl file is a text representation of a netCDF file's header, as output by `ncdump -h` on the netCDF file

## Features

### Mapping the names of attributes or sub-dictionaries (such as variables)
#### Map variable names
```
checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc
```
* Maps a variable name, for cases where a variable is named differently in the file to be checked and in the template
* Format - `-m <template variable name>=<file variable name>`
* Multiple mappings should be comma-separated
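
A minimal sketch of how a `-m` value in the format above could be split into a lookup table and applied to the variables read from a data file. The helper names `parse_mappings` and `apply_mappings` are hypothetical and are not taken from the checksit source; this only illustrates the `<template variable name>=<file variable name>` convention.

```python
def parse_mappings(option):
    """Parse 'a=b,c=d' into {'a': 'b', 'c': 'd'} (template name -> file name)."""
    mappings = {}
    for pair in option.split(","):
        pair = pair.strip()
        if not pair:
            continue
        template_name, sep, file_name = pair.partition("=")
        if not sep or not template_name or not file_name:
            raise ValueError(f"Expected <template name>=<file name>, got {pair!r}")
        mappings[template_name] = file_name
    return mappings


def apply_mappings(file_variables, mappings):
    """Rename variables from the data file to the names used in the template."""
    # Invert the table so we can look up by the name found in the data file.
    reverse = {file_name: template_name for template_name, file_name in mappings.items()}
    return {reverse.get(name, name): value for name, value in file_variables.items()}


mappings = parse_mappings("cltAnom=cloud_area_fraction")
renamed = apply_mappings({"cloud_area_fraction": {"units": "1"}, "time": {}}, mappings)
print(sorted(renamed))
```

With the mapping from the example command, the `cloud_area_fraction` entry from the data file is compared under the template's `cltAnom` name, and unmapped variables such as `time` are left unchanged.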

Say you want to compare the contents of a data file with a template file, but you know that
a variable has been given a different NetCDF variable ID in the data file, you can tell the
checker to map the name as follows:

#### Ignore attributes
```
$ checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc
checksit check --ignore-attrs=global_attributes:time_coverage_start,global_attributes:time_coverage_end,global_attributes:tracking_id /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Define attributes to ignore in checking
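
The example above suggests that each ignored item is written as `<section>:<attribute>` and that items are comma-separated. Assuming that format, the following sketch shows one way such a value could be parsed and applied to a nested record before comparison. `parse_ignore_attrs` and `drop_ignored` are hypothetical names, not checksit functions.

```python
import copy


def parse_ignore_attrs(option):
    """Parse 'section:attr,section:attr' into a list of (section, attr) pairs."""
    ignored = []
    for item in option.split(","):
        section, sep, attr = item.strip().partition(":")
        if not sep or not section or not attr:
            raise ValueError(f"Expected <section>:<attribute>, got {item!r}")
        ignored.append((section, attr))
    return ignored


def drop_ignored(record, ignored):
    """Return a copy of the record with the ignored attributes removed."""
    cleaned = copy.deepcopy(record)
    for section, attr in ignored:
        # Missing sections or attributes are skipped silently.
        cleaned.get(section, {}).pop(attr, None)
    return cleaned


record = {"global_attributes": {"tracking_id": "abc", "title": "Sea ice thickness"}}
ignored = parse_ignore_attrs("global_attributes:tracking_id,global_attributes:time_coverage_end")
print(drop_ignored(record, ignored))
```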

This will find the "cloud_area_fraction" variable ID in the data file and match it to the
"cltAnom" variable dictionary in the template.

### Connecting to controlled vocabularies
#### Define additional rules for checking
```
checksit check --rules=global_attributes:id=rule-func:match-file-name:lowercase:no-extension /neodc/esacci/sea_ice/data/sea_ice_thickness/L3C/envisat/v2.0/SH/2012/ESACCI-SEAICE-L3C-SITHICK-RA2_ENVISAT-SH50KMEASE2-201202-fv2.0.nc
```
* Check items against defined rules
* Format - `<what to check>=<rule type>:<function/check>[:<extras>[:<extras>...]]`
* Four options for `<rule type>`:
  * `rule-func` - check item against a defined function, 4 options:
    * `match-file-name` - item must be the same as the file name, allowing for formatting through `<extras>` - `lowercase`, `uppercase`, `no-extension` - example: `global_attributes:id=rule-func:match-file-name:lowercase:no-extension`
    * `match-one-of` - item must be the same as one of the `<extras>` given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:project=rule-func:match-one-of:"ukcp18|ukcp09"`
    * `match-one-or-more-of` - item must be the same as one or more of the `<extras>` given. Multiple options should be separated by a `|` and surrounded by double quotation marks - example: `global_attributes:contact=rule-func:match-one-or-more-of:"[email protected]|UKCP Team|MOHC"`
    * `string-of-length` - item must be the same length as given `<extra>` or greater if `+` is given at end of `<extra>` - example: `global_attributes:project=rule-func:string-of-length:10,global_attributes:contact=rule-func:string-of-length:100+`
  * `type-rule` - check item is of type as defined in `<extra>` - example: `transverse_mercator:false_northing=type-rule:integer`
  * `regex` - check item for regular expression match - example: `global_attributes:project=regex:ukcp18`
  * `regex-rule` - check item matches pre-defined regex rule, name of which is given in `<extra>`
    * current options are `integer`,`valid-email`,`valid-url`,`valid-url-or-na`,`match:vN.M`,`datetime`,`datetime-or-na`,`number`
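
The rule format above can be illustrated with a short sketch. This is an assumption-laden illustration, not checksit's own parser: the `Rule` tuple and the functions `parse_rule`, `string_of_length` and `match_one_of` are hypothetical, a single rule is parsed at a time, and matching is assumed to be case-sensitive.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    target: str      # what to check, e.g. "global_attributes:id"
    rule_type: str   # "rule-func", "type-rule", "regex" or "regex-rule"
    check: str       # function name, type name or pattern
    extras: tuple    # any further colon-separated items


def parse_rule(spec):
    """Split '<what to check>=<rule type>:<check>[:<extras>...]' into a Rule."""
    target, sep, rule = spec.partition("=")
    if not sep:
        raise ValueError(f"Missing '=' in rule: {spec!r}")
    rule_type, sep, remainder = rule.partition(":")
    if not sep:
        raise ValueError(f"Missing ':' after rule type in: {spec!r}")
    if rule_type == "regex":
        # A pattern may itself contain colons, so keep it whole.
        return Rule(target, rule_type, remainder, ())
    check, *extras = remainder.split(":")
    return Rule(target, rule_type, check, tuple(extras))


def string_of_length(value, extra):
    """'10' means exactly 10 characters; '100+' means 100 characters or more."""
    if extra.endswith("+"):
        return len(value) >= int(extra[:-1])
    return len(value) == int(extra)


def match_one_of(value, extra):
    """True if value equals one of the '|'-separated options."""
    return value in extra.strip('"').split("|")


rule = parse_rule("global_attributes:id=rule-func:match-file-name:lowercase:no-extension")
print(rule.rule_type, rule.check, rule.extras)
```

Splitting on the first `=` keeps the `<what to check>` part intact even though it contains a colon, which is why the target and the rule are separated before any colon splitting happens.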


### Additional Options

#### specs
```
checksit check --specs=ceda-base /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
* Checks file against a given specification. For more info, see [checksit show-specs](#checksit-show-specs)


#### auto-cache
```
checksit check --auto-cache --template=/badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/08/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_08_day_20671201-20681130.nc /badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day/latest/rss_rcp85_land-cpm_uk_2.2km_01_day_20671201-20681130.nc
```
$ checksit check -m cltAnom=cloud_area_fraction /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc
* Create a cache of the given template to add to checksit's template_cache

Running with:
Template: template-cache/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.cdl
Datafile: /gws/nopw/j04/cmip6_prep_vol1/ukcp18/data/land-prob/v20211110/uk/25km/rcp85/sample/b8110/30y/cltAnom/mon/v20211110/cltAnom_rcp85_land-prob_uk_25km_sample_b8110_30y_mon_20091201-20991130.nc

#### verbose
```
checksit check --verbose /group_workspaces/jasmin2/ukcp18/incoming-astephen/ukcordex-example/tasmax_rcp85_land-rcm_uk_12km_EC-EARTH_r12i1p1_HIRHAM5_day_19801201-19901130.nc
```
* Print additional information


---------------- Running checks ------------------

[FAILED] with 2 errors:
## checksit describe

01. [variables] sample:units: 'UNDEFINED' does not match expected: '1'
02. [global_attributes] 'CF-1.7' not in vocab options: ['CF-1.5', 'CF-1.6']
```
checksit describe
```
* Prints docstring of rules that can be used in `checksit check --rules`
* Individual rules can be printed out, e.g. `checksit describe match-one-of`



## Ideas
## checksit show-specs

```
template_cache=my-template-cache
basedir=/badc/ukcp18/data/land-cpm/uk/2.2km/rcp85/01/rss/day
checksit show-specs <spec-id>
```
* Prints out specs for a given spec-id, e.g. `ceda-base`
* spec-ids are saved in checksit/specs/groups

for latest_dir in $(find -L $basedir -type d -name latest); do
first_file=$(ls $latest_dir/*.nc | head -1)
facets=$(basename $first_file | cut -d_ -f1-7 | sed 's/_/ /g')
#rss_rcp85_land-cpm_uk_2.2km_01_day_19801201-19811130.nc
echo $facets
ncdump -h $first_file
done

```

## checksit summary

* Summarises output from a number of log files created through `checksit check`
4 changes: 2 additions & 2 deletions checksit/generic.py
@@ -40,11 +40,11 @@ def check_global_attrs(dct, defined_attrs=None, vocab_attrs=None):
     errors = []

     for attr in defined_attrs:
-        if is_undefined(dct.get(attr)):
+        if is_undefined(dct['global_attributes'].get(attr)):
             errors.append(f"[global-attributes:**************:{attr}]: Attribute '{attr}' must have a valid definition.")

     for attr in vocab_attrs:
-        errors.extend(vocabs.check(vocab_attrs[attr], dct.get(attr, UNDEFINED), label=f"[global-attributes:******:{attr}]***"))
+        errors.extend(vocabs.check(vocab_attrs[attr], dct['global_attributes'].get(attr, UNDEFINED), label=f"[global-attributes:******:{attr}]***"))


     return errors
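
The change in this hunk moves the attribute lookup one level down, into `dct['global_attributes']`. The sketch below shows why that matters, assuming the record is a nested dictionary with a `global_attributes` key (as the new code implies). The `is_undefined` helper and the `UNDEFINED` value here are simplified stand-ins, not the checksit implementations.

```python
UNDEFINED = "UNDEFINED"  # assumed sentinel, for illustration only


def is_undefined(value):
    """Stand-in check: treat None, empty string and the sentinel as undefined."""
    return value is None or value == "" or value == UNDEFINED


def missing_attrs_top_level(dct, defined_attrs):
    # Old behaviour: looks for the attribute at the top level of the record.
    return [a for a in defined_attrs if is_undefined(dct.get(a))]


def missing_attrs_nested(dct, defined_attrs):
    # New behaviour: looks inside the 'global_attributes' sub-dictionary.
    return [a for a in defined_attrs if is_undefined(dct["global_attributes"].get(a))]


record = {
    "global_attributes": {"Conventions": "CF-1.6", "title": ""},
    "variables": {},
}

print(missing_attrs_top_level(record, ["Conventions", "title"]))
print(missing_attrs_nested(record, ["Conventions", "title"]))
```

With the top-level lookup every attribute is reported as missing, because none of them are keys of the outer dictionary; the nested lookup reports only `title`, which is genuinely empty.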
2 changes: 1 addition & 1 deletion requirements.txt
@@ -2,4 +2,4 @@ click
pyyaml
cf-python
netcdf4

pandas