catalog vocabulary slightly incompatible with example analysis script usage #120

ceblanton · 2024-05-02T19:25:21Z

FRE Canopy is generating catalogs using:

module load fre/canopy

fre catalog build --overwrite -i $ppdir -o $ppdir/catalog

sed -i.bak -e 's/,P1M,/,monthly,/' $ppdir/catalog.csv
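The `sed` step rewrites the first `,P1M,` match on each line, whichever column it falls in, so it could hit a column such as `chunk_freq` instead of `frequency`. Below is a minimal Python sketch that limits the substitution to one named column. It assumes the catalog CSV has a header row with a `frequency` column. `relabel_frequency` is a hypothetical helper and is not part of FRE Canopy. The sample row is made up for illustration.

```python
import csv
import io


def relabel_frequency(csv_text, column="frequency", old="P1M", new="monthly"):
    """Return csv_text with `old` replaced by `new`, but only in `column`.

    Hypothetical helper: assumes the catalog CSV has a header row that
    names the frequency column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only the named column is touched; chunk_freq stays as written.
        if row[column] == old:
            row[column] = new
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample row where chunk_freq also happens to be P1M
    sample = (
        "source_id,experiment_id,frequency,modeling_realm,variable_id,chunk_freq\n"
        "am5,c96L65_am5f7b11r0_amip,P1M,atmos,high_cld_amt,P1M\n"
    )
    print(relabel_frequency(sample), end="")
```

To apply it to a real catalog, read `catalog.csv` into a string, pass it through the function, and write the result back.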

An example pp directory and catalog file are here:

/archive/Chris.Blanton/am5/am5f7b11r0/c96L65_am5f7b11r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp
/archive/Chris.Blanton/am5/am5f7b11r0/c96L65_am5f7b11r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp/catalog.json

The example analysis script usage (the Ray example) is:

module load python/3.9

source /net2/rlm/analysis-scripts/example/env/bin/activate

python3 -c "from freanalysis_clouds import CloudAnalysisScript; CloudAnalysisScript().run_analysis('/archive/Chris.Blanton/am5/am5f7b11r0/c96L65_am5f7b11r0_amip/gfdl.ncrc5-deploy-prod-openmp/pp/catalog.json', '/nbhome/$USER/sample-output')"

That fails with this message:

/net2/rlm/analysis-scripts/example/env/lib/python3.9/site-packages/pydantic/deprecated/decorator.py:222: UserWarning: There are no datasets to load! Returning an empty dictionary.
  return self.raw_function(**d, **var_kwargs)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/net2/rlm/analysis-scripts/example/env/lib/python3.9/site-packages/freanalysis_clouds/__init__.py", line 125, in run_analysis
    datasets[self.metadata.catalog_key(variable)],
KeyError: 'c96L65_am5f4b4r1-newrad_amip.monthly.na.atmos.high_cld_amt'

The mystery is that this very similar catalog works:

/net2/rlm/analysis-scripts/example/catalog.json

We think the difference is "n/a" versus a missing value for the ensemble vocabulary.

Hopefully, the "fre catalog validate /path/to/schema.json /path/to/catalog-to-test.json" usage can detect this mismatch or inconsistency before we try to launch the script.
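Until a validator catches this, a pre-flight check on the catalog CSV can flag empty values in columns that end up in dataset keys. The following is a minimal sketch and is not the logic of `fre catalog validate`. `find_empty_fields` and the column list passed to it are hypothetical.

```python
import csv
import io


def find_empty_fields(csv_text, columns):
    """Return (line_number, column) pairs where a required column is empty.

    Hypothetical pre-flight check, not the logic of `fre catalog validate`.
    Line numbers count the header as line 1; a column absent from the
    header is reported as empty on every row.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        for col in columns:
            if (row.get(col) or "").strip() == "":
                problems.append((line_no, col))
    return problems


if __name__ == "__main__":
    # Made-up sample: first data row has an empty member_id
    sample = "experiment_id,member_id,variable_id\nexp1,,high_cld_amt\nexp1,na,tas\n"
    for line_no, col in find_empty_fields(sample, ["member_id", "variable_id"]):
        print(f"line {line_no}: empty {col}")
```

A non-empty result is a signal to fix the catalog before launching the analysis script.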


aradhakrishnanGFDL · 2024-05-07T18:40:06Z

cat = cat.search(variable_id="high_cld_amt")
dset_dict = cat.to_dataset_dict(cdf_kwargs={'chunks': {'time':5}, 'decode_times': False})

--> The keys in the returned dictionary of datasets are constructed as follows:
'source_id.experiment_id.frequency.modeling_realm.variable_id.chunk_freq'

dset_dict.keys()
dict_keys(['am5.c96L65_am5f7b11r0_amip.P1M.atmos_level.high_cld_amt.P1Y', 'am5.c96L65_am5f7b11r0_amip.P1M.atmos.high_cld_amt.P1Y'])
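Because each key is just the grouping values joined with ".", a key can be split back into labelled parts using the field order quoted above. This is a minimal sketch. It assumes no field value itself contains a ".", and `parse_key` is a hypothetical helper.

```python
# Field order reported by intake-esm in the comment above
KEY_FIELDS = ("source_id", "experiment_id", "frequency",
              "modeling_realm", "variable_id", "chunk_freq")


def parse_key(key, fields=KEY_FIELDS):
    """Split a dataset key on '.' and label each part with its field name.

    Assumes no field value itself contains a '.'.
    """
    values = key.split(".")
    if len(values) != len(fields):
        raise ValueError(f"{key!r} has {len(values)} parts, expected {len(fields)}")
    return dict(zip(fields, values))


if __name__ == "__main__":
    print(parse_key("am5.c96L65_am5f7b11r0_amip.P1M.atmos.high_cld_amt.P1Y"))
    try:
        # The key the clouds script asked for in the original traceback
        parse_key("c96L65_am5f4b4r1-newrad_amip.monthly.na.atmos.high_cld_amt")
    except ValueError as err:
        print(err)
```

Laid side by side this way, the key requested in the original traceback has five parts. It has no `source_id` or `chunk_freq`, and it has an extra "na" slot. So it cannot match either listed key, regardless of how member_id is filled.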

aradhakrishnanGFDL · 2024-05-07T18:41:54Z

@ceblanton member_id is empty (""). When it's empty, perhaps the logic in Ray's script should drop it from the key name?

aradhakrishnanGFDL · 2024-05-07T18:50:44Z

Or we enforce no nulls, which may be something we discussed before.

aradhakrishnanGFDL · 2024-05-10T15:36:52Z

On May 9th, it was decided to use "na" as the default value for the aggregate columns, rather than empty values, to help maintain a consistent "key pattern" at this early stage of adoption. Down the line, we will provide examples that query the dataset/key names dynamically.
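Under the "na" convention, an empty aggregation value still occupies its slot in the key, so every key has the same number of parts. Below is a minimal sketch of that idea. It assumes the groupby order implied by the working key quoted later in this thread. `build_key` and `GROUPBY_ATTRS` are hypothetical names and are not CatalogBuilder or intake-esm code.

```python
import math

# Assumed groupby order, inferred from the working key quoted in this thread
GROUPBY_ATTRS = ("source_id", "experiment_id", "frequency", "member_id",
                 "modeling_realm", "variable_id", "chunk_freq")


def build_key(row, groupby_attrs=GROUPBY_ATTRS, missing="na"):
    """Join groupby values with '.', writing `missing` for empty or NaN values.

    Hypothetical helper: intake-esm builds its own keys; this only mimics
    the "na" convention so the resulting key pattern is predictable.
    """
    parts = []
    for attr in groupby_attrs:
        value = row.get(attr)
        # pandas reads empty CSV cells as NaN, so treat NaN like "" and None
        is_nan = isinstance(value, float) and math.isnan(value)
        if value is None or is_nan or str(value).strip() == "":
            value = missing
        parts.append(str(value))
    return ".".join(parts)


if __name__ == "__main__":
    row = {"source_id": "am5", "experiment_id": "c96L65_am5f7b11r0_amip",
           "frequency": "P1M", "member_id": "", "modeling_realm": "atmos_level",
           "variable_id": "high_cld_amt", "chunk_freq": "P1Y"}
    print(build_key(row))
```

The alternative raised earlier, dropping empty fields from the key, would give keys of varying length. A fixed slot keeps positional parsing of the key reliable.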

aradhakrishnanGFDL · 2024-05-13T19:56:47Z

@ceblanton

The PR is ready for member_id to be "na" by default. However, I realize Ray's key is still missing the chunk frequency, which is an aggregate column. I am not sure whether leaving it in the key or using a default for chunk_freq is a good idea; we can't find unique datasets without it. But this also circles back to not having to hard-code these key names.

This now works:

am5.c96L65_am5f7b11r0_amip.P1M.na.atmos_level.high_cld_amt.P1Y
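One way to avoid hard-coding key names is to match keys by their labelled parts. The sketch below assumes the key layout of the working key above, with member_id after frequency. `find_keys` is a hypothetical helper. The "atmos" key in the example is assumed to follow the same "na" pattern as the "atmos_level" key.

```python
# Assumed key layout, matching the working key above (member_id after frequency)
KEY_FIELDS = ("source_id", "experiment_id", "frequency", "member_id",
              "modeling_realm", "variable_id", "chunk_freq")


def find_keys(keys, fields=KEY_FIELDS, **criteria):
    """Return the keys whose labelled parts match every criterion.

    Keys with an unexpected number of parts are skipped.
    """
    matches = []
    for key in keys:
        values = key.split(".")
        if len(values) != len(fields):
            continue
        labelled = dict(zip(fields, values))
        if all(labelled.get(name) == wanted for name, wanted in criteria.items()):
            matches.append(key)
    return matches


if __name__ == "__main__":
    keys = [
        "am5.c96L65_am5f7b11r0_amip.P1M.na.atmos_level.high_cld_amt.P1Y",
        "am5.c96L65_am5f7b11r0_amip.P1M.na.atmos.high_cld_amt.P1Y",
    ]
    # variable_id alone is ambiguous: both realms carry high_cld_amt
    print(find_keys(keys, variable_id="high_cld_amt"))
    print(find_keys(keys, variable_id="high_cld_amt", modeling_realm="atmos"))
```

With a `dset_dict` from intake-esm, you could call `find_keys(dset_dict.keys(), variable_id="high_cld_amt", modeling_realm="atmos")` and index `dset_dict` with the single result. An empty or multi-element result shows that the criteria need adjusting, which is the uniqueness concern raised above.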

You can test:


import intake
import intake_esm  # provides the intake.open_esm_datastore driver

# Test catalog JSON
col = "/home/a1r/cat/canopy/am5f7b11r0/c96L65_am5f7b11r0_amipn0513.json"

cat_store = intake.open_esm_datastore(col)

cat_subset = cat_store.search(variable_id="high_cld_amt")

dset_dict = cat_subset.to_dataset_dict(cdf_kwargs={'chunks': {'time': 5}, 'decode_times': False})

# This gives the dataset names dynamically, based on the search and the existing catalog + spec.
for k in dset_dict.keys():
    print(k)

# Test for the new key that is expected to work
dset_dict['am5.c96L65_am5f7b11r0_amip.P1M.na.atmos_level.high_cld_amt.P1Y']

aradhakrishnanGFDL · 2024-05-21T21:07:32Z

Figure generated: /nbhome/a1r/analysis-scripts/pngs/cloud-fraction.png

Script used: https://github.com/aradhakrishnanGFDL/analysis-scripts/blob/prototype1-a1r/raytest.py

The changes are in my fork, and they cover only one suite:

https://github.com/aradhakrishnanGFDL/analysis-scripts/tree/prototype1-a1r/freanalysis_clouds

aradhakrishnanGFDL · 2024-05-21T21:10:14Z

To support this, we need to remove source_id from the aggregation columns. MDTF uses it, though, so let's discuss. @ceblanton
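This sketch assumes that the "aggregation columns" here are the `groupby_attrs` listed under `aggregation_control` in the catalog JSON, which is the ESM collection spec that intake-esm reads. Under that assumption, dropping `source_id` removes the first part of every key. `drop_groupby_attr` is a hypothetical helper. The spec below is a trimmed-down illustration and not a complete catalog.

```python
import copy
import json


def drop_groupby_attr(spec, attr="source_id"):
    """Return a copy of an ESM collection spec without `attr` in groupby_attrs.

    The input dict is left unchanged, so a catalog used by other tools
    (e.g. MDTF) can keep its original spec.
    """
    new_spec = copy.deepcopy(spec)
    control = new_spec["aggregation_control"]
    control["groupby_attrs"] = [a for a in control["groupby_attrs"] if a != attr]
    return new_spec


if __name__ == "__main__":
    # Hypothetical, trimmed-down spec; a real catalog JSON has more keys.
    spec = {
        "aggregation_control": {
            "variable_column_name": "variable_id",
            "groupby_attrs": ["source_id", "experiment_id", "frequency",
                              "member_id", "modeling_realm", "variable_id",
                              "chunk_freq"],
        }
    }
    trimmed = drop_groupby_attr(spec)
    print(json.dumps(trimmed["aggregation_control"]["groupby_attrs"]))
```

Writing the trimmed spec to a separate JSON file would give the analysis script its own catalog view while leaving the original for MDTF. Whether that split is desirable is the open question above.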

aradhakrishnanGFDL mentioned this issue May 13, 2024

Addresses https://github.com/aradhakrishnanGFDL/CatalogBuilder/issues/120 #126

Closed

aradhakrishnanGFDL linked a pull request May 13, 2024 that will close this issue

Addresses https://github.com/aradhakrishnanGFDL/CatalogBuilder/issues/120 #126

Closed

aradhakrishnanGFDL changed the title ~~catalog vocabulary slightly incompatible with example analysis script uage~~ catalog vocabulary slightly incompatible with example analysis script usage May 14, 2024

aradhakrishnanGFDL mentioned this issue May 23, 2024

Light touches for catalog vocabulary menzel-gfdl/analysis-scripts#1

Merged
