datasets with lat/lon and latitude/longitude raise error with get_coord_type #61

Open
ellesmith88 opened this issue Mar 1, 2021 · 10 comments

Comments

@ellesmith88

ellesmith88 commented Mar 1, 2021

  • roocs-utils version: v0.2.1

Description

When scanning for the inventory a few datasets have raised an error when we attempt to identify their coordinate types:

from collections import OrderedDict


def get_coord_info(fpaths):
    ds = open_xr_dataset(fpaths)
    d = OrderedDict()
    for coord_id in sorted(ds.coords):
        coord = ds.coords[coord_id]
        type = get_coord_type(coord)
        # Skip the time axis and anything we cannot classify
        if type == "time" or type is None:
            continue
        data = coord.values
        mn, mx = data.min(), data.max()
        # Store "min max" for each coordinate type, to 2 decimal places
        d[f"{type}"] = f"{mn:.2f} {mx:.2f}"
    return d

On closer inspection, these datasets have coordinates like those below (from c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.SImon.siconc.gn.v20200219):

Coordinates:
  * lat        (lat) float64 -81.5 -80.5 -79.5 -78.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
    latitude   (lat, lon) float32 dask.array<chunksize=(232, 360), meta=np.ndarray>
    longitude  (lat, lon) float32 dask.array<chunksize=(232, 360), meta=np.ndarray>
    type       |S7 ...

and so when cf-xarray attempts to identify latitude/longitude it finds two coordinates for each and raises the error ValueError: Receive multiple variables for key 'longitude': ['longitude', 'lon']. Expected only one. Please pass a list ['longitude'] instead to get all variables matching 'longitude'.

Both latitudes have the same values, but lon runs from 0 to 360 while longitude runs from about -280 to 80.

So it is the lines

if "latitude" in coord.cf and coord.cf["latitude"].name == coord.name:
    return True

and

if "longitude" in coord.cf and coord.cf["longitude"].name == coord.name:
    return True

where the issue arises.
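As a rough illustration of why a membership test is more tolerant than a single lookup, here is a minimal sketch. It does not use cf-xarray and does not reproduce its matching logic: `matching_names`, `is_longitude` and the attribute dictionaries are hypothetical, and it assumes that both `lon` and `longitude` carry longitude-like attributes.

```python
# Simplified stand-in for CF matching: a coordinate counts as a longitude if
# its attributes say so. This is NOT cf-xarray's real matching logic.
LONGITUDE_UNITS = {"degrees_east", "degree_east", "degrees_E", "degree_E"}


def matching_names(coords, standard_name, units):
    """Return every coordinate name whose attrs match the given CF identity."""
    return [
        name
        for name, attrs in coords.items()
        if attrs.get("standard_name") == standard_name
        or attrs.get("units") in units
    ]


def is_longitude(name, coords):
    """True if `name` is among the longitude matches, even if there are several."""
    return name in matching_names(coords, "longitude", LONGITUDE_UNITS)


# Hypothetical attributes, loosely modelled on the affected files (assumption).
coords = {
    "lon": {"standard_name": "longitude", "units": "degrees_east", "axis": "X"},
    "longitude": {"standard_name": "longitude", "units": "degrees_east"},
    "lat": {"standard_name": "latitude", "units": "degrees_north", "axis": "Y"},
}

print(matching_names(coords, "longitude", LONGITUDE_UNITS))
print(is_longitude("lon", coords), is_longitude("lat", coords))
```

Checking membership in the list of matches means two candidates no longer raise, but it still leaves open which of the two should be reported.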

It also raises the question of which values we would want to be represented in the inventory as longitude and latitude min/max values. I expect it would be longitude from 0 to 360.
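If we did want a single 0 to 360 range in the inventory, one option would be to wrap values before taking the min/max. This is only a sketch: `wrap_to_360` and `lon_range` are hypothetical helpers, and the sample values are a handful picked for illustration rather than a full coordinate array.

```python
def wrap_to_360(lon):
    """Map a longitude in degrees east onto the half-open range [0, 360)."""
    return lon % 360.0


def lon_range(values):
    """Format the min/max of the wrapped longitudes like the inventory entries."""
    wrapped = [wrap_to_360(v) for v in values]
    return f"{min(wrapped):.2f} {max(wrapped):.2f}"


# Illustrative sample mixing positive and negative longitudes.
sample = [0.5, 79.5, -279.5, -222.5]

print([wrap_to_360(v) for v in sample])
print(lon_range(sample))
```

With this convention -279.5 becomes 80.5, so values written on a -280 to 80 range line up with a 0 to 360 axis.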

For now I could try commenting those lines out to scan them - but the datasets would still raise issues elsewhere when being processed

@agstephens

@ellesmith88: this does look strange. I did an ncdump to check:

ncdump -v lon,longitude  /badc/cmip6/data/CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/SImon/siconc/gn/v20200219/siconc_SImon_BCC-CSM2-MR_ssp126_r1i1p1f1_gn_201501-210012.nc 

...
data:

 lon = 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5,
    13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5,
    25.5, 26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5,
    37.5, 38.5, 39.5, 40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5,
    49.5, 50.5, 51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5,
    61.5, 62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5,
    73.5, 74.5, 75.5, 76.5, 77.5, 78.5, 79.5, 80.5, 81.5, 82.5, 83.5, 84.5,
    85.5, 86.5, 87.5, 88.5, 89.5, 90.5, 91.5, 92.5, 93.5, 94.5, 95.5, 96.5,
    97.5, 98.5, 99.5, 100.5, 101.5, 102.5, 103.5, 104.5, 105.5, 106.5, 107.5,
    108.5, 109.5, 110.5, 111.5, 112.5, 113.5, 114.5, 115.5, 116.5, 117.5,
    118.5, 119.5, 120.5, 121.5, 122.5, 123.5, 124.5, 125.5, 126.5, 127.5,
    128.5, 129.5, 130.5, 131.5, 132.5, 133.5, 134.5, 135.5, 136.5, 137.5,
    138.5, 139.5, 140.5, 141.5, 142.5, 143.5, 144.5, 145.5, 146.5, 147.5,
    148.5, 149.5, 150.5, 151.5, 152.5, 153.5, 154.5, 155.5, 156.5, 157.5,
    158.5, 159.5, 160.5, 161.5, 162.5, 163.5, 164.5, 165.5, 166.5, 167.5,
    168.5, 169.5, 170.5, 171.5, 172.5, 173.5, 174.5, 175.5, 176.5, 177.5,
    178.5, 179.5, 180.5, 181.5, 182.5, 183.5, 184.5, 185.5, 186.5, 187.5,
    188.5, 189.5, 190.5, 191.5, 192.5, 193.5, 194.5, 195.5, 196.5, 197.5,
    198.5, 199.5, 200.5, 201.5, 202.5, 203.5, 204.5, 205.5, 206.5, 207.5,
    208.5, 209.5, 210.5, 211.5, 212.5, 213.5, 214.5, 215.5, 216.5, 217.5,
    218.5, 219.5, 220.5, 221.5, 222.5, 223.5, 224.5, 225.5, 226.5, 227.5,
    228.5, 229.5, 230.5, 231.5, 232.5, 233.5, 234.5, 235.5, 236.5, 237.5,
    238.5, 239.5, 240.5, 241.5, 242.5, 243.5, 244.5, 245.5, 246.5, 247.5,
    248.5, 249.5, 250.5, 251.5, 252.5, 253.5, 254.5, 255.5, 256.5, 257.5,
    258.5, 259.5, 260.5, 261.5, 262.5, 263.5, 264.5, 265.5, 266.5, 267.5,
    268.5, 269.5, 270.5, 271.5, 272.5, 273.5, 274.5, 275.5, 276.5, 277.5,
    278.5, 279.5, 280.5, 281.5, 282.5, 283.5, 284.5, 285.5, 286.5, 287.5,
    288.5, 289.5, 290.5, 291.5, 292.5, 293.5, 294.5, 295.5, 296.5, 297.5,
    298.5, 299.5, 300.5, 301.5, 302.5, 303.5, 304.5, 305.5, 306.5, 307.5,
    308.5, 309.5, 310.5, 311.5, 312.5, 313.5, 314.5, 315.5, 316.5, 317.5,
    318.5, 319.5, 320.5, 321.5, 322.5, 323.5, 324.5, 325.5, 326.5, 327.5,
    328.5, 329.5, 330.5, 331.5, 332.5, 333.5, 334.5, 335.5, 336.5, 337.5,
    338.5, 339.5, 340.5, 341.5, 342.5, 343.5, 344.5, 345.5, 346.5, 347.5,
    348.5, 349.5, 350.5, 351.5, 352.5, 353.5, 354.5, 355.5, 356.5, 357.5,
    358.5, 359.5 ;

 longitude =
  0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5,
    14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5,
    26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5,
    38.5, 39.5, 40.5, 41.5, 42.5, 43.5, 44.5, 45.5, 46.5, 47.5, 48.5, 49.5,
    50.5, 51.5, 52.5, 53.5, 54.5, 55.5, 56.5, 57.5, 58.5, 59.5, 60.5, 61.5,
    62.5, 63.5, 64.5, 65.5, 66.5, 67.5, 68.5, 69.5, 70.5, 71.5, 72.5, 73.5,
    74.5, 75.5, 76.5, 77.5, 78.5, 79.5, -279.5, -278.5, -277.5, -276.5,
    -275.5, -274.5, -273.5, -272.5, -271.5, -270.5, -269.5, -268.5, -267.5,
    -266.5, -265.5, -264.5, -263.5, -262.5, -261.5, -260.5, -259.5, -258.5,
    -257.5, -256.5, -255.5, -254.5, -253.5, -252.5, -251.5, -250.5, -249.5,
    -248.5, -247.5, -246.5, -245.5, -244.5, -243.5, -242.5, -241.5, -240.5,
    -239.5, -238.5, -237.5, -236.5, -235.5, -234.5, -233.5, -232.5, -231.5,
    -230.5, -229.5, -228.5, -227.5, -226.5, -225.5, -224.5, -223.5, -222.5,
...

It seems like the data is erroneous: I can't understand why the lon values would differ from the longitude values. I also think that this file shouldn't be using a 2D latitude/longitude description when the coordinates are actually regular latitude and longitude (i.e. repeats of the same numbers).
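Whether a 2D longitude array really is just a repeat of one row can be checked directly. The sketch below is an assumption-laden illustration using plain nested lists and made-up values; `rows_are_identical` is a hypothetical helper, not something in roocs-utils.

```python
def rows_are_identical(grid, tol=1e-6):
    """True if every row of a 2D list equals the first row within `tol`."""
    first = grid[0]
    return all(
        len(row) == len(first)
        and all(abs(a - b) <= tol for a, b in zip(row, first))
        for row in grid
    )


# Made-up values: a regular grid repeats the same longitudes on every row...
regular = [[0.5, 1.5, 2.5], [0.5, 1.5, 2.5]]
# ...whereas a curvilinear grid has longitudes that drift from row to row.
curvilinear = [[0.5, 1.5, 2.5], [0.7, 1.6, 2.4]]

print(rows_are_identical(regular))
print(rows_are_identical(curvilinear))
```

If the check fails, the 2D description is carrying real information and cannot be collapsed to the 1D coordinate.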

Does this error appear in other data or only in this dataset? The short-term solution is to put this on an EXCLUDE_LIST of datasets that we are not processing right now.

We should discuss with Ruth and Martin.

@ellesmith88

At the moment it looks like around 25 datasets are affected. I will update once I have confirmed this.

Steps to follow:

Step 1 is to proceed by excluding those datasets. We can then discuss them with Ruth and Martin to come up with an appropriate solution (i.e. fix the data, change our code, or exclude the data).
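A minimal sketch of how such an exclusion could work during the scan, assuming the list holds glob-style patterns over dataset ids. The `EXCLUDE_LIST` contents, the `is_excluded` helper and the second dataset id are hypothetical, chosen only to illustrate the idea.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns; "*" matches any run of characters.
EXCLUDE_LIST = [
    "c3s-cmip6.*.BCC.BCC-CSM2-MR.*.SImon.siconc.gn.*",
]


def is_excluded(dataset_id, patterns=EXCLUDE_LIST):
    """True if the dataset id matches any exclusion pattern (case-sensitive)."""
    return any(fnmatchcase(dataset_id, pattern) for pattern in patterns)


datasets = [
    "c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.SImon.siconc.gn.v20200219",
    # Hypothetical id, used only to show a dataset that is kept.
    "c3s-cmip6.CMIP.EXAMPLE.EXAMPLE-MODEL.historical.r1i1p1f1.Amon.tas.gn.v20200101",
]

to_scan = [d for d in datasets if not is_excluded(d)]
print(to_scan)
```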

@ellesmith88

There are 23 datasets that raise this error:

c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.Ofx.deptho.gn.v20201021
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.Omon.sos.gn.v20190319
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.SImon.siconc.gn.v20200219
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.Ofx.sftof.gn.v20201021
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.Omon.zos.gn.v20190429
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp245.r1i1p1f1.Omon.sos.gn.v20190319
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp245.r1i1p1f1.Omon.zos.gn.v20190429
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp245.r1i1p1f1.SImon.siconc.gn.v20200219
c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp370.r1i1p1f1.Omon.sos.gr.v20200130
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.r1i1p1f1.SImon.siconc.gn.v20200219
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.r1i1p1f1.Omon.sos.gn.v20190319
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp585.r1i1p1f1.Omon.zos.gn.v20190429
c3s-cmip6.CMIP.BCC.BCC-CSM2-MR.historical.r1i1p1f1.Omon.zos.gn.v20190429
c3s-cmip6.CMIP.BCC.BCC-CSM2-MR.historical.r1i1p1f1.Omon.tos.gn.v20181126
c3s-cmip6.CMIP.BCC.BCC-CSM2-MR.historical.r1i1p1f1.SImon.siconc.gn.v20200218
c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp245.r1i1p1f1.Omon.sos.gr.v20200130
c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp245.r1i1p1f1.Omon.tos.gr.v20200130
c3s-cmip6.CMIP.BCC.BCC-ESM1.historical.r1i1p1f1.Omon.tos.gn.v20181129
c3s-cmip6.CMIP.BCC.BCC-ESM1.historical.r1i1p1f1.SImon.siconc.gn.v20200218
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.SImon.siconc.gn.v20200219
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.Omon.sos.gn.v20190319
c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.Omon.zos.gn.v20190429
c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp126.r1i1p1f1.Omon.tos.gr.v20200130
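To see which models are involved, the ids can be split on "." and counted per model. This is a sketch only: the facet names in `FACETS` are my own labels for the ten dot-separated parts, and the sample is four ids taken from the list above rather than all 23.

```python
from collections import Counter

# Assumed labels for the ten dot-separated parts of a dataset id.
FACETS = (
    "project", "activity", "institution", "model", "experiment",
    "member", "table", "variable", "grid", "version",
)


def parse_id(dataset_id):
    """Split a dataset id into a facet dictionary, checking the part count."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"Expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


sample = [
    "c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp370.r1i1p1f1.Ofx.deptho.gn.v20201021",
    "c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp370.r1i1p1f1.Omon.sos.gr.v20200130",
    "c3s-cmip6.CMIP.BCC.BCC-ESM1.historical.r1i1p1f1.Omon.tos.gn.v20181129",
    "c3s-cmip6.ScenarioMIP.BCC.BCC-CSM2-MR.ssp126.r1i1p1f1.SImon.siconc.gn.v20200219",
]

counts = Counter(parse_id(d)["model"] for d in sample)
print(counts)
```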

@agstephens

agstephens commented Mar 1, 2021

Tagging @RuthPetrie and @martinjuckes

@ellesmith88: Ruth and I just found this:

$ ncdump -v lon,longitude  /badc/cmip6/data/CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/SImon/siconc/gn/v20200219/siconc_SImon_BCC-CSM2-MR_ssp126_r1i1p1f1_gn_201501-210012.nc | tail -200 | head -10
    -117.5732, -117.0002, -116.4461, -115.9096, -115.3898, -114.8857,
    -114.3963, -113.9208, -113.4584, -113.0083, -112.57, -112.1426,
    -111.7256, -111.3184, -110.9204, -110.5313, -110.1504, -109.7773,
    -109.4116, -109.0529, -108.7009, -108.355, -108.0151, -107.6807,
    -107.3516, -107.0274, -106.7079, -106.3928, -106.0819, -105.7748,
    -105.4714, -105.1715, -104.8747, -104.5809, -104.2899, -104.0015,
    -103.7155, -103.4317, -103.1499, -102.87, -102.5917, -102.3149,
    -102.0395, -101.7652, -101.492, -101.2195, -100.9478, -100.6766,
    -100.4058, -100.1352, -99.86475, -99.59418, -99.32337, -99.05219,
    -98.78046, -98.50805, -98.23478, -97.96051, -97.68507, -97.40829,

So the 2D longitude values do actually vary slowly across the grid, rather than simply repeating the 1D lon values.

@agstephens

So the problem here is probably in the file metadata: lon should not carry the attributes units=degrees_east, axis=X and standard_name=longitude.

        double lon(lon) ;
                lon:bounds = "lon_bnds" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
                lon:long_name = "Longitude" ;
                lon:standard_name = "longitude" ;

@martinjuckes

Interesting catch. The file /badc/cmip6/data/CMIP6/ScenarioMIP/BCC/BCC-CSM2-MR/ssp126/r1i1p1f1/SImon/siconc/gn/v20200219/siconc_SImon_BCC-CSM2-MR_ssp126_r1i1p1f1_gn_201501-210012.nc appears to have been written by CMOR, so it may be worth raising a CMOR issue once we figure out the scope of the problem.

@agstephens: I agree with your interpretation, apart from the interpretation of axis: I think it is correct to have this on the dimension which behaves like an x-coordinate (e.g. as in example 5.2 in Section 5.2 of the CF Convention).

@ellesmith88

The same thing exists in 81 out of 517 datasets of the BCC-CSM2-MR model in our archive. The datasets affected are listed in the attached file.
bcc_issue_exists.txt

@agstephens changed the title from "datasets with lat/lon and loatitude/longitude raise error with get_coord_type" to "datasets with lat/lon and latitude/longitude raise error with get_coord_type" on Mar 4, 2021
@agstephens

ECMWF have agreed that we can omit these datasets from the first inventory/release.

@martinjuckes

NB: this data is on the ocean grid. I think the lat and lon arrays should be grid_latitude and grid_longitude respectively.

It is interesting that this is not captured as a CF error. It is probably not explicitly mentioned as an error because it is obviously wrong. It may be worth proposing an additional CF rule.
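To make the idea concrete, a rule of this kind could be sketched as "no two coordinates of the same variable share a standard_name". The code below is only an illustration of that hypothetical rule: it is not part of the CF conventions or the cf-checker, and the attribute dictionaries are assumed rather than read from the file.

```python
from collections import defaultdict


def duplicated_standard_names(coords):
    """Map each standard_name claimed by more than one coordinate to its claimants."""
    by_standard_name = defaultdict(list)
    for name, attrs in coords.items():
        standard_name = attrs.get("standard_name")
        if standard_name is not None:
            by_standard_name[standard_name].append(name)
    return {
        standard_name: names
        for standard_name, names in by_standard_name.items()
        if len(names) > 1
    }


# Assumed attributes for the coordinates of one affected variable.
coords = {
    "lat": {"standard_name": "latitude"},
    "lon": {"standard_name": "longitude"},
    "latitude": {"standard_name": "latitude"},
    "longitude": {"standard_name": "longitude"},
    "type": {},
}

print(duplicated_standard_names(coords))
```

An empty result would mean the hypothetical rule is satisfied; here both latitude and longitude are claimed twice.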

@agstephens

@agstephens @ellesmith88 : let's raise this as an issue with CF. It might be an error in the conventions or in the cf-checker.
Is it allowed in the convention? Does it get checked for in the cf-checker?
