Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change units caravan forcing, added support for data_sources, changed version #433

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

RolfHut
Copy link
Contributor

@RolfHut RolfHut commented Jun 18, 2024

  • changed the units to be consistent with lumpedmakking
  • added support for the new 'data_source' option. Reverts to ERA5_land (Standard for caravan) if not specified.
  • changed hardcoded version in URL to global and bumped to v2

@RolfHut RolfHut requested a review from BSchilperoort June 18, 2024 17:53
@RolfHut
Copy link
Contributor Author

RolfHut commented Jun 20, 2024

I'm running into some naming problems: ERA5 variables are re-named to standards, but those variables already exist in the other datasets.

@BSchilperoort
Copy link
Member

I'm running into some naming problems: ERA5 variables are re-named to standards, but those variables already exist in the other datasets.

Would you not first apply the selection of which dataset to use, therefore avoid any variable duplicates?

@RolfHut
Copy link
Contributor Author

RolfHut commented Jun 20, 2024 via email

@Daafip
Copy link
Collaborator

Daafip commented Jul 2, 2024

Not really related to units, but would be nice to squeeze in here.
If you pass a 'bad' basin_id, e.g. '01439500' instead of 'camels_01439500' you get a long error, unrelated to your issue:

OSError: [Errno -90] NetCDF: file not found: 'https://opendap.4tu.nl/thredds/dodsC/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/1/01439500.nc' 

maybe catch this first, then actually try retrieve the dataset

@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from VascoSch92 Jul 8, 2024
@eWaterCycle eWaterCycle deleted a comment from RolfHut Jul 8, 2024
@BSchilperoort
Copy link
Member

I fixed some of the issues you might have been encountering. Seemed to work for era5_land now, not yet daymet.

I did encounter a problem with the dataset:
image

There seem to be a lot of duplicate entries for basins. The dimension size for Camels USA's basin_id is 1153, but it only has 671 unique values.

@Daafip
Copy link
Collaborator

Daafip commented Jul 9, 2024

I thought I checked that, but in that case I made some mistakes with merging. Looks like it was appended rather than merged...

The code I used is here, could you take a look if you spot an error Bart? I still have the (unmerged) files locally so if it's a quick fix I can run it again.

@Daafip
Copy link
Collaborator

Daafip commented Jul 10, 2024

Oh, I think I see the issue.
When combining the three data sources I used ds = xr.concat(file_lst,dim=('data_source')),which did work:

Dimensions:               (basin_id: 671, time: 12784, data_source: 3)
Coordinates:
  * basin_id              (basin_id) |S15 b'camels_01013500' ... b'camels_144...
  * time                  (time) datetime64[ns] 1980-01-01 ... 2014-12-31
  * data_source           (data_source) |S64 b'nldas' b'maurer' b'daymet'

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?). I should've checked this before uploading, sorry.

Dimensions:                             (time: 14975, basin_id: 1153,
                                         data_source: 4)

I probably need to use merge in this case?

Can give that a try sometime next week at the earliest.

@BSchilperoort
Copy link
Member

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?).

Yeah if the dimension size is not the same it's more challenging to merge the data. Perhaps the first step would be to make the basin count + labels the same for the new data, before merging the new variables into the original data.

@RolfHut
Copy link
Contributor Author

RolfHut commented Jul 12, 2024

When using that to combine this with the exisiting caravan dataset, it doesn't merge on the other dimensions (as they are dissimilar?).

Yeah if the dimension size is not the same it's more challenging to merge the data. Perhaps the first step would be to make the basin count + labels the same for the new data, before merging the new variables into the original data.

Hmm, I'm hesitant to change the original data: we are trying to give access to the original datasets. (not helping, just giving the broader context)

@Daafip
Copy link
Collaborator

Daafip commented Jul 15, 2024

Yeah if the dimension size is not the same it's more challenging to merge the data.

I found that the first entry gives era5 and the last the other three (as it appended). We can use it this to ductape it together (option 1).

Neater would be to update the dataset.
We can then have two files: camels_caravan.nc and camels_original.nc.
If you specify nothing or era5 you just use the caravan dataset as is (camels_caravan.nc). If you want daymet, Maurer or NLDAS, then you access the camels_original.nc file which I created from camels txt data.

@RolfHut @BSchilperoort what are your views on this? Im happy to do either whilst I still have a TUD account to update the dataset easily (not sure what happens after).

@RolfHut
Copy link
Contributor Author

RolfHut commented Aug 2, 2024

I prefer the option with the different files based on which dataset you request. This does mean a new upload indeed.

I have just fixed the naming issue: generating now works, but you do still have the problem with the double datasets. This also makes 'to_xarray()' fail. If you can give a headsup when the source data is fixed, I can see if 'to_xarray' works again and move on with the climate change analyses.

@BSchilperoort
Copy link
Member

@RolfHut I was looking into this again. Modifying the original datasets was too much work/data for me at the moment (encountered some issues with Caravan too). Writing code to work around it is a lot easier now.

482/1153 camels (USA) basins are missing the daymet data (not sure why: it's just missing 🤷‍♂️ ). Otherwise massaging the data into a useful format goes OK now, opening a PR soon.

@Daafip
Copy link
Collaborator

Daafip commented Nov 14, 2024

482/1153 camels (USA) basins are missing the daymet data

So ive managed to find some time to dig into the issue. There is 671 in the original camels USA dataset. This is used for my thesis work.

In version 1 of the opendap, I just used the camels as present in the caravan datset, which is a subset of the orignal data set at 482 basins. This difference was unkown to me at the time, the forcing of this is all era5.

I then attempted to add the original camels dataset which has 671 basin and 3 forms of forcing. 671 + 482 = 1153: I appended rather than merged the datasets ... I'll look into fixing this.

As a continuation of my thesis I aim to publish a paper on it, with some time provided by HKV and some in my own time (considering I still have a bit to do).

@RolfHut
Copy link
Contributor Author

RolfHut commented Nov 18, 2024

nice to hear you are working on it @Daafip . Please be in close communication with @BSchilperoort given the fixes he build recently to make caravan data available in eWaterCycle. We don't want you to do double work, nor to build on assumptions that are no longer valid. I think the main point changed is where the unit conversions happen.

@BSchilperoort
Copy link
Member

I think the main point changed is where the unit conversions happen.

changes I made in HBV assume input in SI (temperature in K, fluxes in kg/m2/s). Forcing.generate() should create data that is in these units (unless it's model specific forcing, e.g. WflowForcing) to help make forcing interoperable between different models.

@Daafip
Copy link
Collaborator

Daafip commented Nov 18, 2024

I ended up leaving the units as is in the original dataset, only fixing the indexes in v3 of the opendap dataset: https://opendap.4tu.nl/thredds/catalog/data2/djht/ca13056c-c347-4a27-b320-930c2a4dd207/3/catalog.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants