Digital Earth Pacific applications #140

brunosan · 2024-01-29T09:37:38Z

brunosan
Jan 29, 2024
Maintainer

@alexgleith let's brainstorm here how Clay can help DEP.

In a nutshell, we want to learn to predict labels from a dataset of ~1k examples that you give us, for features that can be expected to be visible in Sentinel-2 or Sentinel-1 data.

alexgleith · 2024-01-30T01:01:25Z

alexgleith
Jan 30, 2024

Hey @brunosan

Thanks for the chat yesterday!

This is a nice dataset that would be a good start: https://github.com/nick-murray/coastTrain

There are around 8,000 points across Fiji as a starting point. We have S-1 and S-2 annual mosaics that we can run predictions on; see the S-1 Mosaic and S-2 GeoMAD here: https://stac-browser.staging.digitalearthpacific.org/

We have some unpolished notebooks doing random forest classification now, but it would be great to see if we can use your model to compare with a simple random forest model, for example.

5 replies

weiji14 Feb 12, 2024
Maintainer

Hi Alex, nice seeing you here! That STAC collection looks neat, do you know how the mosaics were processed (or can point to some code/documentation on the processing pipeline)? Specifically:

Sentinel-1 - Is this processed from the Planetary Computer's RTC product, or from GRD? I'd be interested to know how the annual VV/VH mosaic works in terms of handling speckle noise in SAR images.
Sentinel-2 - Is this processed from the L2A or L1C product?

Cc @yellowcap who had some ideas about running the Clay model on mosaic imagery. Using this STAC collection might be a lot easier than pulling from other sources.

Also, just curious about what your random forest model is trying to predict? The team is developing some notebooks for downstream tasks using the Clay model right now (e.g. #149 and #154), and it would be good to see if we can collaborate on a tangible real world use-case!

alexgleith Feb 12, 2024

Hey @weiji14!

The S-1 mosaic is quite simple, and code is here: https://github.com/digitalearthpacific/dep-s1-mosaic/. I'm going to change it a bit, adapting to use the Geomedian rather than regular median. It's processed from the MSPC RTC, yes. I started a Twitter thread, here, to ask questions, and Ian Woodhouse suggests that speckle isn't noise, it's signal, and that a mean is better than a median. Anyhow, it's a WIP, but the data can be accessed here: https://stac.staging.digitalearthpacific.org/collections/dep_s1_mosaic
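For readers following the mean-versus-median point, here is a minimal simulation sketch. It assumes fully developed single-look speckle, modelled as the true backscatter multiplied by an exponential random variable with mean 1. The RTC data is multi-looked, so the real effect should be smaller. The backscatter value and function names are made up for illustration and are not taken from the dep-s1-mosaic code.

```python
import math
import random
import statistics


def simulate_intensity_series(true_backscatter, n_timesteps, rng):
    """Simulate one pixel through time: true value times unit-mean exponential speckle."""
    return [true_backscatter * rng.expovariate(1.0) for _ in range(n_timesteps)]


def temporal_composites(series):
    """Return the (mean, median) composite of a single-pixel time series."""
    return statistics.fmean(series), statistics.median(series)


rng = random.Random(42)
true_backscatter = 0.2  # hypothetical linear-power value, not taken from DEP data

# Around 40 timesteps, as mentioned for Fiji: both estimates are still noisy
print(temporal_composites(simulate_intensity_series(true_backscatter, 40, rng)))

# Many timesteps: the mean approaches 0.2, the median approaches 0.2 * ln(2)
mean_c, median_c = temporal_composites(
    simulate_intensity_series(true_backscatter, 100_000, rng)
)
print(mean_c, median_c, true_backscatter * math.log(2))
```

Under this speckle model the temporal mean is an unbiased estimate of the underlying backscatter, while the median settles on a lower value, which is one way to read the suggestion that a mean is the better choice.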

On the S-2 mosaic, the process is more complex, but also better established. It's based on the MSPC Sentinel-2 data, but uses the Digital Earth Australia Geometric Median and Absolute Deviations (GeoMAD) algorithm, so it's an n-dimensional median with measures of variance. See here for deeper docs: https://docs.digitalearthafrica.org/en/latest/data_specs/GeoMAD_specs.html. Code is here and preliminary data here: https://stac.staging.digitalearthpacific.org/collections/dep_s2_geomad.
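To make "n-dimensional median" concrete, here is a minimal sketch of a geometric median for a single pixel, computed with Weiszfeld's iteration. This is an illustration only, not the Digital Earth Australia implementation; the band values are invented, and the deviation shown is just the median Euclidean distance to the geometric median, one simple measure of spread.

```python
import math
import statistics


def geometric_median(points, n_iter=200, tol=1e-9):
    """Weiszfeld iteration: the point minimising the summed Euclidean distance."""
    n_dims = len(points[0])
    # Start from the per-band mean
    estimate = [sum(p[d] for p in points) / len(points) for d in range(n_dims)]
    for _ in range(n_iter):
        # Inverse-distance weights, clamped to avoid division by zero
        weights = [1.0 / max(math.dist(p, estimate), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(n_dims)
        ]
        converged = math.dist(updated, estimate) < tol
        estimate = updated
        if converged:
            break
    return estimate


def euclidean_mad(points, centre):
    """Median of the Euclidean distances from each observation to the centre."""
    return statistics.median(math.dist(p, centre) for p in points)


# One pixel, five dates, four bands (hypothetical reflectances); the last is cloudy
observations = [
    (0.05, 0.08, 0.06, 0.30),
    (0.06, 0.09, 0.07, 0.32),
    (0.05, 0.07, 0.06, 0.29),
    (0.04, 0.08, 0.05, 0.31),
    (0.80, 0.82, 0.81, 0.85),
]
centre = geometric_median(observations)
print(centre, euclidean_mad(observations, centre))
```

Unlike a per-band median, the result is computed across all bands at once, so the composite pixel keeps a physically plausible spectrum and the cloudy outlier has little influence.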

The random forest we're running right now is looking to predict areas of gravel extraction, but later this year we want to predict locations of mangroves, seagrass and other blue carbon-relevant regions.

My colleague is cleaning up the gravel extraction codebase now, and maybe we can have a look at that use-case with the Clay model. It would be great to sit down with someone for an hour to understand how to train and predict with it. I have the data wrangling all sorted, and we're using the scikit-learn random forest classifier now, which is super easy.

weiji14 Feb 12, 2024
Maintainer

Cool, thanks for all the links and explanations! Glad to see that you've engaged with a SAR expert too, but I'm also wondering why someone would need an annual mean if the SAR VV/VH signal isn't affected by clouds anyway 🤔

Since we're the closest timezone-wise, I'm happy to have a chat for an hour this week (let me email/Slack you separately to set up a time). If you could send an AOI (e.g. bounding box coordinates, or a GeoJSON file) and time range, I can start taking a look myself and set things up beforehand.

alexgleith Feb 12, 2024

> but I'm also wondering why someone would need an annual mean if the SAR VV/VH signal isn't affected by clouds anyway

It's because the computation was taking too much memory on the fly, so we're storing it. There were around 40 timesteps over Fiji, so doing that on the fly for training and predicting was taking too long.

alexgleith Feb 12, 2024

The bbox for Viti Levu (the Fiji main island) is [177.2, -18.4, 178.9, -17.2].

This training data is a good starting point https://github.com/digitalearthpacific/mineral-resource-detection/blob/main/training_data/draft_inputs/MRD_dissagregated_25.geojson
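As a sketch of how the bounding box above could be turned into an AOI, the snippet below builds a GeoJSON polygon and a STAC API item-search request body. It makes no network call. The `/search` path is the standard STAC API endpoint appended to the API root that appears in the thread, the 2023 date range is an assumption (no time range is given here), and the helper names are hypothetical.

```python
import json

# Assumption: API root from the thread plus the standard STAC item-search path
STAC_SEARCH_URL = "https://stac.staging.digitalearthpacific.org/search"

VITI_LEVU_BBOX = [177.2, -18.4, 178.9, -17.2]  # [west, south, east, north]


def bbox_to_geojson(bbox):
    """Convert [west, south, east, north] into a closed GeoJSON Polygon."""
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


def build_search_body(collection, bbox, datetime_range, limit=10):
    """Build the JSON body for a POST request to a STAC API /search endpoint."""
    west, south, east, north = bbox
    if not (west < east and south < north):
        raise ValueError("bbox must be ordered [west, south, east, north]")
    return {
        "collections": [collection],
        "bbox": list(bbox),
        "datetime": datetime_range,
        "limit": limit,
    }


# Assumed time range, for illustration only
body = build_search_body(
    "dep_s2_geomad", VITI_LEVU_BBOX, "2023-01-01T00:00:00Z/2023-12-31T23:59:59Z"
)
aoi = bbox_to_geojson(VITI_LEVU_BBOX)
print(STAC_SEARCH_URL)
print(json.dumps(body, indent=2))
print(json.dumps(aoi))
```

The same body should work for the `dep_s1_mosaic` collection by swapping the collection ID.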

alexgleith · 2024-02-13T23:09:43Z

alexgleith
Feb 13, 2024

Great to chat today, @weiji14

From the Digital Earth Pacific side, we have Sentinel-2 and Sentinel-1 mosaics over all of Fiji, which we are intending on using to do some land-use/land-cover type ML. We're currently doing old-school classification, and we are open to using new methods based on the Clay foundation model.

We agreed that we (DEP) will wait until there are some more well established examples, and you said you might be able to help us there. We agreed to work together in the future on use-cases, once there is a process that our team can use as an example and start building on.

3 replies

weiji14 Feb 13, 2024
Maintainer

Thanks @alexgleith and @nicholasmetherall for the call just now. I'm sure we'll have lots to collaborate on in the coming months on DEP 😄

> From the Digital Earth Pacific side, we have Sentinel-2 and Sentinel-1 mosaics over all of Fiji, which we are intending on using to do some land-use/land-cover type ML. We're currently doing old-school classification, and we are open to using new methods based on the Clay foundation model.

Sounds good. I think the cool part would be to use the Clay model to do classification/segmentation tasks on the STAC-hosted mosaics, for downstream tasks like the mineral resource detection, or potentially a South Pacific-specific Land Use Land Cover map as you mentioned. @yellowcap has been keen to create embeddings on composites/mosaics (see #128), and having a working STAC API would definitely simplify things!

> We agreed that we (DEP) will wait until there are some more well established examples, and you said you might be able to help us there. We agreed to work together in the future on use-cases, once there is a process that our team can use as an example and start building on.

Yep, we (DevSeed) can handle the ML-engineering part to get an initial finetuning workflow set-up, and that would involve working with y'all at DEP to get the GeoMAD/mosaic data in a good state. I'll poke around some more and probably ask lots of questions along the way!

alexgleith Feb 14, 2024

You're awesome, @weiji14! Love your work :-)

weiji14 Mar 4, 2024
Maintainer

Sorry for the late reply (was travelling a bit last month, and just getting back into the loop), I've started a notebook at #171 that currently just does the STAC API query from the DEP STAC, and have chatted with @lillythomas and @srmsoumya about what downstream use-cases we can target (mineral extraction, LULC, or something else). More on this soon hopefully!

Edit: @lillythomas is starting work on the mineral exploration use-case at #172

nicholasmetherall · 2024-03-08T08:57:55Z

nicholasmetherall
Mar 8, 2024

Thanks Wei Ji. We are currently looking into different species of forest (including mangroves, pine, primary and secondary forest) in Fiji and Tonga and it would be great to understand how the Clay Model + segmentation capabilities could be applied to this too. It looks like the current work in the repo you shared will already be able to shed some preliminary light onto this. Best wishes.


0 replies

lillythomas · 2024-04-04T20:54:05Z

lillythomas
Apr 4, 2024
Maintainer

Chiming in here! Nice to meet you @alexgleith. A bit of a delayed update, but we made some progress on this front. A couple of weeks ago we pushed a notebook that explores the use of embeddings derived from the Sentinel-2 composite to see if we could find new "quarry" sites. Essentially, we map filtered ground truth points to patch-level embeddings to produce reference embeddings that can be used in a similarity search query. It's a bit hard to validate the results without a better understanding of how these quarries actually appear, spatially and spectrally. Put another way, right now the embeddings are derived from 10 meter RGB-NIR inputs. Is that sufficient to capture a signal on quarries in this region? Would other channels be useful? Maybe Sentinel-1 as well?

The notebook lives here: https://github.com/Clay-foundation/model/blob/main/docs/tutorial_digital_earth_pacific_patch_level.ipynb
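The similarity search described above can be sketched in a few lines. This is a simplified illustration and not the notebook's code: it assumes the reference is the mean of the ground-truth patch embeddings and that similarity is cosine similarity, and the tiny 3-dimensional vectors and patch IDs are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_embedding(embeddings):
    """Element-wise mean of a list of embeddings, used as the reference vector."""
    n = len(embeddings)
    return [sum(values) / n for values in zip(*embeddings)]


def top_k_similar(reference, patches, k):
    """Return the k (patch_id, score) pairs most similar to the reference."""
    scored = [(pid, cosine_similarity(reference, emb)) for pid, emb in patches.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional embeddings; real patch embeddings are much longer
quarry_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
patches = {
    "patch_a": [0.85, 0.15, 0.05],
    "patch_b": [0.0, 1.0, 0.0],
    "patch_c": [0.1, 0.1, 0.9],
}

reference = mean_embedding(quarry_embeddings)
results = top_k_similar(reference, patches, k=2)
print(results)
```

For a few thousand patches an in-memory search like this may be enough; a vector database only becomes necessary at larger scales.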

I'd love to discuss this further. Let me know if that is of interest!

0 replies

alexgleith · 2024-04-04T21:50:38Z

alexgleith
Apr 4, 2024

Hi @lillythomas, thanks for sharing!

I've had a read through of the notebook and my quick take is it's pretty complicated. There's a lot of writing to disk and storing values in databases, which is surprising to me I guess.

I don't know if RGB+NIR is enough to identify the sites. In our work using random forest, I think elevation was ranked as important.

I'll see if I can run the notebook today and extract some point locations and compare with our results and come back to you.

0 replies

alexgleith · 2024-04-04T22:22:36Z

alexgleith
Apr 4, 2024

Ok, after checking, these bands were important in our current process (descending order of importance): Elevation, B12, B11, B02, B08, B04 and then some of the indices like BSI and EVI are good, but I don't think you need to calculate them in your work.
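For reference, here is a sketch of the two indices mentioned, using their commonly cited formulations. The thread does not say which exact variants or scaling the DEP workflow uses, so treat the formulas, the function names and the sample values as assumptions. Inputs are surface reflectance scaled to 0-1, with B02 = blue, B04 = red, B08 = NIR and B11 = SWIR1.

```python
def bare_soil_index(blue, red, nir, swir1):
    """BSI = ((SWIR1 + Red) - (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue))."""
    numerator = (swir1 + red) - (nir + blue)
    denominator = (swir1 + red) + (nir + blue)
    return numerator / denominator if denominator else float("nan")


def enhanced_vegetation_index(blue, red, nir):
    """EVI = 2.5 * (NIR - Red) / (NIR + 6 * Red - 7.5 * Blue + 1)."""
    denominator = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denominator if denominator else float("nan")


# Hypothetical reflectances for a bare, quarry-like pixel and a vegetated pixel
bare = {"B02": 0.10, "B04": 0.20, "B08": 0.25, "B11": 0.35}
vegetated = {"B02": 0.03, "B04": 0.04, "B08": 0.40, "B11": 0.15}

for name, px in (("bare", bare), ("vegetated", vegetated)):
    bsi = bare_soil_index(px["B02"], px["B04"], px["B08"], px["B11"])
    evi = enhanced_vegetation_index(px["B02"], px["B04"], px["B08"])
    print(name, round(bsi, 3), round(evi, 3))
```

With these made-up values the bare pixel gets a positive BSI and a low EVI, and the vegetated pixel the reverse, which is the contrast the indices are meant to capture.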

The Sentinel-1 mosaic bands (mean_vv, mean_vh, mean_vv_over_vh) were not at all important!

0 replies

alexgleith · 2024-04-04T22:58:33Z

alexgleith
Apr 4, 2024

I plotted the final image at higher-resolution. So from what I understand, these 10 regions are the "most similar" to the training points.

The below is those regions visualised over our model output using random forest (this is available as the collection dep_s2s1_mrd on our STAC API now). We're particularly interested in the areas in rivers, like the lower middle. The blue squares from the Clay model don't appear to be highlighting any of the red pixels (quarry class) from our model.

Again, looking at the matched squares with my human eyes over the GeoMAD mosaic, I'm not sure what is making these ones stand out.

1 reply

brunosan Apr 5, 2024
Maintainer Author

FWIW, we are tracking this issue in #186. I believe this is a polysemantic issue. That is, the embedding contains the semantics of everything within the patch, so a similarity search finds matches across all of those semantics. The intended semantic, quarry, might be a small subset of dimension combinations within the embedding, so we must filter out unwanted semantics before doing the similarity search. (I'm calling this "polysemantic pruning".)
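One possible reading of that idea, sketched under heavy assumptions: rank embedding dimensions by how strongly the known quarry patches differ from a background sample, keep only the top few, and run the similarity search on the pruned vectors. The scoring rule, the function names and the toy 4-dimensional data are all hypothetical; nothing here is taken from #186 or from the Clay codebase.

```python
import statistics


def select_dimensions(positives, background, keep):
    """Return the indices of the `keep` dimensions that best separate the two sets."""
    n_dims = len(positives[0])
    scores = []
    for d in range(n_dims):
        pos_mean = statistics.fmean([e[d] for e in positives])
        bg_values = [e[d] for e in background]
        bg_mean = statistics.fmean(bg_values)
        bg_std = statistics.pstdev(bg_values)
        # Standardised mean difference; the small constant avoids division by zero
        scores.append((abs(pos_mean - bg_mean) / (bg_std + 1e-9), d))
    scores.sort(reverse=True)
    return [d for _, d in scores[:keep]]


def prune(embedding, dims):
    """Keep only the selected dimensions of an embedding."""
    return [embedding[d] for d in dims]


# Toy data: only dimension 0 consistently separates "quarry" from background
quarry = [[1.0, 0.5, 0.2, 0.9], [0.9, 0.1, 0.8, 0.1]]
background = [[0.0, 0.5, 0.3, 0.8], [0.1, 0.2, 0.7, 0.2], [0.0, 0.9, 0.5, 0.5]]

dims = select_dimensions(quarry, background, keep=2)
print(dims, [prune(e, dims) for e in quarry])
```

Whether the quarry signal is really concentrated in a few dimensions, rather than spread across combinations of them, is an open question that a sketch like this cannot answer.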
