I think that would be my recommendation: a way to quickly crawl a static STAC catalog, following links up to some depth or some limit of items, and collecting the results in an ItemCollection. Then stackstac doesn't need to change at all.

Edit: That said, you're still looking at one HTTP request per (sub-)catalog plus one HTTP request per item, which will add up even if you're making a bunch of requests concurrently. Would it be reasonable to do that "server-side": collect the items into a static FeatureCollection, load that with pystac, and pass that to stackstac?
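To make the crawl idea concrete, here is a minimal sketch using only the standard library. It is not an existing pystac or pystac-client function: `crawl_static_catalog`, `fetch_json`, and the `max_items` / `max_depth` parameters are hypothetical names chosen for illustration. It walks `child` and `item` links breadth-first and returns a plain GeoJSON FeatureCollection dict.

```python
import json
from collections import deque
from typing import Callable
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_json(href: str) -> dict:
    """Default fetcher (hypothetical helper): one HTTP GET per catalog or item."""
    with urlopen(href) as resp:
        return json.load(resp)


def crawl_static_catalog(
    root_href: str,
    max_items: int = 1000,
    max_depth: int = 5,
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Follow 'child' and 'item' links breadth-first and collect the items.

    max_depth is the number of link hops followed from the root;
    max_items stops the crawl once enough items have been collected.
    """
    items = []
    seen = {root_href}
    queue = deque([(root_href, 0)])
    while queue and len(items) < max_items:
        href, depth = queue.popleft()
        doc = fetch(href)
        if doc.get("type") == "Feature":
            # STAC Items are GeoJSON Features: collect, do not descend.
            items.append(doc)
            continue
        if depth >= max_depth:
            # Catalog or Collection at the depth limit: do not expand it.
            continue
        for link in doc.get("links", []):
            if link.get("rel") in ("child", "item"):
                # Static catalogs often use relative hrefs; resolve them.
                target = urljoin(href, link["href"])
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return {"type": "FeatureCollection", "features": items}
```

The resulting dict could be dumped to a single JSON file and, as far as I know, loaded with `pystac.ItemCollection.from_file` before being handed to stackstac. One caveat: this sketch does not rewrite relative asset hrefs inside the items, so those would need to be resolved against each item's original URL first.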
-
I've thought about this idea a bit and agree: it belongs in either PySTAC or pystac-client. I'd like some way to go both ways really easily: either crawl a catalog and convert it to an ItemCollection, or read in an ItemCollection and explode that into a catalog. Certainly an API is ideal, but as @scottyhq points out, this can be useful for data where there is a smaller number of STAC Items and not a big need to search (such as modeled climate data stored in Zarr). There is certainly a limitation here in that it could be slow, since 1 item = 1 request, but async requests are on the roadmap for pystac-client, which would help with that. A provider could also prepackage multiple Items in a single FeatureCollection if it were a set of Items that would be used together, but this wouldn't adhere to the static STAC spec. We actually talked about this at one point but there were too many problems.
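As a rough illustration of how concurrency softens the "1 item = 1 request" cost, here is a sketch that fetches a known list of item URLs in parallel. Note that it uses a thread pool from the standard library as a stand-in, not the async support planned for pystac-client; `fetch_items_concurrently`, `fetch_json`, and `max_workers` are hypothetical names for illustration only.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_json(href: str) -> dict:
    """Default fetcher (hypothetical helper): one HTTP GET per item."""
    with urlopen(href) as resp:
        return json.load(resp)


def fetch_items_concurrently(
    item_hrefs: Iterable[str],
    fetch: Callable[[str], dict] = fetch_json,
    max_workers: int = 16,
) -> dict:
    """Fetch each item href in a worker thread and bundle the results.

    Still one request per item, but the requests overlap instead of
    running back to back. pool.map keeps the input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        features = list(pool.map(fetch, item_hrefs))
    return {"type": "FeatureCollection", "features": features}
```

This only helps once the item URLs are known, so the sub-catalogs still have to be read first to discover them.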
-
As illustrated in the examples, stackstac works amazingly well for loading up the results of a pystac_client search in milliseconds! This is partly because the search results are stored as a single JSON FeatureCollection so it's just one blob of metadata to read. However, with published static STAC catalogs all the items must be discovered by crawling links, which ends up being slow.
Static STACs are nice because they are easy to create and can be consumed by tools like STAC Browser, whereas STAC APIs seem quite challenging to set up. So my question is: what's the best way for stackstac to open static catalogs? Perhaps a new tool or function in pystac (pystac.stac_io.AsyncStacIO()?) or pystac_client that converts a static catalog to a single FeatureCollection very efficiently? Does such a thing already exist?

Curious for ideas from @TomAugspurger @sharkinsspatial @matthewhanson @duckontheweb. Note there are Zarr parallels here: a static catalog has distributed metadata, whereas the API results have "consolidated metadata", as discussed here #50 (comment).
Below is an example of current static catalog performance using https://github.com/relativeorbit/aws-rtc-12SYJ