xr.doctor(): diagnostics on a Dataset / DataArray ? #6308

Open
benbovy opened this issue Feb 26, 2022 · 4 comments

Comments

@benbovy
Member

benbovy commented Feb 26, 2022

Is your feature request related to a problem?

Recently I've been reading through various issue reports (GH issues and discussions, forums, etc.) and I'm wondering whether it would be useful to have a function in Xarray that inspects a Dataset or DataArray and reports a set of diagnostics, so that the community could better help troubleshoot performance or other issues faced by users.

It's not always obvious where to look (e.g., number of chunks of a dask array, number of tasks of a dask graph, etc.) to diagnose issues, sometimes even for experienced users.

Describe the solution you'd like

An xr.doctor(dataset_or_dataarray) top-level function (or Dataset.doctor() / DataArray.doctor() methods) that would perform a battery of checks and return helpful diagnostics, e.g.,

  • "Data variable "x" wraps a dask array that contains a lot of tasks, which may affect performance"
  • "Data variable "x" wraps a dask array that contains many small chunks"
  • ... possibly many other diagnostics?
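To make the two example diagnostics above concrete, here is a minimal sketch of what such a check could look like. It is only an illustration: the `doctor` name is taken from the proposal, but the function body, the `max_tasks` and `min_chunk_bytes` parameters and their default values are assumptions, not an existing Xarray API. The sketch is duck-typed, so it imports neither xarray nor dask; it only relies on `Dataset.data_vars`, the `chunks` and `data` attributes of each variable, and the `__dask_graph__()`, `nbytes` and `npartitions` members of a dask array.

```python
def doctor(ds, max_tasks=100_000, min_chunk_bytes=1_000_000):
    """Return plain-English warnings about the dask-backed variables of `ds`.

    Hypothetical helper: the thresholds are arbitrary illustrative defaults.
    """
    messages = []
    for name, var in ds.data_vars.items():
        # `chunks` is None when the variable is not backed by a chunked array,
        # so those variables are skipped without touching their data.
        if var.chunks is None:
            continue
        data = var.data  # the wrapped array; for dask nothing is computed here
        if not hasattr(data, "__dask_graph__"):
            continue  # chunked, but not a dask collection

        # Number of keys (tasks) in the task graph behind this variable.
        n_tasks = len(data.__dask_graph__())
        if n_tasks > max_tasks:
            messages.append(
                f'Data variable "{name}" wraps a dask array that contains '
                f"a lot of tasks ({n_tasks}), which may affect performance"
            )

        # Average chunk size in bytes, guarding against a zero chunk count.
        mean_chunk_bytes = data.nbytes / max(data.npartitions, 1)
        if mean_chunk_bytes < min_chunk_bytes:
            messages.append(
                f'Data variable "{name}" wraps a dask array that contains '
                f"many small chunks ({data.npartitions} chunks, "
                f"about {mean_chunk_bytes:.0f} bytes each)"
            )
    return messages
```

Calling `doctor(ds)` on a Dataset whose variables were split into very small pieces with `ds.chunk(...)` would return one message per problem found, and an empty list for a Dataset that only holds in-memory arrays.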

Describe alternatives you've considered

None

Additional context

No response

@max-sixty
Collaborator

Very much agree with the goal!

I wonder whether there's a broader approach with something like xr.describe — i.e. give lots of useful info about the metadata of the array, including any warnings. It's not that performance sensitive, so it would be fine to throw lots of things in there.

Either way, I'm a +1

@rabernat
Contributor

rabernat commented Nov 2, 2022

Just found this issue! I agree that this would be helpful. But isn't it fundamentally a Dask issue? Vanilla Xarray + Numpy has none of these problems because everything is in memory.

@echarles

echarles commented Nov 3, 2022

> Vanilla Xarray + Numpy has none of these problems because everything is in memory.

This is my understanding of xarray. Or is there a way for an xarray variable to point to a dask structure?

> But isn't it fundamentally a Dask issue?

Dask already has some performance_report capabilities, documented at https://docs.dask.org/en/stable/diagnostics-distributed.html#capture-diagnostics. Is anything missing there?

@benbovy
Member Author

benbovy commented Nov 7, 2022

The kind of data wrapped in an Xarray Dataset (e.g., a Numpy array, a Dask array or any other array #5648) is already something useful that xr.doctor or xr.describe could report!

In my experience of introducing Xarray to new users, they are often completely unaware of what is under the hood until something or someone makes them aware, likely after they experience some weird behavior or performance issue that is hard to figure out by themselves. Xarray objects are flexible container wrappers connected to a wide range of other Python libraries, so it is hard to give a short introduction that covers all the important aspects (lazy / non-lazy, chunked / non-chunked, etc.). For example, someone who has never heard of Dask or Zarr may follow an Xarray tutorial that starts by opening a chunked dataset from a zarr store. In this case the rich repr of the Xarray Dataset doesn't even help.
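As a rough illustration of reporting what each variable wraps, the sketch below lists, for every variable, whether it is chunked and which library provides the wrapped array. The `describe_variables` name and the wording of the messages are made up for this example. It is duck-typed and relies only on `Dataset.variables` and on the `chunks`, `dims`, `dtype` and `data` attributes of each variable. It deliberately checks `chunks` before touching `data`, on the assumption that reading `data` from a non-chunked, file-backed variable may load it into memory.

```python
import math


def describe_variables(ds):
    """Return one plain-English line per variable of `ds`.

    Hypothetical helper, not part of Xarray.
    """
    lines = []
    for name, var in ds.variables.items():
        if var.chunks is None:
            # Not chunked: either already in memory or read lazily from a
            # file backend. `var.data` is not accessed, to avoid loading it.
            backing = "not chunked (in memory, or read lazily from a file)"
        else:
            # `chunks` holds one tuple of block lengths per dimension.
            n_chunks = math.prod(len(per_dim) for per_dim in var.chunks)
            # Top-level package of the wrapped array type, e.g. "dask".
            library = type(var.data).__module__.split(".")[0]
            backing = f"chunked into {n_chunks} block(s) by {library}"
        lines.append(f'"{name}" dims={var.dims} dtype={var.dtype}: {backing}')
    return lines
```

Printing the returned lines for a dataset opened from a chunked zarr store would make it visible that the variables are split into blocks and evaluated lazily, which the tutorial scenario above leaves implicit.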

Rather than a performance report or a profiling tool, the proposal here (still very vague) is to provide a helper function that returns some information and explanation in plain English (why not with some hyperlinks, pretty printing, etc.) that would help users make sense of an Xarray object and its wrapped data/metadata. Some kind of interactive documentation very specific to the actual Xarray object. Some kind of smart tool that would partially "replace" custom (though very basic) user support.

4 participants