xr.doctor(): diagnostics on a Dataset / DataArray ? #6308
Comments
Very much agree with the goal! I wonder whether there's a broader approach with something like Either way, I'm a +1
Just found this issue! I agree that this would be helpful. But isn't it fundamentally a Dask issue? Vanilla Xarray + NumPy has none of these problems because everything is in memory.
This is my understanding of xarray. Or is there a way that an xarray variable points to a dask structure?
Dask already has some
The kind of data wrapped in an Xarray Dataset (e.g., a NumPy array, a Dask array or any other array #5648) is already something useful that
From my experience of introducing Xarray to new users, they often completely ignore what is under the hood until something or someone makes them aware, likely after they experience some weird behavior or performance issue that is hard to figure out by themselves. Xarray objects are flexible container wrappers connected to a wide range of other Python libraries, such that it is hard to give a short introduction that covers all the important aspects (lazy / non-lazy, chunked / non-chunked, etc.). For example, someone who has never heard of Dask or Zarr may follow an Xarray tutorial that starts by opening a chunked dataset from a Zarr store. In this case the rich repr of the Xarray Dataset doesn't even help.
Rather than a performance report or a profiling tool, the proposal here (still very elusive) is to provide a helper function that returns some information and explanation in plain English (why not with some hyperlinks, pretty printing, etc.) that would help users make sense of an Xarray object and its wrapped data/metadata. Some kind of interactive documentation very specific to the actual Xarray object. Some kind of smart tool that would partially "replace" custom (though very basic) user support.
Is your feature request related to a problem?
Recently I've been reading through various issue reports here and there (GH issues and discussions, forums, etc.) and I'm wondering whether it would be useful to have a function in Xarray that inspects a Dataset or DataArray and reports a set of diagnostics, so that the community could better help troubleshoot performance or other issues faced by users.
It's not always obvious where to look (e.g., number of chunks of a dask array, number of tasks of a dask graph, etc.) to diagnose issues, sometimes even for experienced users.
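The two quantities mentioned above (number of chunks and number of tasks) can already be read by hand from existing attributes. Below is a minimal sketch of what such a check might look like. The function name `doctor_sketch` and the wording of its messages are hypothetical and not part of Xarray. It assumes only that the object exposes `data_vars` (for a Dataset) or `name`, `chunks` and `data` (for a DataArray), and that a wrapped Dask array implements the Dask collection method `__dask_graph__()`.

```python
from math import prod


def doctor_sketch(obj):
    """Return plain-English notes about a Dataset- or DataArray-like object.

    Hypothetical helper, not an Xarray API. It relies only on the
    attributes `data_vars`, `name`, `chunks` and `data`.
    """
    # A Dataset exposes a mapping of data variables;
    # anything else is treated as a single variable.
    if hasattr(obj, "data_vars"):
        variables = dict(obj.data_vars)
    else:
        variables = {getattr(obj, "name", None) or "<unnamed>": obj}

    report = []
    for name, var in variables.items():
        chunks = getattr(var, "chunks", None)
        if chunks is None:
            report.append(f"{name}: not chunked (no Dask array underneath).")
            continue
        # `chunks` holds one tuple of chunk sizes per dimension, so the
        # total number of chunks is the product of the tuple lengths.
        n_chunks = prod(len(sizes) for sizes in chunks)
        line = f"{name}: chunked (lazy) array with {n_chunks} chunks"
        # Dask collections expose their task graph as a mapping.
        graph_fn = getattr(var.data, "__dask_graph__", None)
        if callable(graph_fn):
            line += f" and {len(graph_fn())} tasks in its graph"
        report.append(line + ".")
    return report
```

Calling `doctor_sketch(ds)` on a Dataset opened with chunks would return one line per data variable. A real implementation would presumably add thresholds and advice (e.g., flagging very many small chunks), which this sketch leaves out.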
Describe the solution you'd like
A xr.doctor(dataset_or_dataarray) top-level function (or Dataset.doctor() / DataArray.doctor() methods) that would perform a battery of checks and return helpful diagnostics, e.g.,
Describe alternatives you've considered
None
Additional context
No response