Reading raw data and differences between Python and R reader
#67
Comments
Hi @PeteHaitch I will try to answer your questions but definitely happy to discuss further if needed.
Yeah, we can't assume the dimensions match. The use case for
Yeah the R reader is definitely underdeveloped (hoping to get some funding to work on that 🤞🏻). When it was added it was roughly equivalent but I have done a fair bit of work on the Python side which hasn't been carried over. That could definitely be better documented though.
Do you know what version of Python anndata was used to write the file? I think how layers are stored might be one of the things that was changed in I don't know a lot about the internals of HDF5 files TBH. If there is a case that
That looks like it's from the 0.7.x release series at least. You can see the docs for that format under the 0.7.8 docs (which I just made publicly visible): https://anndata.readthedocs.io/en/0.7.8/fileformat-prose.html
@ivirshup Any input on the layers question? ☝🏻
Following on from this topic, I have another question as well. My task is to convert an AnnData object into an SCE. Imagine I have 20k genes in the AnnData object, and 2k were selected as highly variable genes. When I convert adata into an SCE directly, I have 2k genes with both raw counts in assay 'counts' and normalised counts in assay 'X'. If I would like to get the 20k genes, I use adata.raw.to_adata() to get them and save the result as another AnnData object. However, when that is converted to an SCE, only the normalised counts are found, not the raw counts. What should I do here to get both raw counts and normalised counts for the 20k genes?
@amoyguang1 Please open a separate issue for this.
Thanks for making it possible to read `h5ad` files into R.

I had a few issues/questions after trying to get the raw count matrix from a public `h5ad` file. Some of these points were touched on in #57 and #63 but I hoped to re-visit and clarify some things.

Apologies if these questions are naive or misguided; I'm not very familiar with the AnnData format, and the structure of the particular `h5ad` file I was working with doesn't seem to match that described in https://anndata.readthedocs.io/en/latest/fileformat-prose.html.

The reprex demonstrates the points but I've summarised them below:

1. Why is `raw` stuck in as an altExp rather than an assay when using `readH5AD()`? For the example below, the `raw` data has the same dimensions as the `X` data and is essentially the `counts()` data I'd expect to find in a SingleCellExperiment. Ahh, the figure in `?AnnData2SCE` suggests `raw` may not have the same dimensions as `X` because the latter may have undergone filtering, so I guess this precludes `readH5AD()` being able to assume `raw` is analogous to `counts(sce)`?
2. […] `raw`. I know it's documented that the R reader is experimental and may produce different results to the Python reader, so I'm guessing all the `...` arguments to `readH5AD()` don't work with the R reader? If that's the case, then adding some documentation and/or a warning/error if a user tries to go down this path would be helpful (happy to add this if my understanding is correct).
3. Are `h5ad` files a bit loose in the wild? The AnnData documentation refers to the `layers` element as being standard (https://anndata.readthedocs.io/en/latest/fileformat-prose.html#mappings) but this particular file doesn't have it. It means I couldn't use `HDF5Array::H5ADMatrix()` to explore the `raw` data because it expects/requires it in `/layers/raw` (this was raised by @LTLA in a comment on #57, "Read alternative data with AnnData2SCE"). Any sense of whether it's worth trying to modify `HDF5Array::H5ADMatrix()` to account for the potential lack of a `/layers` group in a `.h5ad` file?

Thanks,
Pete
Created on 2022-06-17 by the reprex package (v2.0.1)
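For anyone hitting the same situation, here is a minimal sketch of checking which top-level groups an `.h5ad` file actually contains before deciding how to read it. It is not part of zellkonverter, HDF5Array, or anndata. It assumes the usual layout in which `X`, `raw` and `layers` sit at the top level of the file; the function names `summarise_h5ad_layout` and `summarise_h5ad_file` are hypothetical, and the dict at the end is only a stand-in for a real file like the one described in this issue.

```python
from collections.abc import Mapping


def summarise_h5ad_layout(root: Mapping) -> dict:
    """Report which count-related top-level entries are present.

    `root` is anything that behaves like a mapping of names to entries,
    such as an open h5py.File or, for illustration, a plain dict.
    """
    has_layers = "layers" in root
    return {
        "has_X": "X" in root,
        "has_raw": "raw" in root,
        "has_layers": has_layers,
        # Names of stored layers; empty if the layers group is absent.
        "layer_names": sorted(root["layers"].keys()) if has_layers else [],
    }


def summarise_h5ad_file(path: str) -> dict:
    """Open an .h5ad file read-only and summarise its top-level layout."""
    # Imported lazily so the helper above stays usable without h5py installed.
    import h5py

    with h5py.File(path, "r") as f:
        return summarise_h5ad_layout(f)


# Stand-in (assumption) for a file with X and raw but no layers group.
example = {"X": object(), "raw": {"X": object()}, "obs": {}, "var": {}}
summary = summarise_h5ad_layout(example)
print(summary)
```

With a real file, `summarise_h5ad_file("path/to/file.h5ad")` (placeholder path) would give the same kind of summary, which makes it easy to see up front whether a reader that expects a `layers` group is likely to work.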