testing cooler on capture Hi-C data #251
Comments
Hi @nservant,
Bin tables must start at 0 and should end at the chromosome length, even if the data is restricted to a smaller region. The first bin could simply be [0, 150125000) and the last [153125000, chrX_size). Let me know if that works.
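For anyone wanting to try this, here is a minimal sketch of how such a bin table could be generated: one large flanking bin on each side and 1 kb bins inside the captured region. The function name `capture_bins` is hypothetical, and the chrX length is an assumption (the hg19 value); substitute the size from your own assembly.

```python
def capture_bins(chrom, chrom_size, region_start, region_end, binsize):
    """Return (chrom, start, end) bins that tile the whole chromosome.

    Fine bins of `binsize` cover [region_start, region_end); one flanking bin
    on each side extends the table to 0 and to the chromosome end.
    """
    if not (0 <= region_start < region_end <= chrom_size):
        raise ValueError("region must lie within the chromosome")
    bins = []
    if region_start > 0:
        bins.append((chrom, 0, region_start))  # left flanking bin
    for start in range(region_start, region_end, binsize):
        bins.append((chrom, start, min(start + binsize, region_end)))
    if region_end < chrom_size:
        bins.append((chrom, region_end, chrom_size))  # right flanking bin
    return bins


CHRX_SIZE = 155_270_560  # assumption: hg19 chrX length, not stated in this thread
bins = capture_bins("chrX", CHRX_SIZE, 150_125_000, 153_125_000, 1_000)

# Three-column BED text that could be written to a bins file
bed_text = "\n".join(f"{c}\t{s}\t{e}" for c, s, e in bins)
```

The resulting table is contiguous from 0 to the chromosome end, which is the property described above; whether the downstream tools are happy with the two very large flanking bins is something to check on real data.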
OK. I'll try and let you know if it works.
I support the idea that we should look into this more in general, by the way. More and more people do this kind of analysis. I've just been using regular genome-wide equal binning for this sort of data; this way you also get a view of the contacts that are only captured on one end, and they are potentially not completely useless.
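To make "captured on one end" concrete, here is a small sketch that counts how many ends of each pair fall inside the target region. The pairs are made-up toy values, and `ends_in_region` is a hypothetical helper, not part of cooler.

```python
REGION = ("chrX", 150_125_000, 153_125_000)  # target region from this issue


def ends_in_region(pair, region=REGION):
    """Count the ends (0, 1 or 2) of a (chrom1, pos1, chrom2, pos2) pair
    that fall inside the half-open region [start, end)."""
    chrom, start, end = region
    c1, p1, c2, p2 = pair
    hit1 = c1 == chrom and start <= p1 < end
    hit2 = c2 == chrom and start <= p2 < end
    return int(hit1) + int(hit2)


# Toy pairs, invented purely for illustration
toy_pairs = [
    ("chrX", 150_200_000, "chrX", 152_900_000),  # both ends captured
    ("chrX", 10_000_000, "chrX", 150_200_000),   # one end captured (cis)
    ("chr1", 5_000_000, "chrX", 151_000_000),    # one end captured (trans)
    ("chr1", 5_000_000, "chr2", 7_000_000),      # neither end captured
]

tally = {0: 0, 1: 0, 2: 0}
for pair in toy_pairs:
    tally[ends_in_region(pair)] += 1
```

With genome-wide binning, the pairs counted under 1 are kept in the matrix; restricting the pairs to the region beforehand keeps only those counted under 2.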
Hi @Phlya, I agree with you. The only point is that I frequently have balancing issues when using the contacts that are only captured on one end, while focusing on the targeted region usually works well, though I have never really investigated the reason in depth. If you want to run additional tests, let me know; I'll be happy to discuss.
You don't even need to use them for balancing; just store them in the cooler so you can see them if you want. In my limited experience, balancing small regions is unfortunately not 100% reliable in general, and the filtering sometimes needs to be modified. But it would be good to assemble some test data to check how the tools work with it.
One interesting feature that could help with this would simply be a tool that extracts a sub-matrix from a cooler object. Then you could generate a genome-wide object and restrict the downstream analysis, such as TAD calling, to the region of interest.
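As a rough illustration of the sub-matrix idea, the sketch below pulls a dense block out of a sparse, upper-triangle pixel list of (bin1_id, bin2_id, count) records, which is how cooler stores contacts. The pixel values and the `submatrix` helper are hypothetical. In the Python API, a region query such as `cooler.Cooler(path).matrix(balance=False).fetch("chrX:150125000-153125000")` should give a comparable result directly from a genome-wide file.

```python
def submatrix(pixels, lo, hi):
    """Dense symmetric sub-matrix for bin ids in [lo, hi).

    `pixels` holds upper-triangle records (bin1_id, bin2_id, count)
    with bin1_id <= bin2_id.
    """
    n = hi - lo
    mat = [[0] * n for _ in range(n)]
    for bin1, bin2, count in pixels:
        if lo <= bin1 < hi and lo <= bin2 < hi:
            i, j = bin1 - lo, bin2 - lo
            mat[i][j] = count
            mat[j][i] = count  # mirror into the lower triangle
    return mat


# Toy pixels, invented for illustration
pixels = [(0, 0, 5), (0, 3, 2), (2, 3, 7), (3, 4, 1), (5, 6, 9)]
block = submatrix(pixels, 2, 5)  # keep bins 2, 3 and 4 only
```

Pixels with one bin outside the requested range, such as (0, 3, 2) here, are dropped, which matches the idea of restricting downstream analysis to the targeted region.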
Hi, an update on this topic. It is now working with
Then, using the
N
I've run into the same zoomify issue.
I just wanted to say that I always use whole-genome binning, as if it were whole-genome Hi-C, and have never had any issues like that. Just provide a blacklist to balancing, which makes it ignore most of the genome.
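A minimal sketch of how such a blacklist could be built: it lists everything outside the captured region as BED intervals, which could then be passed to `cooler balance` through its `--blacklist` option. The chromosome sizes are assumptions (hg19 values for two chromosomes only), and `blacklist_outside` is a hypothetical helper.

```python
def blacklist_outside(chromsizes, chrom, start, end):
    """Return BED-style intervals covering everything except chrom:[start, end)."""
    intervals = []
    for name, size in chromsizes.items():
        if name != chrom:
            intervals.append((name, 0, size))  # mask the whole chromosome
            continue
        if start > 0:
            intervals.append((name, 0, start))  # mask left of the target
        if end < size:
            intervals.append((name, end, size))  # mask right of the target
    return intervals


# Assumption: hg19 lengths, truncated to two chromosomes for brevity
chromsizes = {"chr21": 48_129_895, "chrX": 155_270_560}
blacklist = blacklist_outside(chromsizes, "chrX", 150_125_000, 153_125_000)
blacklist_bed = "\n".join(f"{c}\t{s}\t{e}" for c, s, e in blacklist)
```

In practice the dictionary would hold every chromosome of the assembly, so that only bins inside the targeted region take part in balancing.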
Hi @nvictus
As briefly discussed, I tried to build a cooler object on a 3 Mb Hi-C region from capture Hi-C data.
I'm facing several issues, depending on the test I run ...
I have a BED file with genomic intervals of 1 kb, from chrX:150125000-153125000.
Accordingly, I extracted my pairs within the same genomic range.
Then, I simply tried to ingest the data with cload pairs.
Then, I had a try with cooler pairix.
The csort command works (although I put the entire chrX size here; not sure what to put otherwise ...), but cload pairix also crashed ...
Of note, I also reported the same error in cooltools earlier, when trying to bin a small genome: open2c/cooltools#237