FEATURE REQUEST: Extract/Calls reports number of CpG's covered in the sample #332

Open
ethan-mcq opened this issue Jan 6, 2025 · 3 comments

Comments

@ethan-mcq

FEATURE REQUEST: Extract/Calls reports number of CpG's covered in the sample

When running extract/calls with a motif file, it would be awesome to have the percent of CpGs covered in the sample reported after the extract and calls files are created, and/or the number of unique CpG locations present in the extract/calls file.

@ArtRand
Contributor

ArtRand commented Jan 6, 2025

Hello @ethan-mcq,

Something like that is possible. For a CpG to be "covered", do you mean that the CpG has at least 1 passing base modification call? This feature feels more like something that would be part of modkit summary. I will have a think about it. What is the use case? You can get this information from pileup as well, correct?

@ethan-mcq
Author

ethan-mcq commented Jan 8, 2025

Hey @ArtRand!

I think that at least 1 passing base modification call, canonical or methylated, would be considered "covered".

Generally, we use a CpG motif to speed up the extract/calls function, and we often deal with many Gb of throughput per sample. We rarely run pileup because it doesn't give us as much information as the extract function does; we prefer the single-molecule, "wider" format. Something like the samtools coverage function might be the most versatile option for this. The use case is that we want an idea of the X coverage of CpGs in our samples, even down to the single-CpG level. An average % coverage would be a great start, and I think it could be reported easily in the modkit pipeline flow.
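For illustration, the per-CpG depth described here could be tabulated after the fact from an extract/calls table. This is a minimal sketch, not a modkit feature: the column names `chrom`, `ref_position`, and `fail`, the `true`/`false` encoding of `fail`, and the use of a negative position for unaligned calls are all assumptions that should be checked against the header of the actual file. The helper names `per_site_depth` and `mean_depth` are hypothetical.

```python
import csv
import io
from collections import Counter


def per_site_depth(tsv_handle, chrom_col="chrom", pos_col="ref_position",
                   fail_col="fail"):
    """Count passing calls per (chrom, position) in a tab-separated table.

    Column names are assumptions; pass the names used by your file.
    """
    depth = Counter()
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        # Skip calls flagged as failing (assumed to be encoded as "true").
        if row[fail_col].strip().lower() == "true":
            continue
        pos = int(row[pos_col])
        # Negative positions are assumed to mean "not aligned to the reference".
        if pos < 0:
            continue
        depth[(row[chrom_col], pos)] += 1
    return depth


def mean_depth(depth):
    """Mean number of passing calls over sites with at least one call."""
    return sum(depth.values()) / len(depth) if depth else 0.0


# Tiny made-up table, only to show the call pattern.
SAMPLE = (
    "chrom\tref_position\tfail\n"
    "chr1\t100\tfalse\n"
    "chr1\t100\tfalse\n"
    "chr1\t100\ttrue\n"
    "chr1\t205\tfalse\n"
    "chr2\t-1\tfalse\n"
)
depth = per_site_depth(io.StringIO(SAMPLE))
print(depth[("chr1", 100)])  # 2
print(mean_depth(depth))     # 1.5
```

The two strands of a CpG sit at adjacent reference positions, so this sketch counts them as separate sites; merging strands would need an extra step.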

While this is of course up to the developers, I think it would be most helpful for functions that use a motif file to report a metadata statistic: the total number of CpGs (or other included motifs) covered by at least 1 passing base modification call, and/or a percent coverage. One could argue for putting this in the summary function, but that would involve additional repeat computation, as summary defaults to a subset of reads to produce the summary file.
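As a rough sketch of the statistic being requested, the snippet below counts unique sites with at least 1 passing call and reports them as a percentage of a known set of motif sites. It is not how modkit computes anything. The column names `chrom`, `ref_position`, and `fail` are assumptions, as is the `true`/`false` encoding of `fail`. `covered_sites`, `percent_covered`, and the `motif_sites` set (which would have to be built separately from the motif file) are hypothetical.

```python
import csv
import io


def covered_sites(tsv_handle, chrom_col="chrom", pos_col="ref_position",
                  fail_col="fail"):
    """Return the set of (chrom, position) with at least 1 passing call."""
    covered = set()
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        # Assumed encoding: failing calls have "true" in the fail column.
        if row[fail_col].strip().lower() == "true":
            continue
        pos = int(row[pos_col])
        if pos < 0:  # assumed marker for calls without a reference position
            continue
        covered.add((row[chrom_col], pos))
    return covered


def percent_covered(covered, motif_sites):
    """Percent of motif sites that appear in the covered set."""
    if not motif_sites:
        return 0.0
    return 100.0 * len(covered & motif_sites) / len(motif_sites)


# Made-up calls table and made-up motif sites, only to show the call pattern.
CALLS = (
    "chrom\tref_position\tfail\n"
    "chr1\t100\tfalse\n"
    "chr1\t100\ttrue\n"
    "chr1\t205\tfalse\n"
    "chr1\t300\ttrue\n"
)
motif_sites = {("chr1", 100), ("chr1", 205), ("chr1", 300), ("chr2", 50)}
covered = covered_sites(io.StringIO(CALLS))
print(len(covered))                           # 2
print(percent_covered(covered, motif_sites))  # 50.0
```

Holding every site in a set is simple but memory-hungry for whole-genome data; a per-chromosome pass or a bitmap would be lighter.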

Hope this makes sense!

@ArtRand
Contributor

ArtRand commented Jan 9, 2025

Hello @ethan-mcq,

I see. Perhaps a very lightweight version of pileup that can be run quickly would help. I can look into performing this tabulation during extract; I'd just want to be sure it doesn't add any overhead. I'll keep you posted.
