junction score with paired-end data - is it # of reads or fragments? #170

nmanik · 2013-04-16T19:40:37Z

Hi Mike,
I hope you're doing well. I'm back again after a hiatus, with a simple question this time. From the user guide/wiki, I see that the score in junctions_high-quality.bed "is the number of uniquely mapping reads crossing the junction with at least 8 bases on each side."

If I have paired-end data, is this score the number of reads or the number of fragments crossing the junction? I would like to get the fragment count, as it avoids overcounting when fragments are short (and hence both the left and right mate reads cross the junction).

If RUM only reports reads, do you have plans for reporting fragments, or any recommendations on how I can get the fragment count (possibly from some other output file, or which code I should look into)?

Thanks,
Mani

mdelaurentis · 2013-04-17T14:30:50Z

Mani,

I believe the counts are based on fragments, not reads. By the time we
produce the junctions files, we have already merged overlapping paired
reads. So if the forward and reverse read for the same fragment both span a
junction, they will overlap each other, and so will have been merged
together anyway and will only be counted once.
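
To illustrate the difference between the two counts, here is a minimal sketch. It is not RUM's actual code: the `Mate` record, the function names, and the coordinates are all hypothetical, and it deduplicates by fragment ID instead of merging overlapping mates as RUM does. It also assumes the "8 bases on each side" rule applies to the two aligned blocks immediately flanking the intron.

```python
from collections import namedtuple

# Hypothetical record: one aligned mate, given as a list of (start, end)
# blocks in 1-based inclusive coordinates, plus the ID of its fragment.
Mate = namedtuple("Mate", ["fragment_id", "blocks"])

MIN_OVERHANG = 8  # minimum aligned bases required on each side of the junction


def crosses_junction(blocks, intron_start, intron_end, min_overhang=MIN_OVERHANG):
    """Return True if two consecutive blocks flank the intron with enough overhang."""
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        flanks_intron = e1 == intron_start - 1 and s2 == intron_end + 1
        long_enough = (e1 - s1 + 1) >= min_overhang and (e2 - s2 + 1) >= min_overhang
        if flanks_intron and long_enough:
            return True
    return False


def count_reads(mates, intron_start, intron_end):
    """Count every mate that crosses the junction (can double-count a fragment)."""
    return sum(crosses_junction(m.blocks, intron_start, intron_end) for m in mates)


def count_fragments(mates, intron_start, intron_end):
    """Count each fragment once, even if both of its mates cross the junction."""
    return len({m.fragment_id for m in mates
                if crosses_junction(m.blocks, intron_start, intron_end)})


# Made-up example: an intron spanning positions 101..200.
mates = [
    Mate("f1", [(81, 100), (201, 230)]),   # forward mate crosses
    Mate("f1", [(91, 100), (201, 240)]),   # reverse mate of the same fragment also crosses
    Mate("f2", [(50, 100), (201, 205)]),   # only 5 bases on the right side: not counted
    Mate("f2", [(300, 350)]),              # unspliced mate
    Mate("f3", [(60, 100), (201, 220)]),   # forward mate crosses
    Mate("f3", [(400, 450)]),              # unspliced mate
]

read_count = count_reads(mates, 101, 200)
fragment_count = count_fragments(mates, 101, 200)
```

With this toy input the read-based count is 3 and the fragment-based count is 2, because both mates of `f1` span the junction but the fragment is counted only once.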

I'm copying Greg for confirmation, as he is more familiar with this part of
RUM than I am.

Thanks,

Mike

mdelaurentis · 2013-04-17T15:18:08Z

That's correct, they are FPKM's. Mike, we should change this in our
documentation. Thanks, Greg

nmanik · 2013-04-17T20:59:42Z

Thanks Mike & Greg for clarifying this!

One more clarification: I think Greg meant fragment counts and not FPKM (FPKM would mean the fragment count normalized by the total number of reads and the length of the covered region, and I don't think the score in the junctions_high-quality.bed file involves any normalization).
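
For reference, a short sketch of the distinction, using the standard FPKM definition (fragments per kilobase of feature per million mapped fragments). The function name and the numbers are made up for illustration; nothing here is taken from RUM's code or output.

```python
def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Fragments per kilobase of feature per million mapped fragments."""
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


# Hypothetical numbers: 50 fragments on a 2,000 bp feature,
# in a library of 10 million mapped fragments.
raw_count = 50
normalized = fpkm(raw_count, 2000, 10_000_000)
```

Here the raw fragment count stays 50 regardless of library size, whereas the FPKM value (2.5 for these numbers) changes with both sequencing depth and feature length.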

greggrant · 2013-04-17T21:10:21Z

Right, the raw counts are fragment counts.
