bam to fastq one liner #13

Open
LukeBraidwood opened this issue Jun 15, 2015 · 2 comments

Comments

@LukeBraidwood

Hey,

Thanks very much for putting these explanations and tools up. I think the one-liner you have for converting BAM to FASTQ is inappropriate (or should be described differently). The problem is that your awk prints fields 1, 10, and 11 of each alignment record.

Field 10 is called SEQ and holds the read sequence. However, aligned sequences are always represented on the plus strand of the reference (http://chagall.med.cornell.edu/NGScourse/SAM.pdf, http://genome.sph.umich.edu/wiki/SAM), so a read that aligned to the minus strand is stored reverse-complemented rather than as it was sequenced. This means the tool is inappropriate for stranded BAMs.
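
To make the concern concrete, here is a minimal Python sketch of a conversion that restores the original read orientation. It is not the repository's one-liner. It assumes tab-separated SAM text as input (for example from `samtools view`), and the function name `sam_line_to_fastq` and the sample records are made up for illustration. When bit 0x10 of the FLAG field is set, SEQ is reverse-complemented and QUAL is reversed before the record is written.

```python
# Illustrative sketch only: the function name and the sample records are hypothetical.
# Complement table for plain bases; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Turn one SAM alignment line into a FASTQ record in original read orientation."""
    fields = line.rstrip("\n").split("\t")
    name = fields[0]        # field 1: QNAME
    flag = int(fields[1])   # field 2: FLAG
    seq = fields[9]         # field 10: SEQ
    qual = fields[10]       # field 11: QUAL
    if flag & 0x10:
        # Read aligned to the minus strand: SEQ is stored reverse-complemented
        # and QUAL reversed, so undo both.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Made-up records: one plus-strand read (FLAG 0) and one minus-strand read (FLAG 16).
forward = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tAACG\tIIHG"
reverse = "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tAACG\tIIHG"

print(sam_line_to_fastq(forward), end="")
print(sam_line_to_fastq(reverse), end="")
```

The sketch does not skip secondary (0x100) or supplementary (0x800) alignments and does not separate paired-end mates, so it only illustrates the strand issue rather than a complete converter.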

Thanks,

Luke

@stephenturner
Owner

Thanks. Suggestion / pull request welcomed.

Stephen


@LukeBraidwood
Author

Dear Stephen,

Sorry for the slow reply, I just remembered this exchange. I'm currently using the SamToFastq tool from Picard tools, which has an option to restore the original orientation by reverse-complementing reads that aligned to the negative strand:
http://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq

Cheers,

Luke
