Dear @tangerzhang and @tanghaibao,

I see some inconsistencies among partition.pl, partition_gmap.pl and partition_gmap.py, and I would like to know whether they are intended or errors.

In partition_gmap.pl (d5bb1e5), line 65 reads:
```perl
next if(exists($rdb{$rname})); ### only retain single-end reads
```
The comment is inaccurate: this line does not retain single-end reads. Because any alignment whose read name is already a key in `%rdb` is skipped, it keeps only the first alignment of a given read.
The same line is present in partition.pl (b49ddea) line 51.
partition_gmap.py (80ce6a9) has no such check. As I understand it, that is the correct behavior: it keeps all alignments from contigs originating from the same chromosome.
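To make the difference concrete, here is a minimal Python sketch on made-up `(read_name, contig)` pairs. The function names and data are hypothetical, and the real scripts work on BAM records rather than tuples. The first function mimics the effect of the Perl line (skip any read name already seen); the second keeps every alignment.

```python
from collections import defaultdict


def first_alignment_only(records):
    """Mimic `next if(exists($rdb{$rname}))`: keep only each read's first alignment."""
    seen = {}
    for rname, contig in records:
        if rname in seen:  # read already recorded, so later alignments are dropped
            continue
        seen[rname] = contig
    return seen


def all_alignments(records):
    """Keep every alignment, grouped by read name."""
    by_read = defaultdict(list)
    for rname, contig in records:
        by_read[rname].append(contig)
    return dict(by_read)


if __name__ == "__main__":
    recs = [("r1", "ctgA"), ("r1", "ctgB"), ("r2", "ctgC")]
    print(first_alignment_only(recs))  # {'r1': 'ctgA', 'r2': 'ctgC'}
    print(all_alignments(recs))        # {'r1': ['ctgA', 'ctgB'], 'r2': ['ctgC']}
```

With the first behavior, the `r1` alignment to `ctgB` is silently lost, which is exactly the information the second behavior preserves.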
I have been using partition.pl with an 11 Gb sugarcane genome and a 250 GB pruned BAM file, and it ran out of memory on a machine with 0.5 TB of RAM. I have rewritten partition.pl to use far less memory, at least 20 times less than your version. This version uses BioPerl to index the assembled genome and makes a single pass through the BAM file as a stream, without loading it into memory; it is available here. I also wrote another version that streams the BAM file; both versions work only on the set of contigs belonging to a given chromosome.
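The single-pass idea can be sketched in Python under some assumptions: the BAM is decoded to SAM text on the fly (for example `samtools view pruned.bam | python filter_stream.py contigs.txt`), the contig list has one name per line, and a record is kept only when both it and its mate map to contigs in that list. The file names and the "both mates in the set" rule are illustrative assumptions, not a description of the rewritten partition.pl. Memory use is bounded by the contig set plus one line at a time.

```python
import sys


def load_contigs(lines):
    """Read target contig names (one per line) into a set, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def stream_filter(sam_lines, contigs, out):
    """Write alignments whose reference and mate reference are both in `contigs`.

    Only the current line is held in memory (plus the contig set).
    SAM header lines are skipped in this sketch. Returns the number of
    alignments written.
    """
    kept = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM has 11 mandatory fields
            continue
        rname, rnext = fields[2], fields[6]
        mate_ref = rname if rnext == "=" else rnext  # "=" means same reference
        if rname in contigs and mate_ref in contigs:
            out.write(line if line.endswith("\n") else line + "\n")
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: samtools view pruned.bam | python filter_stream.py contigs.txt
    with open(sys.argv[1]) as fh:
        targets = load_contigs(fh)
    stream_filter(sys.stdin, targets, sys.stdout)
```

Streaming through a pipe trades random access for constant memory; a library such as pysam could iterate BAM records directly instead of parsing text.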
I would appreciate your comments on this.
Thanks a lot in advance.
Best,
Diego
The script only allows a contig to be assigned to one, and only one, chromosome of the reference, and each time the script is run a different chromosome could be chosen. A better approach might be to choose the chromosome with the highest number of hits, similar to what partition_gmap.py does in lines 53-57.
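A minimal sketch of the "most hits" rule, assuming the input is a list of `(contig, chromosome)` pairs with one pair per hit. This is not a copy of lines 53-57 of partition_gmap.py, and the function name is hypothetical. Ties are broken by chromosome name so that the same input always gives the same assignment, which addresses the run-to-run variation.

```python
from collections import Counter, defaultdict


def assign_by_majority(hits):
    """Return {contig: chromosome}, choosing the chromosome with the most hits.

    Ties are broken by the lexicographically smallest chromosome name,
    making the result deterministic across runs.
    """
    counts = defaultdict(Counter)
    for contig, chrom in hits:
        counts[contig][chrom] += 1
    assignment = {}
    for contig, chrom_counts in counts.items():
        # Highest count first; among equal counts, smallest name first.
        best_chrom, _ = min(chrom_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        assignment[contig] = best_chrom
    return assignment


if __name__ == "__main__":
    hits = [("tig1", "Chr1"), ("tig1", "Chr2"), ("tig1", "Chr2"),
            ("tig2", "Chr3"), ("tig2", "Chr1")]
    print(assign_by_majority(hits))  # {'tig1': 'Chr2', 'tig2': 'Chr1'}
```

Here `tig1` goes to `Chr2` (two hits versus one), and the one-to-one tie for `tig2` is resolved to `Chr1` by name rather than by chance.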
Any thoughts?
Thanks