Test multimap check for RNA-seq data
I tested 12 single-species datasets (26 samples in total):
- Human: 5 datasets, 10 samples in total
- Mouse: 5 datasets, 10 samples in total
- Rat: 2 datasets, 6 samples in total
I used two sargasso versions:
- disableMultimap: 082de17c7e89a43c6056addb9968ad0657bae908
- enableMultimap: 2fb2bca262083304e40704ad20115eee4bf8ec63
There are two main places where we check for multimapping reads:
- to check whether a read exceeds a predefined multimap threshold;
- to determine which species is more likely to be the read's true origin.
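To make the two checks concrete, here is a toy Python model of how a single read might be assigned. It is a hypothetical sketch, not Sargasso's code: the `Hit` record, `assign_read`, its parameters, and the exact rules (reject the read if it exceeds the threshold in any species; break mismatch ties by the number of alignment locations) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Hit:
    """Hypothetical per-species alignment summary for one read."""
    n_locations: int  # number of places the read aligns to in this species' genome
    mismatches: int   # mismatches in the read's best alignment to this species


def assign_read(
    hits: Dict[str, Hit],
    multimap_threshold: int = 1,
    use_threshold_check: bool = True,
    use_species_check: bool = True,
) -> Optional[str]:
    """Toy model of the two multimap checks; returns a species, or None if rejected.

    NOT Sargasso's implementation: names, parameters and rules are assumptions.
    """
    if not hits:
        return None

    # Check 1 (assumed rule): reject the read if it aligns to more than
    # `multimap_threshold` locations in any species.
    if use_threshold_check and any(
        h.n_locations > multimap_threshold for h in hits.values()
    ):
        return None

    # Check 2 (assumed rule): when mismatches tie, prefer the species in which
    # the read maps to fewer locations.
    def score(species: str) -> tuple:
        h = hits[species]
        if use_species_check:
            return (h.mismatches, h.n_locations)
        return (h.mismatches,)

    ranked = sorted(hits, key=score)
    # Reject as ambiguous if the two best-scoring species tie.
    if len(ranked) > 1 and score(ranked[0]) == score(ranked[1]):
        return None
    return ranked[0]
```

The two flags let each configuration be modelled separately, including the one proposed at the end of this issue (disabling only check 2: `use_threshold_check=True, use_species_check=False`). In this toy model, disabling both checks can turn a read that check 2 would have resolved into an ambiguous, rejected one.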
For each sargasso version, I ran four strategies on each sample:
- best
- conservative
- recall
- permissive
The results are shown in the following plots. The number in each cell is the difference enableMultimap minus disableMultimap. A blue box (-) therefore means the number is higher when the multimap check is disabled, whereas a red box (+) means the number is higher when it is enabled.
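The cell values and their colors can be derived directly from the per-category read counts of the two runs. A minimal sketch, assuming the counts for one sample and strategy are held in plain dictionaries keyed by category (for example assigned or rejected reads per species); the function names are hypothetical, and showing a zero difference as a neutral white cell is an assumption, since the plots' handling of zero is not described.

```python
from typing import Dict


def multimap_diff(enabled: Dict[str, int], disabled: Dict[str, int]) -> Dict[str, int]:
    """Cell value for each category: enableMultimap count minus disableMultimap count."""
    if enabled.keys() != disabled.keys():
        raise ValueError("both runs must report the same categories")
    return {category: enabled[category] - disabled[category] for category in enabled}


def cell_color(diff: int) -> str:
    """Red (+): higher with the check enabled; blue (-): higher with it disabled."""
    if diff > 0:
        return "red"
    if diff < 0:
        return "blue"
    return "white"  # assumption: no change is shown as a neutral cell
```

Because the sign convention is enable minus disable, a category that gains reads when the check is turned off comes out negative and is colored blue.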
[Plots: human, mouse, rat]
The results are fairly consistent across species. Despite a small increase in misassigned reads (thousands), disabling the multimap check appears to give many more (millions of) correctly assigned reads.
The results suggest that, for RNA-seq data, the multimap check does MORE harm than good and can therefore be disabled.
It can also be observed that the number of rejected reads increases in ALL species, including the true species of origin. This is probably due to the removal of the first multimap check listed above. It might therefore be interesting to test the case where only check 2 is disabled.