Test multimap check for RNA-seq data

I tested 12 single-species datasets (26 samples in total):

Human: 5 datasets, 10 samples in total
Mouse: 5 datasets, 10 samples in total
Rat: 2 datasets, 6 samples in total

I used two Sargasso versions:

disableMultimap: 082de17c7e89a43c6056addb9968ad0657bae908
enableMultimap: 2fb2bca262083304e40704ad20115eee4bf8ec63

There are two main places where we check for multimapping:

1. To check whether a read exceeds a pre-defined multimap threshold.
2. To check which species is more likely to be the true origin of the read.
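
The two checks can be pictured with a minimal Python sketch. This is not Sargasso's actual code: the `Hit` record, the function names, the threshold value and the rule that fewer alignment locations indicates the likelier origin are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Hypothetical summary of how one read aligned to one species' genome."""
    species: str
    num_alignments: int  # number of locations the read maps to


def exceeds_multimap_threshold(hit: Hit, threshold: int) -> bool:
    """Check 1 (assumed form): flag a read that maps to too many locations."""
    return hit.num_alignments > threshold


def likelier_origin(a: Hit, b: Hit) -> Optional[str]:
    """Check 2 (assumed form): prefer the species with fewer alignment locations.

    Returns None when the multimap counts tie and cannot separate the species.
    """
    if a.num_alignments == b.num_alignments:
        return None
    return a.species if a.num_alignments < b.num_alignments else b.species


# Invented example read: unique in human, four locations in mouse.
human = Hit("human", num_alignments=1)
mouse = Hit("mouse", num_alignments=4)
print(exceeds_multimap_threshold(mouse, threshold=3))  # True
print(likelier_origin(human, mouse))                   # human
```

Disabling the multimap check corresponds to skipping both functions; disabling only check 2 would keep the threshold test while ignoring multimap counts when comparing species.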

For each Sargasso version, I ran four strategies on each sample:

best
conservative
recall
permissive
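
As a quick check of the size of this test matrix, the sketch below counts the version and strategy combinations. It assumes each combination is executed once per sample; the variable names are illustrative only.

```python
from itertools import product

versions = ["disableMultimap", "enableMultimap"]
strategies = ["best", "conservative", "recall", "permissive"]
samples_per_species = {"human": 10, "mouse": 10, "rat": 6}

num_samples = sum(samples_per_species.values())
combinations = list(product(versions, strategies))  # (version, strategy) pairs
total_runs = len(combinations) * num_samples

print(num_samples, len(combinations), total_runs)  # 26 8 208
```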

The results are shown in the following plots. The number in each cell is the difference between the two versions (enableMultimap minus disableMultimap). A blue box (-) thus means the number is higher when multimap is disabled, whereas a red box (+) means the number is higher when multimap is enabled.
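
The cell values and colours can be reproduced with a few lines of Python. This is a sketch only: it reads the difference as enableMultimap minus disableMultimap, which matches the colour convention above, and the read counts are invented for illustration rather than taken from the plots.

```python
def cell_difference(enabled: int, disabled: int) -> int:
    """Difference shown in a cell: enableMultimap count minus disableMultimap count."""
    return enabled - disabled


def cell_colour(diff: int) -> str:
    """Red (+) when the count is higher with multimap enabled, blue (-) when disabled."""
    if diff > 0:
        return "red"
    if diff < 0:
        return "blue"
    return "none"


# Invented counts for two cells.
for enabled, disabled in [(1_200, 1_500), (2_000, 1_500)]:
    diff = cell_difference(enabled, disabled)
    print(diff, cell_colour(diff))
# -300 blue
# 500 red
```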

human

mouse

rat

The results are fairly consistent across species. Despite a small increase in misassigned reads (thousands), disabling the multimap check appears to yield many more correctly assigned reads (millions).

The results suggest that, for RNA-seq data, the multimap check is doing more harm than good and can therefore be disabled.

The number of rejected reads also increases in all species, including the true species of origin. This is probably due to the removal of the first multimap check listed above. It might therefore be interesting to test the case where only check 2 is disabled.
