Addressing some issues with the comparison #1
Dear Colleagues,
I am one of the maintainers of MiXCR and read your preprint with great interest. I have a few modifications and suggestions to make the comparison of the different software tools more fair.
Phred quality scores
Sequencing machines provide quality scores specifically to address the issue of sequencing errors. MiXCR, unlike many other tools, takes the Phred quality scores provided by the sequencer into account. By default, it drops all reads with low quality inside the CDR3 (and thus with a high probability of containing a sequencing error); the default threshold is 20. For a fair comparison, one needs to switch off this behaviour and allow MiXCR to use low-quality reads as the other tools do. This can be done with the following option:
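A minimal sketch, assuming the parameter in question is the MiXCR 3.x badQualityThreshold of the assemble step (its default of 20 matches the threshold mentioned above):

```
# assumption: badQualityThreshold is the assemble-step quality threshold (default 20);
# setting it to 0 lets low-quality reads participate in clonotype assembly
mixcr assemble -ObadQualityThreshold=0 alignments.vdjca clones.clns
```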
This will significantly increase the counts reported by MiXCR and make the comparison fairer.
Productive clones
Originally, you use the --only-productive filter on clonotypes in MiXCR (i.e. you drop all clones with stop codons or out-of-frame sequences), but the filtering used for the other tools is different. I suggest removing this filter in MiXCR for a fair comparison, or applying the same filter across all tools.
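For example, a sketch of an unfiltered export, assuming clonotypes are taken from mixcr exportClones (in MiXCR 3.x the out-of-frame and stop-codon filters are separate, opt-in flags):

```
# without --filter-out-of-frames / --filter-stops, out-of-frame and
# stop-containing clonotypes are kept in the export
mixcr exportClones clones.clns clones.txt
```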
Total read count comparison
Also, when comparing the total read counts reported by different tools, you are comparing different count metrics. Several technical details introduce differences in the reported values; one example is whether the read counts of clones clustered together during assembly are added to the resulting clone (in MiXCR this is controlled by the -OaddReadsCountOnClustering=true option on the assemble step). All in all, I believe this metric cannot be used for comparison, or it requires a thorough normalization procedure to bring all the numbers to the same scale. A simpler and more practical metric would be the number of reported true CDR3 clonotypes.
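For reference, a minimal sketch of that option applied on the assemble step:

```
# add read counts of clustered clones to the resulting clone counts
mixcr assemble -OaddReadsCountOnClustering=true alignments.vdjca clones.clns
```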
False positive clones
When comparing clonotypes, one needs to distinguish between true clones and false positives. In both MiXCR and TRUST you can export the raw reads that were used to assemble each clonotype and manually inspect which clones are real and which are obvious false-positive calls.
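A hedged sketch of how this can be done with MiXCR 3.x (file names are illustrative; assemble has to keep the per-read information):

```
# keep read-to-clone assignments, then export the raw reads behind a clonotype
mixcr assemble --write-alignments alignments.vdjca clones.clna
mixcr exportReadsForClones --id 0 clones.clna clone_reads.fastq
```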
For example, one of the top clonotypes reported by TRUST for the PRJNA812076 sample is:
CDR3 nt: TGTGCGAACACCGGGGAGCTGTTTTTT
CDR3 aa: CANTGELFF
This is a false-positive (spurious) clonotype coming from genomic sequences; this is easy to check by BLASTing the raw reads reported by the tool for this clonotype. It is among the top clonotypes, and what is really alarming is that it is reproduced across different samples, which may lead to wrong biological conclusions.
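For instance, a quick check of this kind could look like the following (the input file name is hypothetical; the reads come from the per-clone export above):

```
# BLAST the reads supporting the suspicious clonotype against NCBI nt
# (-remote sends the query to NCBI servers)
blastn -query clone_reads.fasta -db nt -remote -outfmt 6
```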
So, right now this clone, with its high read count, is counted as an "advantage" for TRUST, while it is obviously a serious disadvantage of the software.
Summarizing, for a fair comparison I suggest using simply the number of clones, carefully distinguishing between true and false clonotypes reported by each tool, and counting "true" clonotypes as a plus and "false" clonotypes as a minus.
I will be happy to discuss these suggestions further.
All of the above applies to MiXCR 3.0.13, which was the latest version when the preprint was published. If you plan to use MiXCR in other studies for the analysis of RNA-Seq or other types of data, I strongly recommend using the latest version, which is 4.2.0 at the time of writing; we continue to put a lot of work into optimizing the algorithms and tuning the parameters, alongside new feature development.