Addressing some issues with the comparison #1

Open

PoslavskySV

Dear Colleagues,

I am one of the maintainers of MiXCR and read your preprint with great interest. I have a few modifications and suggestions to make the comparison of the different tools fairer.

Phred quality scores

Sequencing machines provide quality scores specifically to address the issue of sequencing errors. Unlike many other tools, MiXCR takes into account the Phred quality scores provided by the sequencer. By default, it drops all reads with low quality in CDR3 (and thus a high probability of containing a sequencing error); the default threshold is 20. For a fair comparison, one needs to switch off this behaviour and allow MiXCR to use low-quality reads as other tools do. This can be done with the following option:

mixcr assemble -ObadQualityThreshold=0 <other options> in.vdjca out.clns

This will significantly increase MiXCR's output and make the comparison fairer.
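To make the threshold concrete: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so a threshold of 20 corresponds to a 1% error probability per base. The sketch below illustrates this, together with a deliberately simplified filter. The function names are hypothetical, and the rule "keep the read if every CDR3 base has Q >= threshold" is an assumption made for illustration, not a reproduction of MiXCR's internal logic.

```python
from typing import List


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_quality_filter(cdr3_quality_string: str, threshold: int) -> bool:
    """Simplified rule (an assumption): keep the read if every CDR3 base has Q >= threshold.

    Expects a non-empty quality string covering the CDR3 region of the read.
    """
    return min(decode_qualities(cdr3_quality_string)) >= threshold


# 'I' encodes Q40 and '+' encodes Q10 in Phred+33.
print(phred_to_error_probability(20))
print(passes_quality_filter("II+", 20))  # one low-quality base: read is dropped
print(passes_quality_filter("II+", 0))   # threshold 0: read is kept
```

With the threshold set to 0, no read can fail this kind of check, which is the effect the option above is meant to achieve.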

Productive clones

Currently, you apply the --only-productive clonotype filter in MiXCR (which drops all clones with stop codons or out-of-frame CDR3s), but the filtering applied to the other tools is different. I suggest either removing this filter in MiXCR for a fair comparison or applying the same filter across all tools.
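One way to apply the same filter across all tools is to post-filter every tool's CDR3 nucleotide sequences with a single shared function. The following is a minimal sketch, assuming "productive" means only what is described above (in frame and free of stop codons); it is not MiXCR's own implementation, and the function names are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate complete codons of an uppercase A/C/G/T sequence; '*' marks a stop codon."""
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, len(nt) - 2, 3))


def is_productive(cdr3_nt: str) -> bool:
    """Simplified definition (an assumption): in frame and without stop codons."""
    in_frame = len(cdr3_nt) % 3 == 0
    return in_frame and "*" not in translate(cdr3_nt)


print(is_productive("TGTGCGAACACCGGGGAGCTGTTTTTT"))  # in frame, no stop codon
print(is_productive("TGTGCGAACACCGGGGAGCTGTTTTT"))   # one base shorter: out of frame
print(is_productive("TGTTGAAACTTT"))                 # contains the stop codon TGA
```

Running the identical function over each tool's exported clonotypes keeps the filtering criterion the same for all of them.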

Total read count comparison

Also, when comparing total read counts reported by different tools, you use different count metrics. The following issues introduce technical differences in the reported values:

  • for paired-end reads, TRUST counts one mate pair as two reads, whereas MiXCR counts one mate pair (R1+R2) as a single read;
  • MiXCR does not include reads that do not contain CDR3 in the counts, although they are still used for contig assembly;
  • by default, MiXCR does not add the counts of clustered clonotypes (this can be changed with the -OaddReadsCountOnClustering=true option on the assemble step).

All in all, I believe this metric can't be used for comparison, or requires thorough normalization procedures to bring all the numbers to the same scale. A simpler and more practical metric would be the number of reported true CDR3 clonotypes.
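As an illustration of why normalization is not trivial, the sketch below handles only the first difference listed above (mates counted separately versus mate pairs counted once). The function name and parameters are hypothetical, and dividing by two is an approximation that assumes both mates of every pair were counted. The other two differences cannot be corrected by simple arithmetic on the reported totals.

```python
def to_fragment_count(read_count: int, paired_end: bool, counts_mates_separately: bool) -> float:
    """Approximate the number of sequenced fragments behind a reported read count.

    Assumption: when a tool counts each mate of a pair separately, both mates
    were counted, so halving the total recovers the number of mate pairs.
    """
    if paired_end and counts_mates_separately:
        return read_count / 2
    return float(read_count)


# Hypothetical totals: a tool counting mates separately vs. one counting pairs.
print(to_fragment_count(1000, paired_end=True, counts_mates_separately=True))
print(to_fragment_count(1000, paired_end=True, counts_mates_separately=False))
```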

False positive clones

When comparing clonotypes, one needs to distinguish between true clones and false positives. In both MiXCR and TRUST, you can export the raw reads that were used to assemble each clonotype and manually inspect which clones are real and which are obvious false-positive calls.

For example, one of the top clonotypes reported by TRUST from PRJNA812076 sample is:

CDR3 nt: TGTGCGAACACCGGGGAGCTGTTTTTT

CDR3 aa: CANTGELFF

This is a false-positive (spurious) clonotype coming from genomic sequences (this is easy to check by BLASTing the raw reads reported by the tool for the clonotype). It is among the top clonotypes, and what is really alarming is that it is reproduced across different samples, which may lead to wrong biological conclusions.

[Image: blast]

So, right now this clone with a high read count is counted as an "advantage" of TRUST, while it is clearly a serious disadvantage of the software.

In summary, for a fair comparison I suggest using just the number of clones and carefully distinguishing between true and false clonotypes reported by the software, counting "true" clonotypes as a "plus" and "false" clonotypes as a "minus".
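The proposed metric could be tallied roughly as follows. This is a minimal sketch: the function and type names are hypothetical, and the set of false positives is assumed to come from manual inspection (for example, BLASTing the raw reads), not from any automated rule. Apart from the spurious clonotype quoted above, the sequences are made-up placeholders.

```python
from typing import Iterable, NamedTuple


class ClonotypeTally(NamedTuple):
    true_clonotypes: int   # counted as a "plus"
    false_clonotypes: int  # counted as a "minus"


def tally_clonotypes(reported: Iterable[str], false_positives: Iterable[str]) -> ClonotypeTally:
    """Split the distinct reported CDR3 sequences into true and false clonotypes."""
    reported_set = set(reported)
    false_count = len(reported_set & set(false_positives))
    return ClonotypeTally(len(reported_set) - false_count, false_count)


reported = [
    "TGTGCGAACACCGGGGAGCTGTTTTTT",                 # spurious clonotype quoted above
    "TGTGCCAGCAGTTTAGGGACAGGGAACACTGAAGCTTTCTTT",  # hypothetical placeholder
    "TGTGCTGTGAGAGATAGCAACTATCAGTTAATCTGG",        # hypothetical placeholder
]
manually_flagged = {"TGTGCGAACACCGGGGAGCTGTTTTTT"}

print(tally_clonotypes(reported, manually_flagged))
```

Reporting the two numbers separately for each tool keeps spurious calls from inflating a tool's apparent sensitivity.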

I will be happy to discuss these suggestions further.

All of the above applies to MiXCR 3.0.13, which was the latest version when the preprint was published. If you plan to use MiXCR in other studies for the analysis of RNA-Seq or other types of data, I strongly recommend using the latest MiXCR version, which is 4.2.0 at the time of writing. Aside from developing new features, we continue to put a lot of work into optimizing the algorithms and tuning the parameters.

- turn off quality filtering
- remove productive clonotype filtering
- add read counts on clustering
@PoslavskySV

PoslavskySV commented Feb 20, 2023

@KeruiP @smangul1 did you have a chance to look over my PR and comments? I'm really looking forward to hearing your thoughts. Thank you!
