minimap error #1

aheravi · 2021-01-06T21:49:20Z

Hi,
I am trying to use ampBinner_10X.py on my fastq and getting errors on the alignment step. Could you please comment on that?

Thanks!

Running code:

ampBinner_10X.py --in_fq P.fastq.gz --barcode_list 3M_Feb_737_Apr_Aug.txt --barcode_upstream_seq AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --out_prefix D8 --num_threads 24 --minimap2 /linux-x86_64-centos7/minimap2-2.15/minimap2 >run.log 2>&1 &

Error:

sh: line 1: 51263 Killed                  //minimap2-2.15/minimap2 -t 1 --for-only --eqx -c --cs -N 200 -k 5 -w 3 -n 1 -m 10 -s 40 -A 4 -x map-ont /D8.tmp.thread16.barcode_with_anchor.fasta /D8.tmp.thread16.left_tail_barcode_candidate.fastq > /D8.tmp.thread16.left_tail_barcode_compare.paf 2> /dev/null
[01/06/2021 12:48:49] ERROR: Failed to run command: /minimap2-2.15/minimap2  -t 1 --for-only --eqx -c --cs -N 200 -k 5 -w 3 -n 1 -m 10 -s 40 -A 4 -x map-ont  /D8.tmp.thread16.barcode_with_anchor.fasta /D8.tmp.thread16.left_tail_barcode_candidate.fastq > /D8.tmp.thread16.left_tail_barcode_compare.paf 2> /dev/null
[01/06/2021 12:48:49] Return value is: 35072

The text was updated successfully, but these errors were encountered:

fangli80 · 2021-01-07T00:05:05Z

Hi,
Thanks for using AmpBinner.
How many barcodes do you have in the 3M_Feb_737_Apr_Aug.txt file? I guess it's 3 million? If so, I think it's too many, which caused a memory issue.

Do you have a more precise list of barcodes? It would be great if you can narrow down this list to a few thousand. Not only for saving memory but also for better demultiplexing results. You can get a precise list of barcodes if you have the 10X genomics short-read sequencing data.

aheravi · 2021-01-07T17:01:22Z

No, unfortunately, I don't have the 10X data. That is why I am using all available barcodes to figure out the used ones.

I see some of the threads got failed (4 out of 24). I re-ran the failed commands and they got finished successfully. Not sure what is next after having the missing paf files?

fangli80 · 2021-01-07T21:19:53Z

If you don't know the barcodes, you'd better ask the person who performed the experiment. Usually, a subset of the 3 million barcodes was used in one experiment.
I only tested AmpBinner when the number of barcodes is about 10,000.

By the way, are you sure the barcode upstream sequence is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT ?

If you finished AmpBinner successfully, you will see 3 output files:
prefix.all_reads.txt, which contains the barcodes of all reads
prefix.demultiplexing.PASS.reads.txt, which contains the barcodes that were confidently assigned.
prefix.demultiplexing.statistics.txt, which is the number of reads of each barcode.

aheravi · 2021-01-12T18:09:14Z

Hi @fangli08,
We don't know the exactly used barcodes. That is why I am testing all possible ones. Anyway, I successfully ran your tool on the 50,000 split lists.

WRT to the barcode upstream sequence, I think the sequence should be the one that you've mentioned in your comment. I see more reads identified with that sequence compared with an alternative sequence identified by running some code on the reads (which seems to be the second half of the sequence in your comment).

#cellular_barcode_seq	num_reads
AACACACTCAGCCTTC	1481

Using the alternative sequence, ACACTCTTTCCCTACACGACGCTCTTCCGATCT.

#cellular_barcode_seq	num_reads
AACACACTCAGCCTTC	147

Please let me know your thoughts on that.
Thanks!

fangli80 · 2021-01-13T13:24:10Z

That looks correct.

aheravi · 2021-01-15T22:27:27Z

Hi @fangli08 ,

Some of the 10X barcodes I used in running "ampBinner_10X.py" were duplicates and so when I split the list, they randomly ended up in two different files. Now, I see a different number of reads for the duplicate barcodes. Any thoughts on that?

grep AACCATGAGCAGGTCA *stat*txt
xac.demultiplexing.statistics.txt:AACCATGAGCAGGTCA	113
xfu.demultiplexing.statistics.txt:AACCATGAGCAGGTCA	7

wc -l xac*spli*txt
50000 xac.10X_barcodes_splitted.txt

wc -l xfu*spli*txt
50000 xfu.10X_barcodes_splitted.txt

fangli80 · 2021-01-17T02:16:58Z

When binning the reads, AmpBinner will consider the best matched barcode and the second best matched barcode, to exclude potential misclasification due to sequencing error of the long reads.

To confidently assign a read to a barcode, the following two criteria is required:

the number of edite bases (mismatch + insertions + deletions) of the best matched barcode < 3
the number of edite bases of the second best matched barcode - the number of edite bases of the best matched barcode > 2

For example, if a barcode list has the following two barcodes:

barcode-1: AACCATGAGCAGGTCA
barcode-2: ATCCATGAGCAGGTCA
(only one base difference)

If a read has a sequence of AACCATGAGCAGGTCA right after the barcode_upstream_seq, then barcode-1 is the best matched barcode (num_edited_bases = 0) and barcode-2 is the second best matched barcode (num_edited_bases = 1) . It meets criterion 1 but not criterion 2, so this read is not assigned to barcode-1, because it may have barcode-2 but happen to have one sequencing error in the second base.

Therefore, all the barcodes should be supplied in one file. I thought you split the list simply to test if the tool works in your environment.

Since there are too many barcodes, which use too much resources, I think you can narrow down this list by collecting all barcodes from the prefix.all_reads.txt files of the split runs, generate a new barcode list, and then remove all duplicates in the barcode list file and run the tool with all the new barcodes in one file.

fangli80 self-assigned this Jan 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

minimap error #1

minimap error #1

aheravi commented Jan 6, 2021

fangli80 commented Jan 7, 2021 •

edited

Loading

aheravi commented Jan 7, 2021

fangli80 commented Jan 7, 2021 •

edited

Loading

aheravi commented Jan 12, 2021

fangli80 commented Jan 13, 2021

aheravi commented Jan 15, 2021

fangli80 commented Jan 17, 2021

minimap error #1

minimap error #1

Comments

aheravi commented Jan 6, 2021

fangli80 commented Jan 7, 2021 • edited Loading

aheravi commented Jan 7, 2021

fangli80 commented Jan 7, 2021 • edited Loading

aheravi commented Jan 12, 2021

fangli80 commented Jan 13, 2021

aheravi commented Jan 15, 2021

fangli80 commented Jan 17, 2021

fangli80 commented Jan 7, 2021 •

edited

Loading

fangli80 commented Jan 7, 2021 •

edited

Loading