minimap error #1

Open
aheravi opened this issue Jan 6, 2021 · 7 comments
@aheravi

aheravi commented Jan 6, 2021

Hi,
I am trying to use ampBinner_10X.py on my fastq and getting errors on the alignment step. Could you please comment on that?

Thanks!

Running code:

ampBinner_10X.py --in_fq P.fastq.gz --barcode_list 3M_Feb_737_Apr_Aug.txt --barcode_upstream_seq AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --out_prefix D8 --num_threads 24 --minimap2 /linux-x86_64-centos7/minimap2-2.15/minimap2 >run.log 2>&1 &

Error:

sh: line 1: 51263 Killed                  //minimap2-2.15/minimap2 -t 1 --for-only --eqx -c --cs -N 200 -k 5 -w 3 -n 1 -m 10 -s 40 -A 4 -x map-ont /D8.tmp.thread16.barcode_with_anchor.fasta /D8.tmp.thread16.left_tail_barcode_candidate.fastq > /D8.tmp.thread16.left_tail_barcode_compare.paf 2> /dev/null
[01/06/2021 12:48:49] ERROR: Failed to run command: /minimap2-2.15/minimap2  -t 1 --for-only --eqx -c --cs -N 200 -k 5 -w 3 -n 1 -m 10 -s 40 -A 4 -x map-ont  /D8.tmp.thread16.barcode_with_anchor.fasta /D8.tmp.thread16.left_tail_barcode_candidate.fastq > /D8.tmp.thread16.left_tail_barcode_compare.paf 2> /dev/null
[01/06/2021 12:48:49] Return value is: 35072
@fangli80
Collaborator

fangli80 commented Jan 7, 2021

Hi,
Thanks for using AmpBinner.
How many barcodes do you have in the 3M_Feb_737_Apr_Aug.txt file? I guess it's 3 million? If so, that is probably too many, and minimap2 was most likely killed because it ran out of memory.
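
The "Return value is: 35072" line in the log is consistent with a process being killed. Below is a minimal sketch of how to decode it, assuming the logged number is a raw POSIX wait status such as the one Python's os.system returns (how AmpBinner actually launches minimap2 is not shown in this thread). decode_status is a hypothetical helper name, not part of AmpBinner.

```python
import signal


def decode_status(status):
    """Split a raw POSIX wait status into (exit_code, signal_name)."""
    exit_code = status >> 8  # the high byte holds the shell's exit code
    signal_name = None
    if exit_code > 128:
        # Shells report "child killed by signal N" as exit code 128 + N.
        try:
            signal_name = signal.Signals(exit_code - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return exit_code, signal_name


print(decode_status(35072))
```

On Linux this decodes to exit code 137, that is 128 + 9 (SIGKILL), which is the signal typically seen when the kernel's out-of-memory killer terminates a process.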

Do you have a more precise list of barcodes? It would be great if you could narrow this list down to a few thousand. That would not only save memory but also give better demultiplexing results. You can get a precise list of barcodes if you have the 10X Genomics short-read sequencing data.

@fangli80 fangli80 self-assigned this Jan 7, 2021
@aheravi
Author

aheravi commented Jan 7, 2021

No, unfortunately, I don't have the 10X data. That is why I am using all available barcodes to figure out the used ones.

I see that some of the threads failed (4 out of 24). I re-ran the failed commands and they finished successfully. What is the next step now that I have the missing paf files?

@fangli80
Collaborator

fangli80 commented Jan 7, 2021

If you don't know the barcodes, you should ask the person who performed the experiment. Usually, only a subset of the 3 million barcodes is used in one experiment.
I have only tested AmpBinner with about 10,000 barcodes.

By the way, are you sure the barcode upstream sequence is AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT ?

If AmpBinner finishes successfully, you will see three output files:
prefix.all_reads.txt, which contains the barcodes of all reads.
prefix.demultiplexing.PASS.reads.txt, which contains the barcodes that were confidently assigned.
prefix.demultiplexing.statistics.txt, which contains the number of reads for each barcode.

@aheravi
Author

aheravi commented Jan 12, 2021

Hi @fangli80,
We don't know exactly which barcodes were used. That is why I am testing all possible ones. Anyway, I successfully ran your tool on the barcode list split into files of 50,000 barcodes each.

With regard to the barcode upstream sequence, I think it should be the one you mentioned in your comment. I see more reads identified with that sequence than with an alternative sequence that I found by running some code on the reads (it seems to be the second half of the sequence in your comment).

#cellular_barcode_seq	num_reads
AACACACTCAGCCTTC	1481

Using the alternative sequence, ACACTCTTTCCCTACACGACGCTCTTCCGATCT:

#cellular_barcode_seq	num_reads
AACACACTCAGCCTTC	147

Please let me know your thoughts on that.
Thanks!

@fangli80
Collaborator

That looks correct.

@aheravi
Author

aheravi commented Jan 15, 2021

Hi @fangli80,

Some of the 10X barcodes I used when running "ampBinner_10X.py" were duplicates, so when I split the list, the copies randomly ended up in two different files. Now I see a different number of reads for the same barcode in the two runs. Any thoughts on that?

grep AACCATGAGCAGGTCA *stat*txt
xac.demultiplexing.statistics.txt:AACCATGAGCAGGTCA	113
xfu.demultiplexing.statistics.txt:AACCATGAGCAGGTCA	7
wc -l xac*spli*txt
50000 xac.10X_barcodes_splitted.txt

wc -l xfu*spli*txt
50000 xfu.10X_barcodes_splitted.txt
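
A quick way to see which barcodes landed in more than one split file is sketched below. It assumes each split file holds one barcode per line, which is consistent with the line counts above but not confirmed in this thread. The function name barcodes_in_multiple_lists is hypothetical.

```python
from collections import defaultdict


def barcodes_in_multiple_lists(lists_by_name):
    """Map each barcode that occurs in more than one list to the list names."""
    where = defaultdict(set)
    for name, barcodes in lists_by_name.items():
        for barcode in barcodes:
            barcode = barcode.strip()
            if barcode:
                where[barcode].add(name)
    return {bc: sorted(names) for bc, names in where.items() if len(names) > 1}


# Toy input standing in for the contents of two split files.
example = {
    "xac": ["AACCATGAGCAGGTCA\n", "AACACACTCAGCCTTC\n"],
    "xfu": ["AACCATGAGCAGGTCA\n"],
}
print(barcodes_in_multiple_lists(example))
```

With real files, each value could be an open file handle instead of a list, since the function only iterates over lines.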

@fangli80
Collaborator

When binning the reads, AmpBinner considers both the best matched barcode and the second best matched barcode, to exclude potential misclassification due to sequencing errors in the long reads.

To confidently assign a read to a barcode, the following two criteria are required:

  1. the number of edited bases (mismatches + insertions + deletions) of the best matched barcode < 3
  2. the number of edited bases of the second best matched barcode - the number of edited bases of the best matched barcode > 2

For example, if a barcode list has the following two barcodes:

barcode-1: AACCATGAGCAGGTCA
barcode-2: ATCCATGAGCAGGTCA
(only one base difference)

If a read has the sequence AACCATGAGCAGGTCA right after the barcode_upstream_seq, then barcode-1 is the best matched barcode (num_edited_bases = 0) and barcode-2 is the second best matched barcode (num_edited_bases = 1). The read meets criterion 1 but not criterion 2, so it is not assigned to barcode-1, because it may actually carry barcode-2 with one sequencing error in the second base.
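
The two criteria and the example above can be sketched as follows. This is an illustration only: it uses a plain Levenshtein edit distance against every barcode, whereas AmpBinner itself relies on minimap2 alignments. The function names, and the choice to let criterion 2 pass when the list has only one candidate, are assumptions made for this sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance (mismatches + insertions + deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def assign_barcode(observed, barcodes, max_best=3, min_gap=2):
    """Return the confidently assigned barcode, or None."""
    ranked = sorted((edit_distance(observed, bc), bc) for bc in barcodes)
    if not ranked:
        return None
    best_dist, best_bc = ranked[0]
    # Assumption: with a single candidate there is no competitor,
    # so criterion 2 is treated as satisfied.
    second_dist = ranked[1][0] if len(ranked) > 1 else float("inf")
    if best_dist < max_best and second_dist - best_dist > min_gap:
        return best_bc
    return None


read = "AACCATGAGCAGGTCA"
both = ["AACCATGAGCAGGTCA", "ATCCATGAGCAGGTCA"]
print(assign_barcode(read, both))      # both barcodes in one list
print(assign_barcode(read, both[:1]))  # barcode-2 ended up in another file
```

Under these assumptions the read is rejected when both barcodes are in the same list but accepted when barcode-2 is missing from it, which is how splitting the list can change the per-barcode read counts.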

Therefore, all the barcodes should be supplied in one file. I thought you split the list simply to test if the tool works in your environment.

Since the full list has too many barcodes and uses too many resources, I think you can narrow it down as follows: collect all barcodes from the prefix.all_reads.txt files of the split runs, generate a new barcode list from them, remove all duplicates from that list, and then run the tool with all the new barcodes in one file.
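
A minimal sketch of the merge-and-deduplicate step is shown below. The column layout of prefix.all_reads.txt is not shown in this thread, so the tab-separated format, the "#" header convention, and the barcode_column parameter are all assumptions to adjust after inspecting the real files. merge_barcodes is a hypothetical helper name.

```python
def merge_barcodes(line_sources, barcode_column=0):
    """Collect unique barcodes from several line iterables, keeping first-seen order."""
    seen = set()
    merged = []
    for lines in line_sources:
        for line in lines:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and header/comment lines
            fields = line.split("\t")
            if barcode_column >= len(fields):
                continue  # line is too short to contain the assumed column
            barcode = fields[barcode_column]
            if barcode and barcode not in seen:
                seen.add(barcode)
                merged.append(barcode)
    return merged


# Toy input standing in for the lines of two split-run output files.
run_a = ["#cellular_barcode_seq\tnum_reads\n", "AACACACTCAGCCTTC\t1481\n"]
run_b = ["AACCATGAGCAGGTCA\t7\n", "AACACACTCAGCCTTC\t3\n"]
print(merge_barcodes([run_a, run_b]))
```

With real files, open file handles can be passed in place of the lists, and the returned barcodes can be written one per line to the new barcode list.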
