ValueError: Mixed RNA/DNA found. #61

Open
shaos1996 opened this issue Nov 14, 2024 · 15 comments

Comments

@shaos1996

I analyzed the intact sequences extracted by LTR_retriever with the command 'TEsorter acau.intact.fa -db rexdb-plant -p 36 -rule 70-30-80'. After the input was split into 72 chunks and translation started, an error was reported. Could you tell me what is causing it?

BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
2024-11-14 13:25:00,730 -INFO- translating ./tmp-b151e46e-a248-11ef-993b-ac1f6bc1b736/chunk.72.fasta in six frames
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/public1/user/shaoshao/software/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/public1/user/shaoshao/software/miniconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/public1/user/shaoshao/software/miniconda3/lib/python3.7/site-packages/TEsorter-1.4.7-py3.7.egg/TEsorter/app.py", line 1100, in _translate
return translate(inSeq, overwrite=overwrite)
File "/public1/user/shaoshao/software/miniconda3/lib/python3.7/site-packages/TEsorter-1.4.7-py3.7.egg/TEsorter/app.py", line 1080, in translate
six_frame_translate(inSeq, fp)
File "/public1/user/shaoshao/software/miniconda3/lib/python3.7/site-packages/TEsorter-1.4.7-py3.7.egg/TEsorter/modules/translate_seq.py", line 10, in six_frame_translate
for seq, suffix0 in zip([rc.seq, rc.seq.reverse_complement()], ['aa', 'rev_aa']):
File "/public1/user/shaoshao/software/python37/lib/python3.7/site-packages/Bio/Seq.py", line 1911, in reverse_complement
return self.complement()[::-1]
File "/public1/user/shaoshao/software/python37/lib/python3.7/site-packages/Bio/Seq.py", line 1847, in complement
raise ValueError("Mixed RNA/DNA found")
ValueError: Mixed RNA/DNA found

@zhangrengang
Owner

@shaos1996 Please check whether your FASTA file contains both T and U.
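
The check suggested here can be sketched in a few lines of standard-library Python. This is only an illustration, assuming a plain FASTA file and treating a record as "mixed" when its sequence contains both T and U in either case. The function name `records_with_t_and_u` and the demo records are hypothetical, and the "leaked" record is modeled on the debug output posted further down in this thread.

```python
import io


def records_with_t_and_u(handle):
    """Yield the IDs of FASTA records whose sequence holds both T and U."""
    name, letters = None, set()
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, so report it if mixed.
            if name is not None and {"T", "U"} <= letters:
                yield name
            name = (line[1:].split() or [""])[0]
            letters = set()
        elif name is not None:
            letters.update(line.upper())
    # Report the last record as well.
    if name is not None and {"T", "U"} <= letters:
        yield name


# Hypothetical demo input: clean DNA, DNA with stray log text, and pure RNA.
demo = io.StringIO(">ok\nACGTACGT\n>leaked\nACGTwithoutchr\n>rna\nACGU\n")
mixed = list(records_with_t_and_u(demo))
print(mixed)
```

To scan a real file, the same function could be called on `open("your_TE_lib.fasta")` instead of the in-memory demo.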

@shaos1996
Author

shaos1996 commented Nov 14, 2024

No sequence contains U. I tested it on 5 genomes, and only one ran without reporting an error.

@shaos1996
Author

This is the debug output. It seems that some standard output has been concatenated onto the sequences.

Debug - self._data content: b'GTCTATGTGTATGTGTTTGATTGTTTCTCTCTTGCTTAGGAACTAGGACATGTGCATGACGAAATGTTAAATGCTTAGGAACCTAAGAACATCTTTGCTATAAGGGGTTAACTCCTAAAGGAATTGCAACCAAGCATAACCTTTCTCTCACTCTCTTTCTCTCTTTTAAGGATACTTAAAGCACCCAAAAGAACTTAAAGTAATTAAGTATTTTCCGCTGCATGTTTATCATGAAAATACCTAGATGCATGACAre-organizeacau.fa.retriever.all.scnacau.fa.retriever.all.scntotal678715,0withoutchr,11402discarded,667313retained'
/public1/user/shaoshao/software/miniconda3/lib/python3.7/site-packages/Bio/Seq.py:2750: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning,
/public1/user/shaoshao/software/miniconda3/lib/python3.7/site-packages/Bio/Seq.py:1639: BiopythonDeprecationWarning: seq.reverse_complement() will change in the near future to always return DNA nucleotides only. Please use

seq.reverse_complement_rna()

@zhangrengang
Owner

@shaos1996 Can you send me your FASTA file so I can debug it?

@shaos1996
Author

I found the problem: I had used nohup when running LTR_retriever.py get_full_seqs genome.fa > intact_ltr.fa. After removing nohup, it works fine.

@zhangrengang
Owner

Yes, nohup redirects standard error to standard output (when standard error is a terminal), so the log messages ended up in the redirected FASTA file.
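
The effect described here, standard error landing in the same stream as standard output, can be reproduced without nohup. The sketch below is only an illustration: it uses Python's `subprocess` module with `stderr=subprocess.STDOUT` as a stand-in for what nohup does, and the child script and its log line are hypothetical.

```python
import subprocess
import sys

# Hypothetical child process: one FASTA record on stdout, one log line on stderr.
CHILD = (
    "import sys\n"
    "sys.stdout.write('>seq1\\nACGTACGT\\n')\n"
    "sys.stderr.write('total 3, 0 without chr\\n')\n"
)

# Streams kept apart: only the FASTA text reaches the captured stdout.
separate = subprocess.run(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
)

# stderr folded into stdout: the log line shares a stream with the sequences.
merged = subprocess.run(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)

print(repr(separate.stdout))
print(repr(merged.stdout))
```

In the merged case, anything written to a file from that stream would contain the log text alongside the sequence, and the word "without" alone is enough to introduce a U into otherwise pure DNA.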

@yaoxkkkkk

yaoxkkkkk commented Nov 28, 2024

Hi, I ran into the same error. Here is the log:

/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/Bio/Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  warnings.warn(
...
2024-11-28 23:52:31,498 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.21.fasta` in six frames
2024-11-28 23:52:34,762 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.22.fasta` in six frames
2024-11-28 23:52:34,942 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.23.fasta` in six frames
2024-11-28 23:52:35,079 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.24.fasta` in six frames
2024-11-28 23:52:35,109 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.25.fasta` in six frames
2024-11-28 23:52:35,110 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.26.fasta` in six frames
2024-11-28 23:52:35,161 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.27.fasta` in six frames
2024-11-28 23:52:35,168 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.28.fasta` in six frames
2024-11-28 23:52:35,169 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.29.fasta` in six frames
2024-11-28 23:52:35,215 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.30.fasta` in six frames
2024-11-28 23:52:35,244 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.31.fasta` in six frames
2024-11-28 23:52:35,248 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.32.fasta` in six frames
2024-11-28 23:52:35,260 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.33.fasta` in six frames
2024-11-28 23:52:35,265 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.34.fasta` in six frames
2024-11-28 23:52:35,272 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.35.fasta` in six frames
2024-11-28 23:52:35,292 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.36.fasta` in six frames
2024-11-28 23:52:35,294 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.37.fasta` in six frames
2024-11-28 23:52:35,296 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.38.fasta` in six frames
2024-11-28 23:52:35,324 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.39.fasta` in six frames
2024-11-28 23:52:35,332 -INFO- translating `./tmp-c38fe47e-ada0-11ef-9b74-7c8ae1d406bf/chunk.40.fasta` in six frames
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
           ^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 1100, in _translate
    return translate(inSeq, overwrite=overwrite)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 1080, in translate
    six_frame_translate(inSeq, fp)
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/modules/translate_seq.py", line 10, in six_frame_translate
    for seq, suffix0 in zip([rc.seq, rc.seq.reverse_complement()], ['aa', 'rev_aa']):
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/Bio/Seq.py", line 825, in reverse_complement
    return self.complement()[::-1]
           ^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/Bio/Seq.py", line 761, in complement
    raise ValueError("Mixed RNA/DNA found")
ValueError: Mixed RNA/DNA found
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/bin/TEsorter", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 1265, in main
    pipeline(Args())
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 257, in pipeline
    gff, geneSeq = LTRlibAnn(
                   ^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 1165, in LTRlibAnn
    chunk_files = hmmscan_pp(ltrlib, hmmdb=hmmdb, hmmout=domtbl, tmpdir=tmpdir, 
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/app.py", line 1118, in hmmscan_pp
    chunk_files = list(pool_func(_translate, iterable, processors=processors))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/site-packages/TEsorter/modules/RunCmdsMP.py", line 347, in pool_func
    for returned in pool_map(func, iterable, **kargs):
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/multiprocessing/pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dssg/home/acct-jiang.lu/jiang.lu-user1/mambaforge/envs/tesorter/lib/python3.11/multiprocessing/pool.py", line 774, in get
    raise self._value
ValueError: Mixed RNA/DNA found

I checked the input TE library file with seqkit, and it returned:

[WARN] it's already DNA, no need to convert

I can't find nohup anywhere in the related scripts. How can I resolve this?

@zhangrengang
Owner

zhangrengang commented Nov 29, 2024

@yaoxkkkkk Please check the input TE library file with: cat your_TE_lib.fasta | grep -v ">" | grep -Po "[^ATCG]" -i

@yaoxkkkkk

yaoxkkkkk commented Nov 29, 2024

It returns lots of non-ATCG characters:

cat XZ.mod.fa.mod.EDTA.TElib.fa | grep -v ">" | grep -Po "[^ATCG]" -i
w
w
r
w
s
w
y
r
k
r
w
y
s
w
r
y
y
r
s
...

I will check my TE lib again, thank you for your reply!

@yaoxkkkkk

I checked the file I merged: these non-ATCG characters come from athrep.updated.nonredun.fasta, which is provided by EDTA. Is there anything I missed when using this file?

@yaoxkkkkk

I realise they may be IUPAC codes. Can TEsorter deal with those?

@yaoxkkkkk

There are also lots of X characters in the file. Does that matter?

@zhangrengang
Owner

Are these from the raw genome sequences? Are there any U characters? The ValueError: Mixed RNA/DNA found is raised by Biopython, which can handle IUPAC codes. Check with: cat XZ.mod.fa.mod.EDTA.TElib.fa | grep -v ">" | grep -Po "[^ATCG]" -i | sort | uniq -c
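
For anyone who prefers to run this tally without grep, here is a rough standard-library Python equivalent. It is a sketch only: the function name `count_non_atcg` and the demo records are hypothetical, and unlike the shell pipeline it folds lower case into upper case before counting.

```python
import io
from collections import Counter


def count_non_atcg(handle):
    """Count every sequence character other than A, T, C, G (case-folded)."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            continue  # skip FASTA header lines, like grep -v ">"
        counts.update(ch for ch in line.strip().upper() if ch not in "ATCG")
    return counts


# Hypothetical demo library with ambiguity codes, one U and two X characters.
demo = io.StringIO(">te1\nACGTNNRY\n>te2\nacgtuXX\n")
counts = count_non_atcg(demo)
for letter, n in sorted(counts.items()):
    print(f"{n:7d} {letter}")
```

Any U in the result, next to T elsewhere in the same record, is a candidate cause of the Mixed RNA/DNA error.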

@yaoxkkkkk

Yes, it does, but seqkit failed to recognize them :(

cat XZ.mod.fa.mod.EDTA.TElib.mod.fa | grep -v ">" | grep -Po "[^ATCG]" -i | sort |uniq -c
    735 B
   1341 D
    111 E
     44 F
   1378 H
    103 I
  55290 K
     63 L
  65863 M
 168927 N
     81 O
     46 P
  22936 R
  36771 S
     39 U
    786 V
  75432 W
  36236 X
  23220 Y

@zhangrengang
Owner

Some characters (e.g. E, F) are not DNA codes, and need to be checked, deleted or replaced.
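
One way to act on this advice is sketched below in standard-library Python. It is an illustration under stated assumptions, not something TEsorter does: the IUPAC nucleotide alphabet is taken to be A, C, G, T plus the ambiguity codes R, Y, S, W, K, M, B, D, H, V and N; U is converted to T; and every other character is replaced with N. The function name `sanitize` and both policies are hypothetical choices, and deleting the offending characters or records may suit some data better.

```python
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def sanitize(seq, placeholder="N"):
    """Return (cleaned sequence, set of non-IUPAC characters replaced)."""
    cleaned, replaced = [], set()
    for ch in seq:
        upper = ch.upper()
        if upper in IUPAC_DNA:
            cleaned.append(ch)  # valid DNA or ambiguity code: keep as-is
        elif upper == "U":
            # Assumption: treat U as the RNA spelling of T, preserving case.
            cleaned.append("T" if ch.isupper() else "t")
        else:
            cleaned.append(placeholder)  # e.g. E, F, X: not a DNA code
            replaced.add(upper)
    return "".join(cleaned), replaced


cleaned, replaced = sanitize("ACGUEFxn")
print(cleaned, sorted(replaced))
```

Printing the `replaced` set per record before overwriting anything makes it easier to decide whether a record is worth keeping at all.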
