The content of the .exonic_variant_function file is empty. #270
This means that the variants are not annotated to any chromosome. There could be many reasons, so you will need to check the pantro_refGene files manually to see what is wrong. For example, "1" may be used as the chromosome name instead of "chr1", or the locations (start-end) may be based on the transcript rather than on the assembly (chr1). Without any details at all, I cannot tell where the issue is. Please read FAQ #1.
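As a rough illustration of the chromosome-name check suggested above, the Python sketch below compares the chromosome names used in an .avinput file with those in a refGene.txt-style gene file. The helper names and the example file names in the comment are hypothetical (not part of ANNOVAR), and the sketch assumes the UCSC genePred layout with a leading bin column, so the chromosome sits in the third column.

```python
import sys


def avinput_chroms(lines):
    """Chromosome names from column 1 of .avinput rows (chr, start, end, ref, alt)."""
    return {line.split()[0] for line in lines
            if line.strip() and not line.startswith("#")}


def refgene_chroms(lines, chrom_col=2):
    """Chromosome names from a refGene.txt-style gene definition file.

    chrom_col=2 assumes the UCSC genePred layout with a leading 'bin'
    column (bin, name, chrom, strand, ...); use 1 if there is no bin column.
    """
    chroms = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not line.startswith("#") and len(fields) > chrom_col:
            chroms.add(fields[chrom_col])
    return chroms


def missing_chroms(avinput_lines, refgene_lines, chrom_col=2):
    """Chromosomes that carry variants but have no genes in the gene file."""
    return sorted(avinput_chroms(avinput_lines)
                  - refgene_chroms(refgene_lines, chrom_col))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_chroms.py pantro-panpan.avinput pantro_refGene.txt
    with open(sys.argv[1]) as av, open(sys.argv[2]) as rg:
        print(missing_chroms(av, rg))
```

If every variant chromosome shows up in the output list (for example "1" in the variants but "chr1" or an accession-style name in the gene file), no variant can overlap a gene, which would be consistent with every variant being reported as intergenic.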
I'm sorry to bother you again, but your reply is very important to me. I'm very confused about the issue above and don't know where to start. I sincerely hope you can offer some suggestions for fixing it. Best regards!
The pantro_refGeneMrna.fa file looks fine to me, and the refGene.txt file also looks okay. If you send me the files (or just the first few hundred genes), I can check them further to see what the issue is. One possibility is that many genes do not have a correct ORF (in your figure, that gene has the warning), so there is no output, but I need to see the file to know what percentage of genes is affected. Also, what command did you run, and what LOG/NOTICE messages were printed after you ran it? I need to see them to advise where things went wrong, as I mentioned in FAQ #1.
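One rough way to estimate how many genes might lack a correct ORF, without the sequences, is to check whether each transcript's coding length is a multiple of 3. The sketch below does that on refGene.txt-style rows; it is only a proxy (ANNOVAR's own checks may also look at start/stop codons and the mRNA FASTA), the function names are hypothetical, and it assumes the UCSC genePred layout with a leading bin column and 0-based, half-open coordinates.

```python
def coding_length(fields):
    """Exonic bases inside [cdsStart, cdsEnd) for one refGene row.

    Assumes the UCSC genePred layout with a leading bin column:
    6=cdsStart, 7=cdsEnd, 9=exonStarts, 10=exonEnds (0-based, half-open).
    """
    cds_start, cds_end = int(fields[6]), int(fields[7])
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    # Clip every exon to the CDS interval and add up the overlapping bases.
    return sum(max(0, min(e, cds_end) - max(s, cds_start))
               for s, e in zip(starts, ends))


def frame_summary(lines):
    """Return (coding transcripts, those whose CDS length is not a multiple of 3)."""
    coding = bad = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 11:
            continue
        length = coding_length(fields)
        if length == 0:  # cdsStart == cdsEnd: non-coding transcript
            continue
        coding += 1
        if length % 3:
            bad += 1
    return coding, bad
```

For example, `frame_summary(open("pantro_refGene.txt"))` returns a pair of counts whose ratio gives a first impression of the percentage the reply above asks about.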
Since my input file is too large and extracting only part of it cannot guarantee consistency, please forgive me for sending you links to the data sources directly. Could you please help me test them? The links for the chimpanzee genome and the .gff3 annotation file are as follows: First, to build the annotation library named pantrodb, I decompressed the files above and used the following code to convert the gff file to a gtf file. Next, for the annotation itself: I previously used SyRI to identify structural variations between the chimpanzee and bonobo genomes, and used the result file syri.vcf as the input for annotation. I used the following command to convert the .vcf file to an .avinput file. I then ran table_annovar.pl for annotation with the following command. After that, I also tried the script annotate_variation.pl, with the following command. The log file pantro-panpan.log is available at https://github.com/leiwang567/chimpanzee-data/pantro-panpan.log. The annotation results show no exon-related variants at all, which I believe is incorrect, so I would appreciate any suggestions for improvement after you test the data. Thank you very much! Best regards!
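Because SyRI reports structural variants as regions, one thing worth checking in the .vcf to .avinput step is whether each record keeps its full span. The sketch below is a hypothetical converter, not ANNOVAR's convert2annovar.pl: it assumes the region end is stored as END= in the INFO column, and it writes symbolic alleles such as <INV> as "0"/"0", which is an assumption to verify against the ANNOVAR documentation. It also does not strip the padding base from indels the way convert2annovar.pl does.

```python
def parse_info(info):
    """Turn 'END=500;VarType=SR' into a dict; flags without '=' are ignored."""
    return dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)


def vcf_to_avinput(lines):
    """Yield (chrom, start, end, ref, alt) rows from VCF lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = f[0], int(f[1]), f[3], f[4]
        info = parse_info(f[7]) if len(f) > 7 else {}
        # Use END from INFO when present so a region-level SV keeps its full span;
        # otherwise fall back to the length of the REF allele.
        end = int(info["END"]) if "END" in info else pos + len(ref) - 1
        if alt.startswith("<"):
            # ASSUMPTION: symbolic alleles (<INV>, <DUP>, ...) are written as "0"/"0";
            # check the ANNOVAR documentation for how your version expects them.
            ref, alt = "0", "0"
        yield chrom, pos, end, ref, alt
```

Each row can be written out with `"\t".join(map(str, row))`. Whatever converter is used, the chromosome names it emits must match those in pantro_refGene.txt.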
Hello, thank you very much for taking the time to help me with my confusion. I have to admit that ANNOVAR is a very useful annotation tool. However, I am currently facing a tricky problem and I am seeking your help. Previously, I used minimap2 and SyRI to perform sequence alignment and SV identification on the whole genomes of chimpanzees and bonobos. Now, I am using ANNOVAR to annotate the structural variations in the output files from SyRI.
I built the annotation database myself from the latest chimpanzee genome sequence (.fasta) and annotation (.gff3) files, producing pantro_refGeneMrna.fa and pantro_refGene.txt. When I used annotate_variation.pl for gene-based annotation, the output file pantro-panpan.exonic_variant_function was empty, and the first column of the pantro-panpan.variant_function file was "intergenic" for every variant. Moreover, when I used the table_annovar.pl script to re-annotate with a single annotation database (the RefSeq database I built myself), the output files were pantro-panpan.pantro_multianno.csv and pantro-panpan.refGene.invalid_input. The content of the pantro-panpan.pantro_multianno.csv file is shown in the image below. I am very puzzled by this. What could be the reason? Or are there mistakes in my workflow? I sincerely hope for your answer, as this is very important to me!
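To see why rows end up in a file such as pantro-panpan.refGene.invalid_input, a quick sanity check of the .avinput rows may help. The checks below are guesses at common problems (too few columns, non-integer or reversed coordinates, a REF allele whose length does not match the start-end span), not ANNOVAR's actual validation rules, and the function name is hypothetical. Treating "-" and "0" as alleles without a length requirement is also an assumption.

```python
def find_invalid(lines):
    """Return (line_number, reason) for .avinput rows failing basic checks."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            problems.append((n, "fewer than 5 columns"))
            continue
        chrom, start, end, ref, alt = fields[:5]
        if not (start.isdigit() and end.isdigit()):
            problems.append((n, "start/end not integers"))
            continue
        if int(end) < int(start):
            problems.append((n, "end < start"))
        # ASSUMPTION: "-" (insertion) and "0" (placeholder) carry no length constraint.
        elif ref not in ("-", "0") and len(ref) != int(end) - int(start) + 1:
            problems.append((n, "REF length does not match start-end span"))
    return problems
```

Running `find_invalid(open("pantro-panpan.avinput"))` on the hypothetical input file name and comparing the reported line numbers with the invalid_input file would show whether these simple checks explain the rejected rows.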