Briekit-events output fewer genes than expected #6
Comments
Hi Nil, Thanks for the question. I don't have a direct answer, but it could be that other AS types (e.g., alternative TSS or poly-A sites) are not included. Also, you may want to check how many of the 55k "genes" have only one transcript. Yuanhua
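The check suggested above can be sketched in a few lines of Python. This is only a minimal sketch, not part of briekit: it assumes the GFF3 file marks transcripts with the feature type `transcript` and links each one to its gene through a `Parent` attribute (with `gene_id` as a fallback). The function names `count_transcripts_per_gene` and `summarize` are made up for illustration.

```python
from collections import Counter


def count_transcripts_per_gene(gff3_lines):
    """Count 'transcript' features per gene from an iterable of GFF3 lines.

    Assumption: column 3 is 'transcript' for transcript records and the
    attribute column carries Parent=<gene id> (or gene_id=<gene id>).
    """
    counts = Counter()
    for line in gff3_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        # Parse the key=value pairs of the attribute column.
        attrs = dict(
            item.strip().split("=", 1)
            for item in fields[8].split(";")
            if "=" in item
        )
        gene = attrs.get("Parent") or attrs.get("gene_id")
        if gene:
            counts[gene] += 1
    return counts


def summarize(counts):
    """Bucket genes by their number of transcripts."""
    values = list(counts.values())
    return {
        "single": sum(1 for n in values if n == 1),
        "multiple": sum(1 for n in values if n > 1),
        "five_or_more": sum(1 for n in values if n >= 5),
    }
```

To run it on a real annotation, pass an open file handle, e.g. `summarize(count_transcripts_per_gene(open("annotation.gff3")))`, where `annotation.gff3` is a placeholder path.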
Hi Yuanhua, Thank you very much for your quick response! As you suggested, I checked the number of genes with only one transcript: 34k have only one, and 22k have multiple transcripts. About 12k genes have 5 transcripts or more, which is close to (though slightly fewer than) the number of genes that briekit-events outputs. I've also checked how many transcripts each of the 12k genes in the briekit-events output has, and all of them have 5 transcripts or more, meaning the briekit-events output contains no genes that have only 1-4 transcripts in the gencode annotation. Moreover, the briekit-events output does not include all genes with 5 transcripts or more. Here are some numbers:
I've manually checked the gencode annotation of some of the genes that are not in the briekit-events output, and I can't find any difference from genes that are included. For instance, KRT14 has only 3 transcripts, but the annotation itself seems to include some alternative splicing. However, I am not completely sure how to check for AS in the gencode annotation by hand. Do you think it's expected that all genes with fewer than 5 transcripts are considered not to have AS (or at least none of the 5 types that briekit-events tries to find)? Any further help or suggestions would be very much appreciated. Nil
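One rough way to check for exon skipping by hand is to compare the splice junctions of a gene's transcripts. The following is a minimal sketch under stated assumptions, and it is not how briekit-events defines or filters its events: it flags an exon as a skipped-exon (SE) candidate when one transcript has three consecutive exons A, B, C and another transcript joins the end of A directly to the start of C. The function name `find_skipped_exon_candidates` and the input layout (transcript ID mapped to a list of `(start, end)` exon coordinates) are hypothetical, and the coordinates in the usage note are toy values.

```python
def find_skipped_exon_candidates(transcripts):
    """Return (upstream, skipped, downstream) exon triplets.

    transcripts: dict mapping a transcript ID to a list of (start, end)
    exon coordinates on the genome (any order; sorted here by start).
    """
    # Collect every splice junction, i.e. (end of exon i, start of exon i+1).
    junctions = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        for left, right in zip(ordered, ordered[1:]):
            junctions.add((left[1], right[0]))

    # An exon B is a candidate if some transcript contains A, B, C in a row
    # while a junction joining A's end directly to C's start also exists.
    candidates = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
            if (a[1], c[0]) in junctions:
                candidates.add((a, b, c))
    return candidates
```

For example, with the toy input `{"t1": [(100, 200), (300, 400), (500, 600)], "t2": [(100, 200), (500, 600)]}` the middle exon `(300, 400)` is reported as a candidate. Under this simple definition two transcripts are already enough to produce an SE candidate, so a gene's absence from the output would have to come from something other than this basic junction pattern.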
Hello,
I am using briekit-events to extract splicing events from the human gencode v25 annotation file in gff3 format, which has over 55k genes. However, the output file contains alternative splicing events for only 10k unique genes (SE events only), or 12k unique genes if all types of AS are taken into account. I did not perform any further filtering steps. Why is there such a big difference? How can I get all the splicing events for all the genes in gencode v25?
I've checked the file provided in the brie2 documentation (https://sourceforge.net/projects/brie-rna/files/annotation/human/gencode.v25/) and it has roughly the same number of unique genes.
I wonder if it might be because many genes have AS events that are not one of the 5 types that briekit is looking for (SE, RI, A3SS, A5SS or MXE). Could this be the reason?
Thank you very much in advance,
Nil