Briekit-events output fewer genes than expected #6

Open
nilcam opened this issue Mar 2, 2023 · 2 comments

nilcam commented Mar 2, 2023

Hello,

I am using briekit-events to extract splicing events from the human GENCODE v25 annotation file in GFF3 format, which has over 55k genes. However, the output file contains alternative splicing events for only 10k unique genes (SE events only), or 12k unique genes if all types of AS are taken into account. I did not perform any further filtering steps. Why is there such a big difference? How can I get all the splicing events for all the genes in GENCODE v25?

I've checked the file provided in the brie2 documentation (https://sourceforge.net/projects/brie-rna/files/annotation/human/gencode.v25/) and it has roughly the same number of unique genes.

I wonder if it might be because many genes have AS events that are not one of the 5 types that briekit is looking for (SE, RI, A3SS, A5SS or MXE). Could this be the reason?

Thank you very much in advance,
Nil

huangyh09 (Owner) commented

Hi Nil,

Thanks for the question. I don't have a direct answer, but it could be that other AS types (e.g., alternative TSS or poly-A) are not included. Also, you may want to check how many of the 55k "genes" have only one transcript.

Yuanhua

nilcam commented Mar 3, 2023

Hi Yuanhua,

Thank you very much for your quick response!

As you suggested, I checked the number of transcripts per gene: 34k genes have only one transcript and 22k have multiple transcripts. About 12k genes have 5 transcripts or more, which is fewer than, but close to, the number of genes that briekit-events outputs.

I've also checked how many transcripts each of the 12k genes in the briekit-events output has, and all of them have 5 transcripts or more. In other words, no gene with only 1-4 transcripts in the GENCODE annotation appears in the briekit-events output. Moreover, not all genes with 5 transcripts or more appear in the output either.

Here are some numbers:

Number of transcripts per gene in gencode annotation:
1      34266 
2      4508  
3      2750  
4      2132
5      1842
6      1503
7      1358
8      1149
       ...  
Number of transcripts per gene among the genes output by briekit-events:
1      0
2      0
3      0
4      0
5      1032
6      1018
7      1016
4      960
8      909
       ... 
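
For reference, a tally like the one above could be produced with a short script along these lines. This is only a minimal sketch, not the exact commands used here: it assumes the GFF3 has tab-separated columns, that transcript rows carry the feature type `transcript`, and that each row has a `gene_id` attribute (falling back to `Parent` if not). The function names and the `output_genes` argument are hypothetical.

```python
from collections import Counter

def transcripts_per_gene(gff3_lines):
    """Count transcript rows per gene in an iterable of GFF3 lines."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript features are counted
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        gene_id = attrs.get("gene_id") or attrs.get("Parent")  # assumed attribute names
        if gene_id:
            counts[gene_id] += 1
    return counts

def transcript_count_histogram(counts, output_genes=None):
    """Map 'number of transcripts' -> 'number of genes', optionally
    restricted to a set of gene IDs (e.g. those found in the tool's output)."""
    if output_genes is None:
        return Counter(counts.values())
    return Counter(n for gene, n in counts.items() if gene in output_genes)
```

Calling `transcripts_per_gene(open(path))` on the annotation and then `transcript_count_histogram` once without and once with the set of gene IDs from the briekit-events output would give the two tables for comparison.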

I've manually checked the GENCODE annotation of some of the genes that are missing from the briekit-events output, and I can't find any difference from the genes that are included. For instance, KRT14 has only 3 transcripts, but the annotation itself seems to include some alternative splicing. However, I am not completely sure how to check for AS in the GENCODE annotation by hand.
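
One rough way to check a single gene for skipped-exon candidates without doing it by eye is sketched below. To be clear, this is not how briekit-events defines or filters its events; it is a simplified, hypothetical check under one assumption: an exon counts as a candidate if one transcript contains it between two flanking exons while another transcript of the same gene has a junction that joins those flanking exons directly. The input format (transcript ID mapped to a list of `(start, end)` exon coordinates) is also an assumption.

```python
def skipped_exon_candidates(transcripts):
    """Find candidate skipped exons within one gene.

    transcripts: dict mapping transcript ID -> list of (start, end) exons.
    Returns a sorted list of (upstream_end, (exon_start, exon_end), downstream_start).
    """
    # Collect every splice junction (end of one exon, start of the next)
    # seen in any transcript of the gene.
    junctions = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        for left, right in zip(ordered, ordered[1:]):
            junctions.add((left[1], right[0]))

    # A middle exon is a candidate if some transcript joins its two
    # neighbours directly, i.e. the skipping junction also exists.
    events = set()
    for exons in transcripts.values():
        ordered = sorted(exons)
        for up, mid, down in zip(ordered, ordered[1:], ordered[2:]):
            if (up[1], down[0]) in junctions:
                events.add((up[1], mid, down[0]))
    return sorted(events)
```

A gene with a single transcript can never produce a candidate here, since no second transcript exists to supply the skipping junction; a gene with 2-4 transcripts can, so an empty result for such genes would have to come from the annotation or from additional filters in the tool.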

Do you think it's expected that all genes with fewer than 5 transcripts are considered to have no AS (or at least none of the 5 types that briekit-events tries to find)?

Any further help or suggestions would be very much appreciated.

Nil
