Sequence File Formats
Ricky Woo edited this page Sep 20, 2017
column-number | content | values/format |
---|---|---|
1 | chromosome name | chr{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y,M} |
2 | annotation source | {ENSEMBL,HAVANA} |
3 | feature-type | {gene,transcript,exon,CDS,UTR,start_codon,stop_codon,Selenocysteine} |
4 | genomic start location | integer-value (1-based) |
5 | genomic end location | integer-value |
6 | score (not used) | . |
7 | genomic strand | {+,-} |
8 | genomic phase (for CDS features) | {0,1,2,.} |
9 | additional information as key-value pairs | see below |
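
The nine columns above can be read into a typed record. The following is a minimal sketch, assuming the columns are tab-separated as in standard GTF files (the table itself does not name the delimiter); `GtfRecord` and `parse_gtf_line` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class GtfRecord(NamedTuple):
    chrom: str             # column 1, e.g. "chr21"
    source: str            # column 2, ENSEMBL or HAVANA
    feature: str           # column 3, e.g. "exon" or "CDS"
    start: int             # column 4, 1-based
    end: int               # column 5
    score: str             # column 6, not used, always "."
    strand: str            # column 7, "+" or "-"
    phase: Optional[int]   # column 8, None when the column holds "."
    attributes: str        # column 9, raw key-value text


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one GTF line into its nine columns (assumes tab separation)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    chrom, source, feature, start, end, score, strand, phase, attrs = fields
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand: {strand!r}")
    return GtfRecord(
        chrom, source, feature, int(start), int(end), score, strand,
        None if phase == "." else int(phase), attrs,
    )
```

Storing the phase as `None` rather than `"."` keeps non-CDS features easy to tell apart from CDS features with phase 0.
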
<table>
<thead>
<tr>
<th>
key name
</th>
<th>
value format
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
gene_id
</td>
<td>
ENSGXXXXXXXXXXX *
</td>
</tr>
<tr>
<td>
transcript_id
</td>
<td>
ENSTXXXXXXXXXXX *
</td>
</tr>
<tr>
<td>
gene_type
</td>
<td>
<a href="gencode_biotypes.html">list of biotypes</a>
</td>
</tr>
<tr>
<td>
gene_status
</td>
<td>
{KNOWN, NOVEL, PUTATIVE}
</td>
</tr>
<tr>
<td>
gene_name
</td>
<td>
string
</td>
</tr>
<tr>
<td>
transcript_type
</td>
<td>
<a href="gencode_biotypes.html">list of biotypes</a>
</td>
</tr>
<tr>
<td>
transcript_status
</td>
<td>
{KNOWN, NOVEL, PUTATIVE}
</td>
</tr>
<tr>
<td>
transcript_name
</td>
<td>
string
</td>
</tr>
<tr>
<td>
exon_number
</td>
<td>
indicates the biological position of the exon in the transcript
</td>
</tr>
<tr>
<td>
exon_id
</td>
<td>
ENSEXXXXXXXXXXX *
</td>
</tr>
<tr>
<td>
level
</td>
<td>
1 (verified loci),<br />
2 (manually annotated loci),<br />
3 (automatically annotated loci)
</td>
</tr>
</tbody>
</table>
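
The key-value pairs in column 9 can be turned into a dictionary. This is a rough sketch, assuming each pair has the form `key "value";` or `key value;` (as in `level 2;`) and that each key listed above occurs at most once per line; `parse_attributes` is a hypothetical helper name.

```python
import re

# One pair: a key, whitespace, then a quoted or bare value, ended by ";".
_ATTR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;')


def parse_attributes(column9: str) -> dict:
    """Map each key in column 9 to its value (kept as a string)."""
    attrs = {}
    for match in _ATTR.finditer(column9):
        key, quoted, bare = match.group(1), match.group(2), match.group(3)
        # Quoted values lose their quotes; bare values such as "2" stay as-is.
        attrs[key] = quoted if quoted is not None else bare
    return attrs


example = 'gene_id "ENSG00000169861"; gene_name "IGHV1OR15-5"; level 2;'
print(parse_attributes(example)["gene_name"])
```

Values are left as strings, so a caller that needs the numeric `level` has to convert it with `int()`.
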
<table>
<thead>
<tr>
<th>
key name
</th>
<th>
value format
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
tag
</td>
<td>
part of a special set [*]:  {pseudo_consens,CCDS,seleno};<br />
or annotation remarks ["cds_start_NF", "mRNA_end_NF", etc.]<br />
<a href="gencode_tags.html">list of tags</a>
</td>
</tr>
<tr>
<td>
ccdsid
</td>
<td>
official CCDS id [*];  CCDS*
</td>
</tr>
<tr>
<td>
havana_gene
</td>
<td>
gene-id in the havana db [0,1];  OTTHUMG*
</td>
</tr>
<tr>
<td>
havana_transcript
</td>
<td>
transcript-id in the havana db [0,1] ;  OTTHUMT*
</td>
</tr>
<tr>
<td>
protein_id
</td>
<td>
ENSPXXXXXXXXXXX [0,1] (Ensembl protein id of protein coding transcript)
</td>
</tr>
<tr>
<td>
ont
</td>
<td>
pseudogene (or other) ontology ids [*];  {PGO:0000004 and others}
</td>
</tr>
<tr>
<td>
transcript_support_level
</td>
<td>
transcripts are scored according to how well mRNA and EST alignments match over their full length [0,1]<br />
1 (all splice junctions of the transcript are supported by at least one non-suspect mRNA),<br />
2 (the best supporting mRNA is flagged as suspect or the support is from multiple ESTs),<br />
3 (the only support is from a single EST),<br />
4 (the best supporting EST is flagged as suspect),<br />
5 (no single transcript supports the model structure),<br />
NA (the transcript was not analyzed)
</td>
</tr>
</tbody>
</table>
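
Keys marked `[*]` in the table above (`tag`, `ccdsid`, `ont`) may occur any number of times on one line, while keys marked `[0,1]` occur at most once. One way to handle both cases is to collect every value into a list. The sketch below assumes the same `key "value";` syntax as the example lines on this page; `collect_attributes` is a hypothetical name, and the attribute string in the usage line is made up for illustration.

```python
import re
from collections import defaultdict

# One pair: a key, whitespace, then a quoted or bare value, ended by ";".
_ATTR = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^;\s]+))\s*;')


def collect_attributes(column9: str) -> dict:
    """Map each key to the list of all its values, in order of appearance."""
    values = defaultdict(list)
    for key, quoted, bare in _ATTR.findall(column9):
        # findall yields "" for the alternative that did not match.
        values[key].append(quoted or bare)
    return dict(values)


# Illustrative input only: two tags on one line.
attrs = collect_attributes('gene_id "ENSG00000169861"; tag "CCDS"; tag "seleno"; level 2;')
print(attrs["tag"])
```

Optional keys that are absent simply do not appear in the result, so `attrs.get("ccdsid", [])` is a convenient way to read them.
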

```
chr21 HAVANA transcript 10862622 10863067 . + . gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA exon 10862622 10862667 . + . gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA CDS 10862622 10862667 . + 0 gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA start_codon 10862622 10862624 . + 0 gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA exon 10862751 10863067 . + . gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA CDS 10862751 10863064 . + 2 gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA stop_codon 10863065 10863067 . + 0 gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
chr21 HAVANA UTR 10863065 10863067 . + . gene_id "ENSG00000169861"; transcript_id "ENST00000302092"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "IGHV1OR15-5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "IGHV1OR15-5-001"; level 2; havana_gene "OTTHUMG00000074130"; havana_transcript "OTTHUMT00000157419";
```

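
The phase values in the two CDS lines of the example (0 and 2) can be checked by hand or with a few lines of code. The sketch below assumes the usual GFF/GTF convention, which this page does not spell out: the phase is the number of bases to skip at the start of a CDS segment to reach the first base of the next complete codon. It also assumes the first segment starts a codon (phase 0) and that coordinates are inclusive; `cds_phases` is a hypothetical helper name.

```python
def cds_phases(segments, strand="+"):
    """Return the phase of each CDS segment, in transcription order.

    segments: list of (start, end) pairs, 1-based and inclusive.
    Assumes the first segment in transcription order has phase 0.
    """
    # On the minus strand the transcript runs from high to low coordinates.
    ordered = sorted(segments, reverse=(strand == "-"))
    phases = []
    phase = 0
    for start, end in ordered:
        phases.append(phase)
        length = end - start + 1
        # Bases left over after the last complete codon in this segment
        # decide how many bases the next segment must skip.
        leftover = (length - phase) % 3
        phase = (3 - leftover) % 3
    return phases


# The two CDS segments of ENST00000302092 from the example above.
print(cds_phases([(10862622, 10862667), (10862751, 10863064)]))
```

The first segment is 46 bases long, which leaves one base of an unfinished codon, so the second segment has to skip two bases before the next codon begins. That matches the phase of 2 in the example.
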
On the way to the garden of bioinformatics.
A bioinformatics wiki for the course BI462.