Institute of Bioinformatics Münster
Description of tryporfs

Overview

Tryporfs is a query tool for accessing a database of upstream open reading frames (uORFs) annotated in the Trypanosoma congolense TcIL3000 genome. Genomic data and nucleotide sequences of CDSs were obtained from the TriTrypDB database (release 25). 5’ UTRs are derived from a T. congolense genome re-sequencing and re-annotation project (unpublished work, M. Jąkalski, Institute of Bioinformatics Münster). uORFs were identified computationally as a start codon in the 5’ UTR followed by an in-frame stop codon, with no minimum length restriction. For details on data retrieval, please see “Usage”.
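The definition above (a start codon in the 5’ UTR followed by the first in-frame stop codon, with no minimum length) can be illustrated with a minimal Python sketch. This is not the pipeline used to build the database. The function name find_uorfs, the choice of ATG as the only start codon, the stop codon set TAA/TAG/TGA, and the layout of the input (5’ UTR immediately followed by the CDS on the coding strand) are all assumptions made for illustration.

```python
# Illustrative sketch only; codon sets and input layout are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(seq, utr_len, start_codon="ATG"):
    """Return (start, end) pairs, 0-based, end exclusive, stop codon included.

    seq      -- 5' UTR followed directly by the CDS (assumed layout)
    utr_len  -- number of leading nucleotides that belong to the 5' UTR
    """
    seq = seq.upper()
    uorfs = []
    # The whole start codon must lie inside the 5' UTR.
    for start in range(max(0, min(utr_len, len(seq)) - 2)):
        if seq[start:start + 3] != start_codon:
            continue
        # Walk codon by codon in the frame set by the start codon and
        # stop at the first in-frame stop; it may lie inside the CDS.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

Because no minimum length is applied, a start codon directly followed by a stop codon is reported as well, and two in-frame start codons that share a stop codon yield two separate uORFs.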

Authors

Philipp Fervers, Florian Fervers, Norbert Grundmann, Tabea Kischka and Marcin Jakalski

Usage

To search the entire set of 31149 annotated uORFs, users can define queries based on the uORF features listed below. The output can be ordered by a specified column and exported to an Excel file.

  • ID or Gene – ID of a T. congolense gene as annotated in TriTrypDB, or ID of a specific uORF as deposited in this database. uORF IDs follow the format: gene ID - uORF stop # - uORF start #. Stop codons are numbered in increasing order starting with the uORF stop codon most proximal to the CDS start codon; start codons are numbered starting with the uORF start codon most distant from the in-frame stop codon
  • Length – boundaries for filtering uORFs based on their nucleotide sequence length
  • Chromosome – chromosomal location of uORF
  • uORF Stop to CDS Start – distance from the uORF stop codon to the start codon of the respective CDS; negative values represent uORFs that partially overlap the CDS
  • ASI – Amino Acid Similarity Index; resemblance of the amino acid usage of the observed uORF to the background set of all uORFs; values close to zero represent high similarity (for more details please refer to Fervers et al. 2016)
  • CAI – Codon Adaptation Index; measure of the optimization of codon usage (see Carbone et al. 2003 for details); values close to one indicate highly selective codon usage in the ORF
  • CAI uORF against CDSs – uORF CAI calculated against the background set of CDSs; values close to one indicate high resemblance to an optimized set of codons
  • Stage specific uORF weight – presence of the uORF in the stage-specific transcriptome, measured as the number of mapped Illumina mRNA reads; a value of 9.9 represents sequences below the threshold (min. 10 reads)
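As a hedged illustration of how some of these features can be interpreted once results have been exported, the following Python sketch filters uORF records by length, CDS overlap and stage-specific weight. The record layout (dictionary keys length, stop_to_cds_start and stage_weight), the function names and the stage label are hypothetical and do not reflect the column names of the actual export.

```python
# Illustrative sketch only; all field names are hypothetical.
MIN_READS = 10  # threshold stated above; 9.9 marks uORFs below it


def overlaps_cds(record):
    # A negative stop-to-CDS-start distance means the uORF runs into the CDS.
    return record["stop_to_cds_start"] < 0


def detected_in_stage(record, stage):
    # Weights under the threshold (including the 9.9 placeholder) count as absent.
    return record["stage_weight"].get(stage, 0) >= MIN_READS


def select(records, min_len=None, max_len=None, stage=None, allow_overlap=True):
    """Return the records that pass every filter that was requested."""
    hits = []
    for rec in records:
        if min_len is not None and rec["length"] < min_len:
            continue
        if max_len is not None and rec["length"] > max_len:
            continue
        if stage is not None and not detected_in_stage(rec, stage):
            continue
        if not allow_overlap and overlaps_cds(rec):
            continue
        hits.append(rec)
    return hits
```

Comparing against the read threshold rather than testing for the exact value 9.9 avoids relying on floating-point equality.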

Selecting a uORF ID in the tryporfs output opens a gene-specific window providing the following data:

gene, chromosome, start and stop coordinates, strand, product, length, maximum 5’ UTR length, ASI and CAI of CDS, tandem IDs and # of tandem repeats, gene orthology group, stage-specific splice sites (in the format: splice site ID, coordinates, # reads), maximum # of uORFs and stage-specific, mRNA-normalized proteomic data.

Data download

To download the results of a query, use the "Export Data to Excel" link on the result page, which saves the results to an Excel file. In addition, GFF and BED files are provided for the entire set of predicted T. congolense uORFs, as well as for a filtered set of non-redundant uORFs in which, where several consecutive in-frame start codons occur, only the longest uORF variant was retained.
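The non-redundant filtering can be sketched as follows. Consecutive in-frame start codons end at the same stop codon, so this sketch groups uORFs by their stop position and keeps the longest one in each group. It is an assumption-based illustration, not the procedure used to generate the provided files: the function name longest_per_stop and the input tuples (chromosome, start, end, strand) with BED-style 0-based, end-exclusive coordinates are hypothetical.

```python
# Illustrative sketch only; input layout is an assumption.
def longest_per_stop(features):
    """Keep the longest uORF among those sharing a stop codon.

    features -- iterable of (chrom, start, end, strand) tuples with
                BED-style coordinates (0-based start, exclusive end)
    """
    best = {}
    for chrom, start, end, strand in features:
        # On the plus strand the stop codon sits at the genomic end,
        # on the minus strand at the genomic start.
        stop_pos = end if strand == "+" else start
        key = (chrom, strand, stop_pos)
        kept = best.get(key)
        if kept is None or (end - start) > (kept[2] - kept[1]):
            best[key] = (chrom, start, end, strand)
    return sorted(best.values())
```

For example, two plus-strand uORFs on the same chromosome that both end at position 40 are reduced to the one with the more upstream start.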

2016-10-20 12:10