The start page allows you to query the database. The menu on the left gives you access to
the functions of the web site, regardless of whether you are on the start page. The most
important features that can be accessed from the menu are introduced in the following
sections.
Query the database
You can search for items in the database by entering your search term
in the search bar (as shown in Figure 1 for gene CEBPB) and clicking the search
icon on the right. Please note that your search term cannot be empty.
This restriction ensures good search performance for a large number of users.
By convention, we use genomic sequences. You must therefore replace "U" with "T" in your
nucleotide query.
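As a small illustration of this convention, the following Python sketch converts an RNA-style query into the genomic alphabet before it is pasted into the search bar. The function name `to_genomic_query` is hypothetical and not part of uORFdb.

```python
def to_genomic_query(query: str) -> str:
    """Replace uracil (U/u) with thymine (T/t) and keep the letter case."""
    return query.translate(str.maketrans({"U": "T", "u": "t"}))


print(to_genomic_query("AUGGCUUAA"))  # ATGGCTTAA
```

Apply such a conversion only to nucleotide queries. Gene symbols or author names that contain the letter "U" must stay unchanged.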
By default, we ignore special characters such as umlauts when you search for authors and publication titles.
For example, searching for an author named "Muller" finds both "Muller" and "Müller",
and searching for "Müller" finds the same two spellings.
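The behavior described above can be pictured with a short Python sketch. It is an assumption about how such a comparison could work (Unicode decomposition followed by removal of combining marks), not a description of the actual uORFdb search engine. The function names are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Remove combining marks, so that 'Müller' becomes 'Muller'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def same_author(query: str, candidate: str) -> bool:
    """Compare two names after folding diacritics on both sides."""
    return fold_diacritics(query) == fold_diacritics(candidate)


print(same_author("Muller", "Müller"))  # True
print(same_author("Müller", "Muller"))  # True
```

Folding both the query and the stored value is what makes the match symmetric, as in the "Muller"/"Müller" example.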
Figure 1: The search interface.
Below the search bar, you can see grey panels, one for each view. We
use the term view for a display of related items (publications, genes,
transcripts, uORFs, ...). Each view has a set of defined searchable fields.
By (de)selecting the checkboxes in front of the fields, you can refine
your search results. By default, you are searching in all searchable fields
from all views.
The panels also show the number of hits in the selected searchable
fields of each view (see Figure 1). In our example, we searched for "CEBPB",
which only gave us hits in the gene and publication views. This is
because the search term only matched fields in these two views.
This does not mean that there are no entries related to CEBPB in the other views
(e.g., we have transcripts and uORFs for CEBPB).
For performance reasons, only the best 1000 hits per view are shown. By clicking
the blue arrow button, you can jump directly to your search results. The arrow
button only appears if hits were found in the respective view.
Navigate in the search results
If you choose to go from the search to the gene view, you will see
those genes that matched your query. You can still navigate to any
other view in the database by using the buttons at the top of the page.
Buttons are greyed out if there are no items in that particular view.
All items that you will see in the other views are related to your
hits in the gene view. Thus, navigating from your gene hits to the transcript
view is not the same as performing the original search for CEBPB on the
transcript view.
Figure 2: Controls in the gene view.
You can select one or multiple rows by ticking the checkboxes in front of the
rows and clicking "Apply" in the top panel. This will remove any other rows
from your current view. If you navigate to any other view, all items will be
related to your selected rows. You also have designated buttons to tick all
checkboxes on the current page ("Page+"), to uncheck all boxes ("None") or
to uncheck all boxes on the current page ("Page-"). The number of checked
boxes is shown next to these buttons. Selections can be undone using "Undo"
or by clicking on one of your actions in the "Selection History". This will restore
the results of that particular action.
On the top right, there are the pagination controls ("<<": first, "<":
one page backward, ">": one page forward, ">>": last page), followed by the
current page and total page count.
Some rows contain fields with many lines. We only display the first lines of
data for each of these fields. Click on ▼ in the respective field to
expand the row. Click on ▲ to shrink the fields of that row.
In the uORF and the variant view, we only display the first nucleotides of the
sequences (indicated by "..."). You can click on the sequences to
view the full sequences and to download them in FASTA format. The same applies
to the amino acid sequences.
Links to external resources in the views are specially highlighted.
Model
Our uORF model allows for a quick visual analysis. It shows the full TLS (transcript
leader sequence) and CDS of transcripts with uORFs. ATG uORFs are highlighted in orange.
Start codons that are shared with a CDS (see column description: "Shared start codon")
are marked with "*" after the codon. You can access the model from the gene,
transcript and uORF views (see Figure 3). Click on "Model" to see the results for
all items in the current selection. However, there is a limit on how many
items can be displayed at once. If you exceed this limit, you will get an error
message and have to narrow down your selection.
In the top menu, you can filter transcripts by NCBI ID and uORFs by start codon
(the same principle as in the uORF view). By default, we don't include introns in the
display and stop codons are hidden. You can change this behavior by selecting the respective
checkboxes. To apply your selections, click "Filter".
Figure 3: The uORF model.
Each transcript has its own panel. Each panel shows the three reading frames of the
transcript. Panels are independent. If you perform an action on one,
the others will remain unaffected. If you move the cursor across the
plot, the header of the respective panel will show you the current genomic position
(0-based, half-open; see also column descriptions).
To zoom in, click and hold the left mouse button and mark an area in the model:
Mark a small area for a high magnification. Mark a large area for a more modest
magnification of that area. Click "Reset" in the top right corner of the panel
to reset the zoom. You can export your current view by using the "PNG" or "SVG"
buttons in the top right corner of the panel.
If you want more information on a particular uORF, you can click on it to open an
overlay window. To exit the model, click on any view button at the top of the page.
UCSC Genome Browser
In the gene, transcript and uORF views, you can switch to the
UCSC Genome Browser by clicking
on the links in the gene symbol or the uORF ID columns. The link for the gene symbol is
only present if we analyzed the gene for uORFs. You will then see all transcripts
and all uORFs for the associated gene. If you entered the Genome Browser
from the transcript or uORF view, the transcript or uORF that you clicked on will also be
highlighted. Using the Genome Browser, you can compare the uORFdb tracks to your own
custom tracks or to tracks showing ribosome profiling data. A full list of features is
beyond the scope of this tutorial, but the UCSC team has created a helpful
manual.
If you want to continue working with the uORF sequences, you will find the track details
page helpful. To access this page, click on one of our tracks in the Genome Browser. This
will open a new page (see Figure 4) where we put direct links to the
UCSC Table Browser.
Using these links, you can easily create a custom download of the sequences or upload the
sequences directly into the free GALAXY cloud service.
Figure 4: The track details page.
Export citations
You can export all publications in your selection into a single RIS file by clicking
the cite button in the publication view (there is an upper limit for the number of
citations that can be exported at once). RIS is a standard citation file format. Each
record is annotated with keywords from the database (KW in RIS). The keywords are
a short summary of the publication by our uORF experts. They are derived
from the boolean (+/-) columns of the database (see also column descriptions below).
If a "+" or a "-" is reported for a publication, the column name is used as a keyword
in the citation file.
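The keyword rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input dictionary and the choice of RIS tags other than KW (here TY, TI, AN and ER) are hypothetical, and the real export may use different fields.

```python
def ris_keywords(flags: dict[str, str]) -> list[str]:
    """Return the names of all columns that report a '+' or a '-'."""
    return [column for column, value in flags.items() if value in {"+", "-"}]


def to_ris(pubmed_id: str, title: str, flags: dict[str, str]) -> str:
    """Build one minimal RIS record with one KW line per keyword."""
    lines = ["TY  - JOUR", f"TI  - {title}", f"AN  - {pubmed_id}"]
    lines += [f"KW  - {keyword}" for keyword in ris_keywords(flags)]
    lines.append("ER  - ")  # every RIS record ends with an ER tag
    return "\n".join(lines)


# Hypothetical publication: empty columns do not become keywords.
example_flags = {"CDS repression": "+", "CDS induction": "", "Review": "-"}
print(to_ris("00000000", "Example title", example_flags))
```

Note that both "+" and "-" produce a keyword, so a keyword in the file tells you that the topic was evaluated, not which of the two values was reported.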
Our RIS file is fully compliant with the RIS standard. Whether individual
fields, especially the keyword fields, are recognized depends on your reference manager.
In our tests, the citation files were processed as expected by
Citavi, EndNote, and Zotero.
Get the latest publications
We publish the publications in uORFdb as an RSS feed that is updated once
a day. RSS is a popular format for distributing new items to users (podcasts,
new publications in journals, new publications matching a query in PubMed, ...).
All you need is a feed reader, which is an app or program that can read
RSS feeds. Feed readers are available for virtually any operating system, both on
your local PC and on your smartphone/tablet
(non-exhaustive list).
Click the button "Publication Updates" in the left panel of our website.
This will open a new tab. Don't worry about the content of this page. It is the RSS feed
in a machine-readable format that is not meant to be read by humans. Just copy
the URL of this page from the address bar of your browser and paste it into
your feed reader. How to do this depends on your app. After that, you
have successfully subscribed to our feed (see Figure 5, left). Every time we
add a new publication to the database, your app will notify you.
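If you prefer a script over a feed reader, an RSS feed can also be read with the Python standard library. The sketch below parses a small made-up RSS 2.0 document. The sample content, the example.org links and the function name are placeholders, and the element layout of the real uORFdb feed may differ.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the text you would download from the feed URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example publication feed</title>
    <item>
      <title>First example publication</title>
      <link>https://example.org/1</link>
    </item>
    <item>
      <title>Second example publication</title>
      <link>https://example.org/2</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every item element in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in list_items(SAMPLE_FEED):
    print(title, link)
```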
Some publication managers support RSS feeds, which we find particularly
useful. The most mature implementation that we have seen so far is the one in
Zotero. In Zotero, you can choose "New Library" --> "News Feed" --> "From URL"
to open a dialog where you can paste the RSS URL.
From the feed view in Zotero, you can select a publication and inspect it. The
field for the abstract contains the tags/keywords from uORFdb (see Export citations),
followed by the actual abstract. From the top panel, you can choose to import the
publication directly into your library (see Figure 5, right). For technical reasons, the
keywords from uORFdb are not transferred if you import an item from the RSS feed into your library.
If you need the keywords, you can either add them manually or import the citation directly
from the database, as explained in the previous section.
Figure 5: Left: A single publication from the RSS feed as displayed in the
Feeder app on Android.
Right: A single publication from the RSS feed as displayed in the Zotero reference manager.
Export uORF data
Click on "Excel" or "CSV" in the export panel at the top of the uORF view
to export the rows in your current selection in the respective format (there is an
upper limit to the number of uORFs that can be exported at once). This will export
all columns except for the links to dbSNP and ClinVar. The exonic nucleotide
sequences and the amino acid sequences are exported in full.
Bulk download
You can download all data that is stored in the database as TSV files. Click on
"Downloads" in the main menu on the left of the web site. In the download tab, you will find
further information on the type of content that can be downloaded and the file format
(see also README file). A word of warning: some of the exported files are huge and
should not be opened with a program like LibreOffice Calc or Excel. Use
the command line or specialized programs to analyze these files instead. If you want to explore the
format of the downloads, we provide example files with only a few entries. These are generally safe
to open with a graphical table viewer. All downloads can be verified using their MD5 checksums.
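A checksum comparison can be scripted, for example in Python. The sketch below reads the file in chunks, so that even a very large download is never loaded into memory at once. The function names are hypothetical. The file name and the expected checksum have to be taken from the download page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """Compare the computed digest with the published checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `is_intact(Path("download.tsv"), "<checksum from the download page>")` returns `True` only if the file arrived unmodified. On the command line, the `md5sum` tool serves the same purpose.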
Column descriptions
The following sections provide an in-depth description of all
columns in the uORFdb web interface. The documentation is split into
parts according to the views.
Please note the definitions of the most important technical terms:
0-based: The first position is zero and not one.
0-based, half-open: Count base positions from 0 and exclude
the end position (details). This is also known as
"0-based start, 1-based end".
If not indicated otherwise, the position names and the position
coordinates themselves are always based on the "+" strand. The smallest
genomic coordinate is always the start, the largest genomic coordinate is
the end. This is intuitive on the "+" strand. On the "-"
strand, however, it might be a little surprising:
a uORF on the "-" strand biologically begins at its genomic end position and
ends at its genomic start position. If you want to have the "-" strand
coordinates, please follow the steps in
this resource.
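The coordinate conventions above can be made concrete with a short Python sketch. The class and method names are hypothetical, and the example covers only a single contiguous genomic block (no introns). It does not perform the conversion to "-" strand coordinates mentioned above. It only shows where the biological 5' end lies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 0-based, inclusive, always the smaller genomic coordinate
    end: int     # exclusive, always the larger genomic coordinate
    strand: str  # "+" or "-"

    def length(self) -> int:
        """Half-open coordinates make the length a simple difference."""
        return self.end - self.start

    def one_based_closed(self) -> tuple[int, int]:
        """Convert to 1-based, fully closed coordinates."""
        return self.start + 1, self.end

    def first_transcribed_base(self) -> int:
        """0-based position of the biological 5' end on the '+' strand axis."""
        return self.start if self.strand == "+" else self.end - 1


block = Interval(start=100, end=110, strand="-")
print(block.length())                  # 10
print(block.one_based_closed())        # (101, 110)
print(block.first_transcribed_base())  # 109
```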
By convention, we use genomic sequences and not mRNA sequences. This means that our sequences will not contain
uracil, but thymine.
Genes
Taxon
The taxon for the gene as provided by NCBI.
Gene symbol
The default gene symbol for the gene as provided by NCBI. Clicking on the symbol opens
the UCSC Genome Browser. In the Genome Browser, you can inspect all transcripts of the gene and,
if applicable, all uORFs. Genes which we did not analyze for uORFs will not
have a link to the Genome Browser.
NCBI ID
For most genes, we provide the gene ID in the NCBI gene database. If this ID is not available,
the accession number in the NCBI nucleotide database is displayed. Clicking on the ID (or accession)
will take you to the full entry on NCBI.
Symbol aliases
Unofficial/alternative symbols for the gene as provided by NCBI.
Names
Official and alias names for the gene as provided by NCBI.
Chromosome
The chromosome on which the gene resides.
Assembly
The assembly version of our sequence data for this gene. If no sequence data exists,
it is the current assembly version on NCBI at the time the gene was inserted.
# Transcripts
The number of RefSeq transcripts for this gene in our database.
# Publications
The number of publications for this gene in our database.
Transcripts
Gene symbol
The official gene symbol for the transcript's parent gene as provided by NCBI.
Clicking on the symbol opens the UCSC Genome Browser. In the Genome Browser, you can inspect the
currently selected transcript (highlighted) and compare it to the other transcripts of
this gene. If applicable, you can also see all uORFs.
NCBI ID
The RefSeq accession and version of the transcript.
Chromosome
The chromosome of the transcript.
Genomic start
The start position of the transcript on the genome (0-based, half-open). This is always the
lower coordinate, regardless of strand.
Genomic end
The end position of the transcript on the genome (0-based, half-open). This is always the
higher coordinate, regardless of strand.
Strand
The strand of the transcript: "+" or "-".
Length [bp]
The length of the transcript without introns.
TLS length [bp]
The length of the TLS of the transcript without introns.
Kozak context
The Kozak consensus sequence of the CDS. It consists of
the six nucleotides upstream of the start codon, the start codon (highlighted), and
the following nucleotide.
Kozak strength
The translational efficacy of the Kozak context. All classifications consider the 3rd nucleotide
upstream of the start codon (A) and the nucleotide directly downstream of the start codon (B).
The strength is classified as "strong" (A: purine and B: guanine), "adequate" (A: purine or B: guanine)
or "weak" (A: not a purine and B: not a guanine). The strength is not shown if the 3rd nucleotide
upstream of the start codon is missing.
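The classification rule can be written down as a short Python sketch. It assumes that the Kozak context is given as a complete string of 10 nucleotides (six upstream, the start codon, one downstream) and that "strong" takes precedence over "adequate". The function names are hypothetical and this is not the code used to build the database.

```python
from typing import Optional

PURINES = frozenset("AG")


def kozak_strength(minus3: Optional[str], plus4: str) -> Optional[str]:
    """Classify from the nucleotide at -3 (A) and the one after the codon (B)."""
    if not minus3:
        return None  # no strength is reported without the -3 nucleotide
    has_purine = minus3.upper() in PURINES
    has_guanine = plus4.upper() == "G"
    if has_purine and has_guanine:
        return "strong"
    if has_purine or has_guanine:
        return "adequate"
    return "weak"


def strength_from_context(context: str) -> Optional[str]:
    """Classify a full context: indices 0-5 upstream, 6-8 codon, 9 downstream."""
    if len(context) != 10:
        raise ValueError("expected 6 + 3 + 1 = 10 nucleotides")
    return kozak_strength(context[3], context[9])


print(strength_from_context("GCCACCATGG"))  # strong
print(strength_from_context("GCCTCCATGG"))  # adequate
print(strength_from_context("GCCTCCATGT"))  # weak
```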
# ATG uORFs
The number of uORFs for this transcript in our database with an ATG start codon.
# aTIS uORFs
The number of uORFs for this transcript in our database with an alternative
(!= ATG) start codon.
uORFs
uORF ID
NCBI ID of transcript + "_" + start codon + "." + number.
"number" is the consecutive number of uORFs on the same transcript with the same
start codon. Clicking on the name will take you to the UCSC Genome Browser. The chosen
uORF will be highlighted.
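Because the transcript ID itself contains "_" and ".", a uORF ID is best split from the right. The following Python sketch illustrates this. The function names and the example ID are made up for illustration.

```python
def build_uorf_id(transcript_id: str, start_codon: str, number: int) -> str:
    """Compose: NCBI ID of transcript + '_' + start codon + '.' + number."""
    return f"{transcript_id}_{start_codon}.{number}"


def parse_uorf_id(uorf_id: str) -> tuple[str, str, int]:
    """Split a uORF ID into transcript ID, start codon and number."""
    transcript_id, tail = uorf_id.rsplit("_", 1)  # last '_' precedes the codon
    start_codon, number = tail.rsplit(".", 1)     # last '.' precedes the number
    return transcript_id, start_codon, int(number)


example_id = build_uorf_id("NM_000000.1", "ATG", 2)  # illustrative accession
print(example_id)                 # NM_000000.1_ATG.2
print(parse_uorf_id(example_id))  # ('NM_000000.1', 'ATG', 2)
```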
Chromosome
The chromosome of the uORF.
Genomic start
The start position of the uORF on the genome (0-based, half-open). This is
always the lower coordinate, regardless of strand.
Genomic end
The end position of the uORF on the genome (0-based, half-open). This is
always the higher coordinate, regardless of strand.
Strand
The strand of the uORF: "+" or "-".
Start codon
The start codon of the uORF. An asterisk indicates that this start codon is shared with a
CDS (see "Shared start codon").
Stop codon
The stop codon of the uORF.
uORF length [bp]
The length of the uORF without introns.
CDS distance [bp]
The distance from the uORF end position on the genome to the
CDS start position on the genome. The value is negative if the uORF end position is
located after the CDS start position.
5'-cap distance [bp]
The distance from the transcript start to the uORF start codon.
Kozak context
The Kozak consensus sequence of the uORF. It consists of
the six nucleotides upstream of the start codon, the start codon (highlighted), and
the following nucleotide.
Kozak strength
The translational efficacy of the Kozak context. All classifications consider the 3rd nucleotide
upstream of the start codon (A) and the nucleotide directly downstream of the start codon (B).
The strength is classified as "strong" (A: purine and B: guanine), "adequate" (A: purine or B: guanine)
or "weak" (A: not a purine and B: not a guanine). The strength is not shown if the 3rd nucleotide
upstream of the start codon is missing.
Type
"non-overlapping": uORF is completely upstream of CDS. "overlapping": uORF overlaps with CDS, but
is not in the same frame. "N-terminal extension": uORF overlaps with CDS and is in the same frame.
Thus, CDS and uORF share the same stop codon. "possible N-terminal extension": A special case of the
"N-terminal extension". The CDS does not have a valid stop codon (possibly due to annotation errors).
Thus, we continued the search for a uORF stop codon in the 3'-UTR.
Reading frame
The reading frame (1-3) in which the uORF resides. By definition, the reading frame
that matches the frame of the CDS is set to "1".
Exonic sequence
The sequence of the uORF excluding introns. By default,
only the first nucleotides are shown in the table. By clicking on the sequence,
you can view the whole sequence and optionally download it. The FASTA header has the
following format: ">" + gene symbol + "_" + NCBI ID of the transcript + "_" + genomic start + "_" +
genomic end + "_nucseq"
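If you want to reproduce this header format in your own scripts, a Python sketch could look as follows. The function names, the example values and the line width of 60 characters are assumptions. Only the order of the header fields is taken from the description above.

```python
def fasta_header(gene_symbol: str, transcript_id: str,
                 genomic_start: int, genomic_end: int,
                 suffix: str = "nucseq") -> str:
    """Join the header fields with '_' and prepend '>'."""
    return f">{gene_symbol}_{transcript_id}_{genomic_start}_{genomic_end}_{suffix}"


def fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record with the sequence wrapped at a fixed width."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *chunks]) + "\n"


# Placeholder values, not real database content.
header = fasta_header("CEBPB", "NM_000000.1", 100, 160)
print(header)  # >CEBPB_NM_000000.1_100_160_nucseq
```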
Amino acid sequence
The amino acid sequence of the uORF was generated by translating the exonic sequence
using the standard genetic code. The stop codon is shown as a "*". By definition, the
sequence starts with methionine (M), even if the nucleotide sequence starts with a non-canonical
uSTART codon. Although there are rare examples of a different first amino acid being incorporated,
methionine is used in the vast majority of translation events in eukaryotes.
By default, only the first amino acids are shown in the table. By clicking on the sequence,
you can view the whole sequence and optionally download it. The FASTA header has the
following format: ">" + gene symbol + "_" + NCBI ID of the transcript + "_" + genomic start + "_" +
genomic end + "_aminoseq"
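The translation rule described above (standard genetic code, "*" for the stop codon, methionine at the first position regardless of the start codon) can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes an exonic sequence that contains only A, C, G and T.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering of all 64 codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_uorf(exonic_sequence: str) -> str:
    """Translate a uORF; the first codon always yields methionine (M)."""
    sequence = exonic_sequence.upper()
    if len(sequence) < 3:
        raise ValueError("sequence is shorter than one codon")
    peptide = ["M"]  # also used for non-ATG start codons
    for i in range(3, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        peptide.append(amino_acid)
        if amino_acid == "*":  # the stop codon is shown as "*"
            break
    return "".join(peptide)


print(translate_uorf("CTGGCTTAA"))  # MA*
```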
Shared start codon
Is the uORF start codon shared by a CDS from another transcript of the same gene?
In this case, we list the NCBI IDs of the transcripts that harbor the CDSs. To identify shared
start codons, we compare the position of the biological start (not necessarily genomic start) and the start
codon sequence.
Exon variants in dbSNP
Contains a link to dbSNP for human uORFs. dbSNP is queried for
variants in the exonic uORF regions.
Exon variants in ClinVar
Contains a link to ClinVar for human uORFs. ClinVar is queried for
variants in the exonic uORF regions.
Variants
The results shown here are in whole or part based upon data generated by the
TCGA Research Network.
uORF ID
The name of the uORF which was affected by the variant (see also uORFs).
Genomic position
The start position of the mutation on the genome (left-aligned, normalized, 0-based, half-open).
Reference allele
The reference allele of the variant (left-aligned, normalized,
always related to "+" strand).
Alternate alleles
The alternate allele(s) of the variant (left-aligned, normalized,
always related to "+" strand).
Graph
The graph allows you to put the allele frequencies of the variant in cancer
patients into context with the frequencies of the variant in large reference cohorts.
The column depicts the somatic allele frequencies of the reference and
alternate allele(s) in the analyzed cancer types (upper panel). We only show
cancer types in which the variant has been identified, according to our analysis.
The lower panel shows the allele frequencies in up to three reference studies:
gnomAD Genomes, ExAC, and TopMed version 2 or 3. The frequencies are
based on dbSNP. We analyzed the following cancer types:
BRCA = breast cancer, COAD = colon cancer, LAML = blood cancer,
LUAD = lung cancer, PRAD = prostate cancer, and SKCM = skin cancer
REF start codon
The reference start codon of the uORF.
Start codon effects
The direct effect(s) of the alternate allele(s) on the start codon of the
uORF. If the start codon is lost, the effect is "loss". If the codon was an
aTIS codon ("TTG", "GTG", "CTG", "AAG", "AGG", "ACG", "ATA", "ATT", "ATC") and
the variant turns it into another aTIS codon, the effect is "aTIS->aTIS". If an
ATG is turned into an aTIS codon or vice versa, the effects are "uAUG->aTIS"
and "aTIS->uAUG", respectively. If the variant leads to the loss of the uSTART
and additionally creates a new start codon in the next triplet,
the effect is "changed position".
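The codon-level part of these rules can be summarized in a short Python sketch. It compares only the reference and the alternate codon, so the "changed position" effect, which depends on the neighboring triplet, is deliberately left out. The function name is hypothetical and the sketch is not the annotation pipeline of the database.

```python
from typing import Optional

ATIS_CODONS = frozenset(
    {"TTG", "GTG", "CTG", "AAG", "AGG", "ACG", "ATA", "ATT", "ATC"}
)


def start_codon_effect(ref: str, alt: str) -> Optional[str]:
    """Classify the effect of a variant on a uORF start codon."""
    ref, alt = ref.upper(), alt.upper()
    if ref != "ATG" and ref not in ATIS_CODONS:
        raise ValueError(f"{ref} is not a recognized start codon")
    if alt == ref:
        return None  # the start codon is unchanged
    if alt != "ATG" and alt not in ATIS_CODONS:
        return "loss"
    if ref == "ATG":
        return "uAUG->aTIS"
    if alt == "ATG":
        return "aTIS->uAUG"
    return "aTIS->aTIS"


print(start_codon_effect("ATG", "CTG"))  # uAUG->aTIS
print(start_codon_effect("CTG", "GTG"))  # aTIS->aTIS
print(start_codon_effect("ATG", "AAA"))  # loss
```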
ALT start codons
The alternate start codon(s) of the uORF. Empty, if no change or loss
(see Start codon effects).
REF stop codon
The reference stop codon of the uORF.
Stop codon effects
The direct effect(s) of the alternate allele(s) on the stop codon of the
uORF. If the stop codon changes into another valid uSTOP ("TAG", "TAA", "TGA"), the effect
is "uSTOP->uSTOP". If the variant leads to the loss of the uSTOP, the stop codon
effect is either "downstream uSTOP", if there is another in-frame stop codon
downstream, or "loss", if there is no downstream in-frame stop codon. A variant
within the uORF sequence can give rise to a new in-frame uSTOP. This is indicated by the
effect "upstream uSTOP".
ALT stop codons
The alternate stop codon(s) of the uORF. Empty, if no change or loss
(see Stop codon effects).
REF Kozak context
The reference Kozak context of the uORF.
Kozak effects
The direct effect(s) of the alternate allele(s) on the Kozak context of the
uORF. Either in the format [reference Kozak strength]->[alternative Kozak strength]
or "altered sequence" for variants that affect the Kozak sequence, but not its strength.
If the start codon is lost, the Kozak effect is "loss".
ALT Kozak contexts
The alternate Kozak context(s) of the uORF. Empty, if no change or loss
(see Kozak effects).
Sequence effects
The direct effect(s) of the alternate allele(s) on the uORF sequence. Variants that affect
the length of the sequence (e.g. loss of uSTOP) will cause one of the following sequence effects:
"longer sequence" or "shorter sequence". If the sequence is changed, but the length is not affected,
the effect will be "altered sequence". If the uSTART is lost, the sequence effect is "loss". As start
and stop codons are part of the sequence, a variant in these regions will always have a sequence
effect.
ALT nucleotide sequences
The alternate nucleotide sequence(s) of the uORF. By default, only the first
nucleotides are shown in the table. By clicking on each sequence, you can view
the whole sequence and optionally download it. The FASTA header has the following format:
">" + uORF ID + "_" + genomic position + "_variant_" + reference allele + "/" + alternate allele.
Empty, if no change or loss (see Sequence effects).
Locations
The location(s) of the variant in the transcript that harbors the uORF.
Alternate CDS distances [bp]
The new CDS distance(s) of the uORF caused by the variant allele(s).
Alternate uORF lengths [bp]
The new length(s) of the uORF caused by the variant allele(s).
dbSNP IDs
ID(s) of the reference variant(s) in dbSNP that is/are located at the same position
and has/have the same alleles.
Position-related variants in dbSNP
Contains a link to dbSNP for human variants. dbSNP is queried for
further variants at the current variant position (regardless of alleles).
ClinVar IDs
ID(s) of the reference variant(s) in ClinVar that is/are located at the same position
and has/have the same alleles.
Position-related variants in ClinVar
Contains a link to ClinVar for human variants. ClinVar is queried for
further variants at the current variant position (regardless of alleles).
Publications
Many of the following columns are not filled with text and/or numbers, but with
plus and minus signs. A plus indicates positive evidence for a feature and a minus
indicates negative evidence.
PubMed ID
The publication's PubMed ID. Clicking on the ID will take you to the publication
record in PubMed.
Authors
The author(s) of the publication.
Title
The title of the publication.
Taxa
The taxon or the taxa which are discussed in the publication.
Gene symbols
The gene symbol(s) for the gene(s) in the publication as provided by NCBI.
Gene name in paper
The name(s) of the gene(s) in the publication, if applicable. This name may be different
from the current nomenclature. It may also be a general term which sums up multiple
individual genes, for example "oncogenes".
Determinants of uORF presence or absence
Alternative promoters; Alternative splicing; Tissue-specific uORFs
Sequence analyses of the human transcriptome revealed that
about 50% of mRNAs contain one or more upstream AUGs (uAUGs)
between the 5'-cap-structure and the CDS. The general prevalence
of uAUG is, although higher than initially anticipated, still
lower than expected by normal distribution, arguing for an
evolutionary negative selection. For stochastic reasons, the
prevalence of uORFs increases with the length of the
5'-regulatory region, yet in specific cases the presence or
absence of one or several uORF(s) is dependent on the transcript
variant produced by transcription initiation from alternative
promoters or due to alternative splicing. Some of these variants
are specific to particular tissues.
Non-AUG uORFs
In a recent study using global translational initiation
sequencing, 54% of human transcripts displayed one or more
translational initiation site(s) preceding the
CDS. Surprisingly, about three-fourths of upstream translation
was initiated by near-cognate, non-AUG initiation codons,
further relativizing the classical
'first-AUG' role. Nevertheless, uAUG codons appeared to be
functionally most effective in repressing CDS translation.
Structural and sequence-dependent uORF properties
Number; Length; Distance from 5'-cap; Distance from uORF-STOP to CDS; CDS-overlap
Many publications investigated the importance of structural
and sequence-dependent uORF properties in mediating
translational regulation. The repression of downstream
translation appears to be positively correlated with the number
of uORFs per transcript, the length of the uORF, and the distance
between the 5'-cap structure and the uORF initiation
codon. Furthermore, translational repression correlates
negatively with the distance between the uORF-STOP and the CDS
initiation site and is even more pronounced when the uORF
overlaps the CDS initiation codon. Taken together, current data
suggest a dynamic regulatory model, where indispensable
initiation cofactors detach gradually from ribosomes during the
elongation phase of uORF translation but may be reloaded to
allow reinitiation at the CDS.
RNA secondary structure
In eukaryotes, long GC-rich transcript leader sequences tend to form
stable secondary structures that inhibit ribosome progression and
CDS translation. Similarly, specific secondary structures within or
in the vicinity of uORFs may affect translation efficiency.
Functional consequences of uORF-mediated translational control
CDS repression; CDS induction; Start site selection
Most uORFs analyzed to date repress translation of the
subsequent initiation site(s) and inhibit/diminish translation
of the main protein. Post-uORF initiation at the CDS initiation
codon may occur from leaky scanning of ribosomes across the uORF
start codon or from reinitiation, if the uORF stop codon
precedes the CDS. Despite a generally repressive function on
downstream translation, several exceptions have been described
where translation of specific uORFs or a certain alignment of
subsequent uORFs mediates enhanced CDS initiation. Furthermore,
uORF-directed start site selection can result in the production
of N-terminally distinct protein isoforms that harbor unique
biological functions.
Nonsense-mediated decay; mRNA destabilization
Nonsense-mediated decay (NMD) of mRNA is activated when specific
cellular surveillance mechanisms detect premature termination of
protein translation. Such premature termination events may result
from the use of nonsense codons that arise in mature transcripts due
to mutations, incorrect splicing or aberrant initiation site
selection. uORFs have been suggested to induce NMD by conferring
additional termination codons to the 5'-leader sequence of certain
transcripts. Similarly, another mode of termination-dependent RNA
destabilization that is distinct and independent of the common NMD
pathway has been reported in yeast.
Artificial or mutational deletion of a uORF may result in increased
ribosome load on a given transcript associated with increased
translational activity. Conversely, ribosome stalling at the
uORF termination codon or pausing of ribosomes on inhibitory uORF
structures may hamper CDS translation. Underlining the multiplicity
of uORF-mediated translational control mechanisms, certain uORFs
facilitate enhanced CDS translation by supporting a ribosome shunt
across a highly structured and inhibitory 5'-transcript leader
sequence.
Co-regulatory events affecting uORF functions
Kozak consensus sequence
Whether or not the ternary preinitiation complex recognizes an
AUG or non-AUG triplet as a translational initiation codon is
strongly influenced by the nucleotide context surrounding
it. The optimal surrounding sequence for initiation is the Kozak
consensus sequence. If the AUG codon is surrounded by a strong
context, virtually all scanning ribosomes recognize the start
codon and initiate translation. In an adequate or weak
surrounding, a number of ribosomes scan through the initiation
site and remain ready to recognize an initiation site located
further downstream. Since the quality of the Kozak consensus
sequence is not the only determinant of translation initiation
efficiency, the mere evaluation of the surrounding nucleotides
does not permit the precise prediction of initiation.
Translational status
Regulation through uORFs allows rapid integration of the overall
translation status of a cell to adjust the translation rates of
important regulatory proteins. The translational status is dependent
on extracellular signals, environmental conditions, and nutrient
supply and is mainly reflected by the abundance of initiation
co-factors required to form a functional preinitiation complex
(ternary complex). A number of studies in yeast and human
transcripts precisely analyzed uORF-mediated regulation under
changing translational conditions.
Termination context
The sequence context surrounding a uORF termination codon may
determine the reinitiation efficiency at downstream initiation
sites. In particular, stable interactions between the terminating
ribosome and the RNA, or stable base pairing of the RNA alone may
cause ribosome pausing or mediate premature mRNA decay.
Altering the RNA- or peptide-sequence of a uORF frequently affects
downstream translation. This suggests that either the uORF-encoded
peptide or a specific RNA sequence mediates interaction with a
co-factor or the translation machinery to regulate translation, or
that specific secondary structure is functionally important.
Medical impact
Disease-related uORFs; Acquired mutations/SNPs
A defect in uORF-mediated translational control can be
associated with the development of human disease. Although only
a few unequivocal cases are known at this time, it is evident that uORF
mutations may be involved in a wide variety of diseases,
including malignancies, metabolic or neurologic disorders, and
inherited syndromes. Considering that many important regulatory
proteins, including cell surface receptors, tyrosine kinases,
and transcription factors, act in a dose-dependent fashion and
possess uORFs, a substantial number of as yet unexplained
pathologies might be traced back to uORF mutations altering
expression levels of such key regulatory genes.
The pathophysiological importance of uORFs has been demonstrated
in genetically altered mouse models. Recent progress in
computational and sequencing-based technologies, the development
of the ribosome profiling method, and mass spectrometry
approaches allow genome-wide studies of uORF function.
Methods; Review
Rather than describing individual transcripts, part of the
bibliography on uORFs focuses on methods for their study or
reviews particular aspects of the field of uORF research.
Authors
Name
The name of the author.
# Publications
The number of publications from this author in our database.
Methods and versions
The following sections list the versions of all major data types in the database and give a brief overview of our methods.
Publications and authors
We regularly scan PubMed for the latest publications in the field.
The publication metadata (incl. author names) is the version that was available from PubMed at the time of insertion, but we
regularly scan PubMed for updated metadata.
Taxonomy
We use the NCBI taxonomy, which we downloaded from the FTP server
on 06.05.2022.
Genes
The version of a gene for a publication is the current version of that gene on NCBI at the time the publication was inserted.
For genes with transcripts and/or uORFs, the gene metadata is based on the All_Data.gene_info.gz
and gene2refseq.gz files, which we downloaded on 04.05.2022.
uORFs and transcripts
uORF predictions are based on the NM_* transcripts (NCBI RefSeq Curated) and soft-masked genomes, which we downloaded from
UCSC on 07.04.2022.
We called uORFs for the following species and genome versions using custom scripts.
Species                    Genome version
Homo sapiens               hg38
Drosophila melanogaster    dm6
Mus musculus               mm39
Danio rerio                danRer11
Rattus norvegicus          rn7
Bos taurus                 bosTau9
Xenopus laevis             xenLae2
Xenopus tropicalis         xenTro10
Gallus gallus              galGal6
Sus scrofa                 susScr11
Pongo abelii               ponAbe3
Macaca mulatta             rheMac10
Pan troglodytes            panTro6
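The custom scripts used for uORF calling are not reproduced here. As a rough illustration only, the following Python sketch shows one simple way to scan a transcript for ORFs that start in the 5' leader. The function name find_uorfs, the cds_start parameter, the restriction to ATG start codons, and the coordinate convention are all assumptions made for this example and do not describe the actual pipeline.

```python
# Hypothetical sketch: scan a transcript for upstream ORFs (uORFs).
# Assumptions (not taken from the uORFdb pipeline): only ATG start codons,
# the start codon must lie entirely upstream of the main CDS start, and the
# first in-frame stop codon terminates the ORF.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(transcript, cds_start):
    """Return a list of (start, end, stop_codon) tuples.

    Coordinates are 0-based on the transcript; `end` is exclusive and
    includes the stop codon. `cds_start` is the 0-based index of the first
    base of the main start codon.
    """
    # Soft-masked genomes use lowercase for repeats; compare in uppercase.
    seq = transcript.upper()
    uorfs = []
    for start in range(0, cds_start - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk downstream codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                uorfs.append((start, pos + 3, codon))
                break
    return uorfs


if __name__ == "__main__":
    # Toy transcript: one uORF (ATG AAA TAG) upstream of the main ATG at index 13.
    print(find_uorfs("ccATGAAATAGggATGCCCTAA", 13))
```

Note that this sketch also reports ORFs that start in the leader and end inside or beyond the main coding sequence; it does not distinguish them from uORFs that terminate upstream.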
We then annotated uORFs with gene metadata from the files All_Data.gene_info.gz
and gene2refseq.gz (downloaded on 04.05.2022).
We filtered out any transcripts and uORFs that belonged to pseudogenes or whose transcript accession was suppressed, withdrawn, or renamed by NCBI.
Duplicate transcript IDs (e.g. in the pseudoautosomal regions) were also removed.
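The filtering step above can be pictured with a short Python sketch. It is a simplified illustration, not the actual script: the record layout (dictionaries with "accession" and "gene_type" keys), the function name filter_transcripts, and the choice to keep the first occurrence of a duplicated ID are assumptions made for this example.

```python
# Hypothetical sketch of the transcript filter described above.
# Assumptions: each record is a dict with "accession" and "gene_type" keys,
# pseudogenes are marked with gene_type == "pseudo", and the first occurrence
# of a duplicated accession is kept.


def filter_transcripts(records, excluded_accessions):
    """Drop pseudogene, excluded, and duplicate transcript records.

    `excluded_accessions` is a set of accessions that were suppressed,
    withdrawn, or renamed.
    """
    seen = set()
    kept = []
    for rec in records:
        acc = rec["accession"]
        if rec["gene_type"] == "pseudo":
            continue  # transcript belongs to a pseudogene
        if acc in excluded_accessions:
            continue  # accession is no longer valid
        if acc in seen:
            continue  # duplicate ID, e.g. from a pseudoautosomal region
        seen.add(acc)
        kept.append(rec)
    return kept


if __name__ == "__main__":
    # Made-up accessions for illustration only.
    demo = [
        {"accession": "NM_A", "gene_type": "protein-coding"},
        {"accession": "NM_B", "gene_type": "pseudo"},
        {"accession": "NM_C", "gene_type": "protein-coding"},
        {"accession": "NM_A", "gene_type": "protein-coding"},
    ]
    print([r["accession"] for r in filter_transcripts(demo, {"NM_C"})])
```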
Variants
We analyzed BAM files from cancer patients provided by the TCGA Research Network. The following cancer types were included:
Breast Invasive Carcinoma (TCGA-BRCA), Colon Adenocarcinoma (TCGA-COAD), Acute Myeloid Leukemia (TCGA-LAML), Lung Adenocarcinoma (TCGA-LUAD), Prostate Adenocarcinoma
(TCGA-PRAD), and Skin Cutaneous Melanoma (TCGA-SKCM). WGS BAM files were downloaded from the GDC Legacy Archive between 28.05.2021 and 09.01.2022.
We realigned the BAM files to GRCh38.p13 (downloaded 15.06.2021).
Realignment and subsequent quality control were performed according to a custom workflow heavily based on the GDC DNA-Seq Analysis Pipeline.
677 patients remained and entered the subsequent variant calling. Variants were called using Mutect2 (GATK v4.1.4.1, Java v1.10.11) with a custom pipeline based on the GATK best practices.
PASS variants were left-aligned, normalized, and annotated using custom scripts and BCFtools v1.11.
The annotation included dbSNP and ClinVar identifiers and allele frequencies from ExAC, gnomAD Genomes, and TOPMed version 2 or 3. The metadata was based on
dbSNP (downloaded 13.08.2021) and ClinVar VCFs (downloaded 13.01.2022), as well as dbSNP JSON files (downloaded 23.02.2022).
In the functional prediction, we chose to include only variants where the alternate and reference alleles had the same length.
This avoided frame shifts and issues with splice sites. Prediction of the variant effect on the uORFs was performed using custom scripts.
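The length criterion can be illustrated with a minimal Python sketch. It is not the custom script used for the database. The tuple layout (chromosome, position, reference allele, alternate allele) and the function name same_length_variants are assumptions for this example, and each record is assumed to carry a single alternate allele.

```python
# Hypothetical sketch: keep only variants whose reference and alternate
# alleles have the same length (e.g. SNVs and multi-nucleotide substitutions).
# Assumption: variants are (chrom, pos, ref, alt) tuples with one ALT allele.


def same_length_variants(variants):
    """Return the variants where len(ref) == len(alt)."""
    return [v for v in variants if len(v[2]) == len(v[3])]


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    demo = [
        ("chr1", 100, "A", "G"),    # substitution, kept
        ("chr1", 200, "AT", "A"),   # deletion, dropped
        ("chr2", 300, "C", "CGG"),  # insertion, dropped
        ("chr2", 400, "GA", "TC"),  # same-length substitution, kept
    ]
    print(same_length_variants(demo))
```

Insertions and deletions are dropped by this filter, which is what prevents frame shifts from entering the prediction.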
Cite
Manske F, Ogoniak L, Jürgens L, Grundmann N, Makałowski W, Wethmar K.
The new uORFdb: integrating literature, sequence, and variation data in a central hub for uORF research.
Nucleic Acids Res. 2023 Jan 6;51(D1):D328-D336. doi: 10.1093/nar/gkac899.