Description
TEclass classifies unknown transposable element (TE) consensus
sequences into four categories according to their mechanism of
transposition: DNA transposons, LTRs, LINEs and SINEs. The
classification uses support vector machines, random forests and
learning vector quantisation; the tool also predicts ORFs. In the
current version, the input sequences must be in FASTA format. You
can either upload the file you want to process or paste the
sequences directly. Note that the tool cannot distinguish between TEs
and non-TEs, so every sequence will be classified into one of the
four categories (or, in ambiguous cases, marked as unknown), even if
it is not a TE.
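As a rough illustration of what "FASTA format" means here, the sketch below parses FASTA text into (header, sequence) pairs and rejects sequence data that appears before any `>` header line, as well as headers with no sequence. `read_fasta` is a hypothetical helper, not part of TEclass, and TEclass's own input checks may be stricter or more lenient.

```python
from typing import Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs.

    Raises ValueError for sequence data before the first '>' header
    or for a header that is not followed by any sequence.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []

    def flush() -> None:
        # Close the current record, if any, and check that it has a sequence.
        if header is None:
            return
        if not chunks:
            raise ValueError(f"header {header!r} has no sequence")
        records.append((header, "".join(chunks)))

    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            flush()
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError(f"line {number}: sequence data before the first '>' header")
        else:
            chunks.append(line.upper())  # multi-line sequences are concatenated
    flush()
    return records
```

For pasted input, `read_fasta(text.splitlines())` returns one pair per consensus sequence.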
A newer version of the tool, TEclass2, is also available.
Notes
- TEclass is not a tool for annotating whole-genome data, so it is
  not a replacement for RepeatMasker or Censor. Its primary
  purpose is to classify repeat libraries, which can
  subsequently be used by these two tools. The input should
  therefore not contain more than a few thousand sequences; if you
  have significantly more, you are almost certainly using TEclass
  improperly.
- The entered data must not exceed 1 MB in size!
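Both limits can be checked locally before submitting. This is a minimal sketch: `check_submission`, `MAX_BYTES` and `MAX_SEQUENCES` are hypothetical names, 1 MB is taken conservatively as 10^6 bytes, and 5000 is an assumed stand-in for "a few thousand" sequences.

```python
from typing import List

# Limits from the Notes above. Both constants are assumptions:
# "1MB" is read conservatively as 10**6 bytes (it may mean 2**20),
# and 5000 stands in for "a few thousand" sequences.
MAX_BYTES = 1_000_000
MAX_SEQUENCES = 5000


def check_submission(text: str) -> List[str]:
    """Return a list of problems with a FASTA submission (empty if none)."""
    problems: List[str] = []
    size = len(text.encode("utf-8"))
    if size > MAX_BYTES:
        problems.append(f"input is {size} bytes; the limit is {MAX_BYTES} bytes")
    # Each FASTA record starts with a '>' header line.
    count = sum(1 for line in text.splitlines() if line.startswith(">"))
    if count > MAX_SEQUENCES:
        problems.append(
            f"input has {count} sequences; TEclass is meant for repeat "
            f"libraries of at most a few thousand sequences"
        )
    return problems
```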
Methods
We analyze repeats in different size categories: 0-600 bp, 601-1800
bp, 1801-4000 bp, >4000 bp, and build independent classifiers for
all these length classes. We use libsvm as the SVM engine, with a
Gaussian kernel. Classification proceeds as a cascade of binary
decisions: forward versus reverse sequence orientation, then DNA
transposon versus retrotransposon, then LTR versus non-LTR (for
retrotransposons), and finally LINE versus SINE (for non-LTR
repeats). The last step is performed only for repeats with lengths
below 1800 bp, because we are not aware of SINEs longer than
1800 bp. Separate classifiers were built for each length class and
for each classification step. If the different classification
methods lead to conflicting results, TEclass reports the repeat
either as unknown or as the last category on which the
classification methods agree.
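The length classes, the binary cascade and the conflict rule described above can be sketched as follows. This is not TEclass's code: each trained method (SVM, random forest, LVQ) is replaced by a hypothetical callable that returns a label for a given step and length class. The label names, the reverse-complementing of reverse-oriented input and the reporting of long non-LTR repeats as LINEs are assumptions.

```python
from typing import Callable, List, Optional

# Length classes from the Methods text, in bp (inclusive; None = no upper bound).
LENGTH_CLASSES = [(0, 600), (601, 1800), (1801, 4000), (4001, None)]


def length_class(length: int) -> int:
    """Return the index of the length class that a repeat of `length` bp falls into."""
    for index, (low, high) in enumerate(LENGTH_CLASSES):
        if length >= low and (high is None or length <= high):
            return index
    raise ValueError(f"invalid sequence length: {length}")


# Hypothetical stand-in for one trained method: given a step name, a
# length-class index and a sequence, it returns one of the two labels for
# that step (a separate model per length class and step, as in the text).
Method = Callable[[str, int, str], str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def classify(seq: str, methods: List[Method]) -> str:
    cls = length_class(len(seq))

    def consensus(step: str, s: str) -> Optional[str]:
        """Return the label if all methods agree on this step, otherwise None."""
        votes = {method(step, cls, s) for method in methods}
        return votes.pop() if len(votes) == 1 else None

    orientation = consensus("orientation", seq)
    if orientation is None:
        return "unknown"
    if orientation == "reverse":
        # Assumption: reverse-oriented input is flipped before the later steps.
        seq = reverse_complement(seq)

    top = consensus("dna_vs_retro", seq)
    if top is None:
        return "unknown"
    if top == "DNA":
        return "DNA"

    sub = consensus("ltr_vs_nonltr", seq)
    if sub is None:
        return "Retro"  # last category on which all methods agreed
    if sub == "LTR":
        return "LTR"

    if cls > 1:
        # LINE vs SINE is only tested in the two shortest classes (up to
        # 1800 bp); assumed: longer non-LTR repeats are reported as LINEs.
        return "LINE"
    leaf = consensus("line_vs_sine", seq)
    return leaf if leaf is not None else "nonLTR"
```

Reporting the deepest agreed category (for example `Retro` when the methods agree on retrotransposon but not on LTR versus non-LTR) keeps partial information instead of discarding the repeat as unknown.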
Download
Links
Tools for de novo reconstruction of repeat consensi:
Tools for similarity-based repeat identification:
Citation
Please cite: Abrusan G, Grundmann N, DeMeester L, Makalowski W.
2009. TEclass: a tool for automated classification of unknown
eukaryotic transposable elements. Bioinformatics
25:1329-1330.
Credits
Please contact the
bioinformatics team or the author directly at
gyorgy01||gmail||com (replace || with the appropriate signs) if you have
any questions. The classification tool was written by György
Abrusán and was funded by the Katholieke Universiteit Leuven,
Belgium (postdoctoral fellowship for G.A.), and the University of
Münster.
2023-10-03 08:22