TEclass2 classifies unknown transposable element (TE) consensus
sequences into 16 superfamilies taken from the Wicker et
al. classification system:
16-superfamily model: Copia, Crypton, ERV, Gypsy, hAT, Helitron,
Jockey, L1_L2, Maverick, Merlin, SINE, P, Pao, RTE, TcMar and
Transib, with a weighted average F1-score of 0.79.
Methods
The classification uses a Transformer deep learning model
(Vaswani et al. 2017) and outputs a softmax score (Goodfellow et
al. 2016) for each TE category; these scores can be interpreted as
probabilities. The input must be TE consensus sequences in FASTA
format. You can either upload the file you want to process or paste
the sequences directly. Note that the tool cannot distinguish between
TEs and non-TEs, so every sequence will be classified into one of the
categories even if it is not a TE.
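
To illustrate how softmax scores relate to the final class call, here is a minimal sketch in Python. It is not the TEclass2 implementation: the function names and the example raw scores are hypothetical, and only the list of 16 superfamilies comes from the description above.

```python
import math

# The 16 superfamilies listed above, in the order given there.
SUPERFAMILIES = [
    "Copia", "Crypton", "ERV", "Gypsy", "hAT", "Helitron", "Jockey", "L1_L2",
    "Maverick", "Merlin", "SINE", "P", "Pao", "RTE", "TcMar", "Transib",
]


def softmax(raw_scores):
    """Convert raw model scores into values that are positive and sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    highest = max(raw_scores)
    exps = [math.exp(score - highest) for score in raw_scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_prediction(raw_scores):
    """Return the superfamily with the highest softmax score and that score."""
    probs = softmax(raw_scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SUPERFAMILIES[best], probs[best]


# Hypothetical raw scores for one input sequence (assumption, not real output).
example_scores = [0.0] * len(SUPERFAMILIES)
example_scores[SUPERFAMILIES.index("Gypsy")] = 2.0

label, prob = top_prediction(example_scores)
print(f"{label}: {prob:.2f}")
```

Because the scores always sum to 1 over the 16 categories, some category always receives the highest score, even for a sequence that is not a TE. A low top score may hint at an uncertain call, but it does not by itself identify non-TE input.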
2023-08-28 12:50