(2024) Development of a new data management system for the
study of the gut microbiome of children who are small for
their gestational age
Microbiome studies aim to answer the following questions:
which organisms are in the sample and what is their impact on
the patient or the environment? To answer these questions,
investigators have to perform comparative analyses on their
classified sequences based on the collected metadata, such as
treatment, condition of the patient, or the environment. The
integrity of sequences, classifications, and metadata is
paramount for the success of such studies. Still, the area of
data management for the preliminary study results appears to
be neglected.
(2024) NewickTreeModifier: a simple web page to prune and
modify Newick trees
Large-scale selection analyses of protein-coding sequences and
phylogenetic tree reconstructions require suitable trees in
Newick format. We developed the NewickTreeModifier, a simple
web-based tool to trim and modify Newick trees for such
analyses. The users can choose provided master trees or upload
a tree to prune it to selected species provided in FASTA,
NEXUS, or PHYLIP sequence format with an internal converter, a
simple species list, or directly determined from a checklist
interface of the master trees. Plant, insect, and vertebrate
master trees comprise the maximum number of species in an
up-to-date phylogenetic order directly transferable to the
pruned Newick outfile. NTM is available at
https://retrogenomics.uni-muenster.de/tools/ntm.
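The pruning step described above can be illustrated with a minimal sketch. It is not the NewickTreeModifier implementation: the function names are hypothetical, and the parser assumes a topology-only Newick string without branch lengths, comments, or quoted labels.

```python
def parse_newick(text):
    """Parse a topology-only Newick string into nested (label, children) tuples.

    Assumption: no branch lengths, comments, or quoted labels.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return (text[start:pos].strip(), children)

    return node()


def prune(node, keep):
    """Drop leaves not in `keep` and collapse internal nodes left with one child."""
    label, children = node
    if not children:
        return node if label in keep else None
    kept = [c for c in (prune(child, keep) for child in children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return (label, kept)


def to_newick(node):
    """Serialize a (label, children) tuple back to Newick (without the ';')."""
    label, children = node
    if not children:
        return label
    return "(" + ",".join(to_newick(c) for c in children) + ")" + label


def prune_newick(text, species):
    """Prune a Newick string to the given species; order of the master tree is kept."""
    pruned = prune(parse_newick(text), set(species))
    if pruned is None:
        raise ValueError("none of the requested species occur in the tree")
    return to_newick(pruned) + ";"


master = "((human,chimp),(mouse,rat),chicken);"
print(prune_newick(master, ["human", "mouse", "rat"]))
```

Because pruning only removes leaves and collapses single-child nodes, the relative order of the remaining species is inherited from the master tree.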
(2022) Comparison of circulating tumor cells and AR-V7 as
clinical biomarker in metastatic castration-resistant prostate
cancer patients
(2022) The new uORFdb: integrating literature, sequence, and
variation data in a central hub for uORF research
Upstream open reading frames (uORFs) are initiated by AUG or
near-cognate start codons and have been identified in the
transcript leader sequences of the majority of eukaryotic
transcripts. Functionally, uORFs are implicated in downstream
translational regulation of the main protein coding sequence
and may serve as a source of non-canonical peptides. Genetic
defects in uORF sequences have been linked to the development
of various diseases, including cancer. To simplify
uORF-related research, the initial release of uORFdb in 2014
provided a comprehensive and manually curated collection of
uORF-related literature. Here, we present an updated
sequence-based version of uORFdb, accessible at
https://www.bioinformatics.uni-muenster.de/tools/uorfdb. The
new uORFdb enables users to directly access sequence
information, graphical displays, and genetic variation data
for over 2.4 million human uORFs. It also includes sequence
data of >4.2 million uORFs in 12 additional species. Multiple
uORFs can be displayed in transcript- and
reading-frame-specific models to visualize the translational
context. A variety of filters, sequence-related information,
and links to external resources (UCSC Genome Browser, dbSNP,
ClinVar) facilitate immediate in-depth analysis of individual
uORFs. The database also contains uORF-related somatic
variation data obtained from whole-genome sequencing (WGS)
analyses of 677 cancer samples collected by the TCGA
consortium.
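As an illustration of the uORF definition given above (a start at AUG or a near-cognate codon within the transcript leader), the following is a minimal sketch, not the uORFdb pipeline. The function names and the returned fields are hypothetical, and treating all nine single-nucleotide variants of AUG as near-cognate starts is an assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def near_cognate_codons(start="ATG"):
    """All codons that differ from the start codon at exactly one position."""
    return {
        start[:i] + base + start[i + 1:]
        for i in range(3)
        for base in "ACGT"
        if base != start[i]
    }


def find_uorfs(transcript, cds_start, allow_near_cognate=True):
    """Report ORFs whose start codon lies entirely in the leader (before cds_start).

    Coordinates are 0-based, end-exclusive. `frame` is the offset of the uORF
    relative to the main CDS (0 = same reading frame). ORFs without an in-frame
    stop codon in the transcript are skipped.
    """
    seq = transcript.upper().replace("U", "T")
    starts = {"ATG"}
    if allow_near_cognate:
        starts |= near_cognate_codons()
    uorfs = []
    for i in range(cds_start - 2):
        codon = seq[i:i + 3]
        if codon not in starts:
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append({
                    "start": i,
                    "end": j + 3,
                    "start_codon": codon,
                    "frame": (cds_start - i) % 3,
                })
                break
    return uorfs


# Toy transcript: leader with one AUG-initiated uORF, main CDS starting at index 13.
toy = "GGATGAAATAGCCATGGCTTAA"
for uorf in find_uorfs(toy, cds_start=13, allow_near_cognate=False):
    print(uorf)
```

Allowing near-cognate starts greatly increases the number of candidates, which is one reason filters and reading-frame-specific views are useful when inspecting results.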
(2022) paPAML: An Improved Computational Tool to Explore
Selection Pressure on Protein-Coding Sequences
Evolution is change over time. Whereas neutral changes
promoted by drift effects are most reliable for phylogenetic
reconstructions, selection-relevant changes are of only
limited use to reconstruct phylogenies. On the other hand,
comparative analyses of neutral and selected changes of
protein-coding DNA sequences (CDS) retrospectively tell us
about episodic constrained, relaxed, and adaptive
incidences. The ratio of sites with nonsynonymous (amino acid
altering) versus synonymous (not altering) mutations directly
measures selection pressure and can be analysed by using the
Phylogenetic Analysis by Maximum Likelihood (PAML) software
package. We developed a CDS extractor for compiling
protein-coding sequences (CDS-extractor) and parallel PAML
(paPAML) to simplify, amplify, and accelerate selection
analyses via parallel processing, including detection of
negatively selected sites. paPAML compiles results of site,
branch-site, and branch models and detects site-specific
negative selection with the output of a codon list labelling
significance values. The tool simplifies selection analyses
for casual and inexperienced users and accelerates computing
speeds up to the number of allocated computer threads. We then
applied paPAML to examine the evolutionary impact on a new
GINS Complex Subunit 3 exon, and neutrophil-associated as well
as lysin and apolipoprotein genes. Compared with codeml (PAML
version 4.9j) and HyPhy (HyPhy FEL version 2.5.26), all paPAML
test runs performed with 10 computing threads led to identical
selection pressure results, whereas the total selection
analysis via paPAML, including all model comparisons, was
about 3 to 5 times faster than the longest running codeml
model and about 7 to 15 times faster than the entire
processing time of these codeml runs.
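Comparisons between nested codeml models are conventionally made with a likelihood ratio test (LRT). The sketch below shows that test for 1 or 2 degrees of freedom using only the standard library. Whether paPAML computes its significance values exactly this way is an assumption, and the log-likelihood values in the example are invented for illustration.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test between two nested models.

    Returns (statistic, p-value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value is the chi-square upper tail. Only df = 1 and df = 2 are
    covered, because both have closed-form tails.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only covers df = 1 or df = 2")
    return stat, p


# Hypothetical log-likelihoods for a null and an alternative site model (df = 2).
stat, p = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p:.4f}")
```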
(2021) Presence of CTCs and its prognostic potential compared
to AR-V7 expression in mCRPC undergoing androgen-deprivation
therapy.
(2020) GenoTypeMapper: graphical genotyping on genetic and
sequence-based maps - published by Plant Methods
(2020) The multi-comparative 2-n-way genome suite
To effectively analyze the increasing amounts of available
genomic data, improved comparative analytical tools that are
accessible to and applicable by a broad scientific community
are essential. We built the "2-n-way" software suite to
provide a fundamental and innovative processing framework for
revealing and comparing inserted elements among various
genomes. The suite comprises two user-friendly web-based
modules. The 2-way module generates pairwise whole-genome
alignments of target and query species. The resulting genome
coordinates of blocks (matching sequences) and gaps (missing
sequences) from multiple 2-ways are then transferred to the
n-way module and sorted into projects, where user-defined
coordinates from reference species are projected to the
block/gap coordinates of orthologous loci in query species to
provide comparative information about presence (blocks) or
absence (gaps) patterns of targeted elements over many entire
genomes and phylogroups. Thus, the "2-n-way" software suite is
ideal for performing multi-directional,
non-ascertainment-biased screenings to extract all possible
presence/absence data of user-relevant elements in orthologous
sequences. To highlight its applicability and versatility, we
used 2-n-way to expose ~100 lost introns in vertebrates,
analyzed thousands of potential phylogenetically informative
bat and whale retrotransposons, and novel human exons as well
as thousands of human polymorphic retrotransposons.
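The projection of reference coordinates onto blocks and gaps can be sketched as follows. This is a simplified illustration, not the 2-n-way code: the function names, the half-open interval convention, and the 0.8 coverage threshold are assumptions, and a real screening would also check that the flanking sequence is aligned before calling an absence.

```python
def covered_fraction(element, blocks):
    """Fraction of a reference interval covered by aligned blocks.

    `element` is (start, end) and `blocks` is a list of (start, end) intervals
    in reference coordinates; all intervals are half-open and the blocks are
    assumed not to overlap each other.
    """
    start, end = element
    covered = 0
    for block_start, block_end in blocks:
        covered += max(0, min(end, block_end) - max(start, block_start))
    return covered / (end - start)


def presence_pattern(element, blocks_by_species, min_fraction=0.8):
    """Return '+' (present, mostly in blocks) or '-' (absent, mostly in gaps) per species."""
    return {
        species: "+" if covered_fraction(element, blocks) >= min_fraction else "-"
        for species, blocks in blocks_by_species.items()
    }


# Toy pairwise alignments of one reference against three query species.
alignments = {
    "speciesA": [(0, 300)],              # element lies inside one block
    "speciesB": [(0, 100), (200, 300)],  # element falls exactly into a gap
    "speciesC": [(0, 150)],              # only half of the element is aligned
}
print(presence_pattern((100, 200), alignments))
```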
(2020) Combinatorial expression of androgen receptor splice
variants: No predictive value in castration-resistant prostate
cancer patients treated with enzalutamide (enza) or
abiraterone (abi)
(2020) MetaGenomic analysis of short and long reads
Identifying single organisms in environmental samples is one
of the key tasks of metagenomics. During the last few years,
third generation sequencing technologies have enabled
researchers to sequence much longer molecules, but at the
expense of sequencing accuracy. Thus, new algorithms needed to
be developed to cope with this new type of data. With this in
mind, we developed a tool called MetaG. An intuitive web
interface makes the software accessible to a vast range of
users, including those without extensive bioinformatic
expertise. Evaluation of MetaG’s performance showed that it
makes nearly perfect classifications of viral isolates using
simulated short and long reads. MetaG also outperformed
current state-of-the-art algorithms on data from targeted
sequencing of the 16S and 28S rRNA genes. Since MetaG’s output
is also supplemented with information about hosts and
antibiotic resistances of pathogens, we expect it to be
especially useful to the healthcare sector. Moreover, the
outstanding accuracy of the taxonomic assignments will make
MetaG a serious alternative for anyone working with
metagenomic sequences.
(2019) NanoPipe: a web server for nanopore MinION sequencing
data analysis - published in GigaScience, giy169, doi:
10.1093/gigascience/giy169
Freely available NanoPipe software allows effortless and
reliable analysis of MinION sequencing data for experienced
bioinformaticians, as well as for wet-lab biologists with minimal
bioinformatics knowledge. Moreover, for the latter group, we
describe the basic algorithm necessary for MinION sequencing
analysis from the first to last step.
(2015) GPAC - Genome Presence/Absence Compiler: A Web
Application to comparatively visualize multiple genome-level
changes - published in Oxford Journals
Our understanding of genome-wide and comparative sequence
information has been broadened considerably by the databases
available from the University of California Santa Cruz (UCSC)
Genome Bioinformatics Department. In particular, the
identification and visualization of genomic sequences, present
in some species but absent in others, led to fundamental
insights into gene and genome evolution. However, the UCSC
tools currently enable one to visualize orthologous genomic
loci for a range of species in only a single locus. For
large-scale comparative analyses of such presence/absence
patterns a multilocus view would be more desirable. Such a
tool would enable us to compare thousands of relevant loci
simultaneously and to resolve many different questions about,
for example, phylogeny, specific aspects of genome and gene
evolution, such as the gain or loss of exons and introns, the
emergence of novel transposed elements, nonprotein-coding
RNAs, and viral genomic particles. Here, we present the first
tool to facilitate the parallel analysis of thousands of
genomic loci for cross-species presence/absence patterns based
on multiway genome alignments. This genome presence/absence
compiler uses annotated or other compilations of coordinates
of genomic locations and compiles all presence/absence
patterns in a flexible, color-coded table linked to the
individual UCSC Genome Browser alignments. We provide examples
of the versatile information content of such a screening
system especially for 7SL-derived transposed elements, nuclear
mitochondrial DNA, DNA transposons, and miRNAs in primates.
(2011) Origin of the 1918 pandemic H1N1 influenza A virus as
studied by codon usage patterns and phylogenetic analysis -
published in Advance November 10, 2010,
doi:10.1261/rna.2395211
Darisuren Anhlan (1), Norbert Grundmann (2), Wojciech
Makalowski (2), Stephan Ludwig (1), and Christoph Scholtissek (3)
(1) Institute of Molecular Virology (IMV), Centre of Molecular
Biology of Inflammation (ZMBE), University of Münster, 48149
Münster, Germany
(2) Institute of Bioinformatics, University of Münster, 48149
Münster, Germany
(3) St. Jude Children’s Research Hospital, Memphis, Tennessee
38105, USA
Abstract: The pandemic of 1918
was caused by an H1N1 influenza A virus, which is a negative
strand RNA virus; however, little is known about the nature of
its direct ancestral strains. Here we applied a broad genetic
and phylogenetic analysis of a wide range of influenza virus
genes, in particular the PB1 gene, to gain information about
the phylogenetic relatedness of the 1918 H1N1 virus. We
compared the RNA genome of the 1918 strain to many other
influenza strains of different origin by several means,
including relative synonymous codon usage (RSCU), effective
number of codons (ENC), and phylogenetic relationship. We
found that the PB1 gene of the 1918 pandemic virus had ENC
values similar to the H1N1 classical swine and human viruses,
but different ENC values from avian as well as H2N2 and H3N2
human viruses. Also, according to the RSCU of the PB1 gene,
the 1918 virus grouped with all human isolates and
“classical” swine H1N1 viruses. The phylogenetic studies of
all eight RNA gene segments of influenza A viruses may
indicate that the 1918 pandemic strain originated from an H1N1
swine virus, which itself might be derived from an H1N1 avian
precursor, which was separated from the bulk of other avian
viruses in toto a long time ago. The high stability of the
RSCU pattern of the PB1 gene indicated that the integrity of
RNA structure is more important for influenza virus evolution
than previously thought.
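Relative synonymous codon usage, one of the measures named above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it with the standard genetic code. The function name is hypothetical, and the sketch is not the analysis code used in the study. The effective number of codons (ENC) is not covered here.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Relative synonymous codon usage for every codon of each amino acid seen.

    RSCU = codon count * number of synonymous codons / total count for that
    amino acid. A value of 1.0 means no bias; stop codons are ignored.
    """
    seq = cds.upper().replace("U", "T")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence: three GCT, one GCC (both alanine), one AAA (lysine).
usage = rscu("GCTGCTGCTGCCAAA")
print(usage["GCT"], usage["GCC"], usage["AAA"])
```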
(2010) A Novel Web-Based TinT Application and the Chronology
of the Primate Alu Retroposon Activity
DNA sequences afford access to the evolutionary pathways of
life. In particular, mobile elements that constantly co-evolve
in genomes encrypt recent and ancient information of their
host’s history. In mammals there is an extraordinarily
abundant activity of mobile elements that occurs in a dynamic
succession of active families, subfamilies, types, and
subtypes of retroposed elements. The high frequency of
retroposons in mammals implies that, by chance, such elements
also insert into each other. While inactive elements are no
longer able to retropose, active elements retropose by chance
into other active and inactive elements. Thousands of such
directional, element-in-element insertions are found in
present-day genomes. To help analyze these events, we
developed a computational algorithm (Transpositions in
Transpositions, or TinT) that examines the different
frequencies of nested transpositions and reconstructs the
chronological order of retroposon activities.
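The raw input to such an analysis is a table of directional element-in-element counts. The sketch below only tallies those counts from a list of (inserted family, host family) events. It does not reproduce the TinT model that derives activity periods from them, and the function names and example events are hypothetical.

```python
from collections import Counter


def nested_insertion_counts(events):
    """Count directional nested insertions.

    `events` is an iterable of (inserted_family, host_family) pairs, one per
    observed element-in-element insertion.
    """
    return Counter(events)


def asymmetry(counts, family_a, family_b):
    """Return (A inserted into B, B inserted into A) for two families."""
    return counts[(family_a, family_b)], counts[(family_b, family_a)]


# Invented example events; a strong excess of A-into-B over B-into-A suggests
# that A was still active when B had largely become inactive.
events = [
    ("AluY", "AluS"),
    ("AluY", "AluS"),
    ("AluS", "AluJ"),
    ("AluS", "AluY"),
]
counts = nested_insertion_counts(events)
print(asymmetry(counts, "AluY", "AluS"))
```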
(2009) TEclass: a tool for automated classification of unknown
eukaryotic transposable elements
The large number of sequenced genomes required the development
of software that reconstructs the consensus sequences of
transposons and other repetitive elements. However, the
available tools usually focus on the accurate identification of
raw repeats and provide no information about the taxonomic
position of the reconstructed consensi. TEclass is a tool to
classify unknown transposable elements into their four main
functional categories, which reflect their mode of
transposition: DNA transposons, long terminal repeats (LTRs),
long interspersed nuclear elements (LINEs) and short
interspersed nuclear elements (SINEs). TEclass uses a machine
learning support vector machine (SVM) for classification based
on oligomer frequencies. It achieves 90–97% accuracy in the
classification of novel DNA and LTR repeats, and 75% for LINEs
and SINEs.
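The feature extraction step, oligomer frequencies, can be sketched as below. This is an illustration rather than the TEclass code: the function name and the default k = 4 are assumptions, and the resulting vector would still have to be passed to a separately trained SVM.

```python
from collections import Counter
from itertools import product


def oligomer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA k-mer in a fixed order.

    The vector has 4**k entries ordered alphabetically (AAAA, AAAC, ...).
    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy example with dinucleotides: the windows are AC, CG, and GT.
features = oligomer_frequencies("ACGT", k=2)
print(len(features), sum(features))
```

Using a fixed k-mer order makes vectors from different sequences directly comparable, which is what a classifier needs as input.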