To use the NanoPipe Webtool, upload a query file with nanopore data. The results are displayed once the analysis has completed.
When you click on "Run the pipeline" in the left menu, you will be directed to a page where you can upload your data:
There are two ways to proceed:
To start a new analysis, choose the Discovery Task type ("Plasmodium Polymorphisms", "Dengue Virus Serotype Classification" or "Provide target file") and upload
a query file with nanopore data.
You can also run an analysis on test data by clicking
on "Run with test data". This runs the analysis on a nanopore dataset stored
on our server, so that you can see what the result pages look like. See
here for details.
Depending on the task you choose, different types of analyses will be
performed. A specific target file for the alignment is used, as well
as a specific parameter set for the alignments.
The available tasks are:
Please see the table below for the LAST parameters that are used to perform the alignments for the different discovery tasks.
For the Plasmodium Polymorphisms task, the target file contains specific
regions from 13 Plasmodium falciparum genes.
You can download the target sequences here.
The genes are:
gene | sequence length |
---|---|
Pf3D7_05_v3:467406-468316:-:PfTCTP | 911 |
M76611:3438-4680:+:apocytochrome_b | 1243 |
Pf3D7_07_v3:404757-406466:-:PfCRT_2 | 1710 |
Pf3D7_07_v3:403089-404828:+:PfCRT_1 | 1740 |
Pf3D7_04_v3:747923-749956:+:DHFR-TS | 2034 |
Pf3D7_01_v3:267134-269239:-:PfATPase6_1 | 2106 |
Pf3D7_13_v3:1724572-1727035:-:K13-propel | 2464 |
Pf3D7_01_v3:464622-467289:+:pfmrp1_1 | 2668 |
Pf3D7_08_v3:548039-550780:+:DHPS | 2742 |
Pf3D7_01_v3:264641-267470:+:PfATPase6_2 | 2830 |
Pf3D7_01_v3:466960-470216:-:pfmrp1_2 | 3257 |
Pf3D7_05_v3:957756-962218:+:pfmdr1 | 4463 |
Pf3D7_08_v3:670708-675573:-:ABC_transpor | 4866 |
For the classification of dengue virus serotypes, the target file contains
specific regions from four different dengue virus serotypes (serotypes
1, 2, 3 and 4).
You can download the target sequences here.
If you want to use your own target file, you can upload a FASTA file containing nucleotide sequences. The nanopore reads will then be aligned against these sequences.
If you provide a title for the job, it is displayed on the results pages. It also appears in the Email notification if you provided your Email address.
You need to upload a query file with the nanopore reads you want to analyse.
Currently the maximum file size allowed is 100 MB.
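A quick local check before uploading can save a failed submission. The following is a minimal sketch, not part of NanoPipe itself. The helper name `fits_upload_limit` is hypothetical, and it assumes that "100 MB" means 100 * 10**6 bytes (the stricter reading, since the exact definition is not stated here).

```python
import os

# Assumption: "100 MB" is read as 100 * 10**6 bytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 100 * 10**6


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `fits_upload_limit("reads.fastq")` returns `False` for a file that should be reduced in size or split before uploading.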
The following formats are supported:
The fast5 format is the output format of the Metrichor software and the Oxford Nanopore MinION device. It not only contains the sequences, but also more details on the sequencing run.
Therefore the file size can be relatively large.
To reduce the file size, you can convert the fast5 files into fastq
format, for example using poretools.
If you provide your Email address, you will get a notification Email once the job is finished. This Email contains the Job Title, the Job ID and a link to the results.
You can choose your own set of parameters that will be used to perform the
LAST alignment.
By default, optimized LAST parameters for the selected discovery task are used.
These change when you switch between discovery tasks.
The default LAST parameters are given in the following table:
Parameter | "Plasmodium Polymorphisms" | "Dengue Virus Serotype Classification" and "Provide Target File" | LAST default |
---|---|---|---|
Match score (-r) | based on substitution matrix | 6 | 6 |
Mismatch cost (-q) | based on substitution matrix | 12 | 18 |
Gap existence cost (-a) | 12 | 12 | 21 |
Gap extension cost (-b) | 3 | 4 | 9 |
Insertion existence cost (-A) | 15 | 15 | same as gap existence cost (-a) |
Query letters per random alignment (-D) | 1e+07 | 1e+07 | 1e+06 |
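To make the table concrete, here is a minimal sketch of how these values map onto a `lastal` command line. It is not the pipeline's actual invocation, which is not documented here. The helper `lastal_args`, the task keys and the file names are hypothetical, and input-format options are deliberately left out. The option letters `-r`, `-q`, `-a`, `-b`, `-A`, `-D` are those listed in the table, and `-p` is the `lastal` option for supplying a score matrix file.

```python
# Pipeline defaults copied from the table above; keys are lastal option letters.
TASK_DEFAULTS = {
    "plasmodium": {"a": 12, "b": 3, "A": 15, "D": "1e+07"},
    "dengue": {"r": 6, "q": 12, "a": 12, "b": 4, "A": 15, "D": "1e+07"},
}


def lastal_args(task, db_name, query_file, matrix_file=None):
    """Build an argument list for lastal (hypothetical helper, not NanoPipe code)."""
    args = ["lastal"]
    for flag, value in TASK_DEFAULTS[task].items():
        args += [f"-{flag}", str(value)]
    if matrix_file is not None:
        # -p supplies a score matrix that takes the place of single -r/-q values.
        args += ["-p", matrix_file]
    return args + [db_name, query_file]


print(" ".join(lastal_args("dengue", "dengue_db", "reads.fastq")))
```

Building a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.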
You can provide a DNA substitution matrix that will be used by LAST to
perform the alignment. Instead of taking one value for all matches
and mismatches, LAST will use the value from the substitution matrix
for scoring.
The substitution matrix must be a tab-delimited text file with
5 rows and 5 columns. The first row and the first column must contain
the nucleotides A, C, G and T. Only integer values are allowed;
they may be negative.
By default, the following matrix is used for the plasmodium polymorphisms task:
| | A | C | G | T |
|---|---|---|---|---|
| A | 4 | -12 | -12 | -12 |
| C | -12 | 9 | -12 | -12 |
| G | -12 | -12 | 9 | -12 |
| T | -12 | -12 | -12 | 4 |
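The layout rules above can be checked before uploading. The sketch below writes the default Plasmodium matrix to a tab-delimited file and validates it. The helper names are hypothetical, and two details are assumptions because they are not stated here: the top-left corner cell is left empty, and the nucleotides appear in the order A, C, G, T.

```python
import csv

NUCLEOTIDES = ["A", "C", "G", "T"]

# Default matrix for the Plasmodium Polymorphisms task, as shown above.
DEFAULT_PLASMODIUM = [
    [4, -12, -12, -12],
    [-12, 9, -12, -12],
    [-12, -12, 9, -12],
    [-12, -12, -12, 4],
]


def write_matrix(path, scores):
    """Write a 4x4 score table as a 5x5 tab-delimited file with labels."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([""] + NUCLEOTIDES)  # assumption: empty corner cell
        for base, row in zip(NUCLEOTIDES, scores):
            writer.writerow([base] + list(row))


def is_valid_matrix(path):
    """Check the stated rules: 5x5, A/C/G/T labels, integer scores only."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if len(rows) != 5 or any(len(row) != 5 for row in rows):
        return False
    if rows[0][1:] != NUCLEOTIDES or [row[0] for row in rows[1:]] != NUCLEOTIDES:
        return False
    try:
        for row in rows[1:]:
            for cell in row[1:]:
                int(cell)  # raises ValueError for non-integer text such as "4.5"
    except ValueError:
        return False
    return True
```

Calling `write_matrix("matrix.tsv", DEFAULT_PLASMODIUM)` followed by `is_valid_matrix("matrix.tsv")` should return `True`.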
By default, this pipeline uses a match score of 6 for "Dengue virus serotype classification". For "Plasmodium polymorphisms", the match score is defined by a specific substitution matrix (4 for A/T, 9 for C/G). The LAST default match score is 6.
By default, this pipeline uses a mismatch cost of 12 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The LAST default mismatch cost is 18.
By default, this pipeline uses a gap existence cost of 12 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The LAST default gap existence cost is 21.
By default, this pipeline uses a gap extension cost of 3 for "Plasmodium polymorphisms" and 4 for "Dengue virus serotype classification". The LAST default gap extension cost is 9.
By default, this pipeline uses an insertion existence cost of 15 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". By default, the LAST insertion existence cost is set to the same value as the gap existence cost (-a).
With this option, LAST reports only alignments that are expected by chance at most once per x (-D) query letters. By default, this pipeline uses a value of 1e+07 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The default value used by LAST is 1e+06.
There are two ways to submit a job (Run and Run with test data, see below). While the analysis is running, a waiting screen appears that refreshes every 60 seconds. Once the analysis is complete, the results screen appears. The pipeline takes around 1.5 minutes for a query file of around 3.5 MB.
When you select "Run", your files will be uploaded to our server and the analysis will be performed.
Instead of uploading your own query file, you can decide to use
one of the test datasets with nanopore reads in fastq format.
When you select "Run with test data", the analysis is
performed on a test dataset suited to the Discovery Task you
have selected.
This option is not available if you select "Provide target file".
When you run an analysis, you are provided with a 15-digit number (e.g. 143271850389848),
the Job ID.
This is used as a unique identifier to access your results. If you copy the ID into
the Job ID field, you will be redirected to the output for this job.
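If you keep Job IDs in your own notes or scripts, a simple format check can catch copy-and-paste errors. This is a minimal sketch that relies only on the 15-digit format described above. The helper name is hypothetical, and the check says nothing about whether a job with that ID exists on the server.

```python
import re

# A Job ID is described as a 15-digit number, e.g. 143271850389848.
JOB_ID_PATTERN = re.compile(r"[0-9]{15}")


def looks_like_job_id(text):
    """Return True if `text` (ignoring surrounding whitespace) is 15 digits."""
    return JOB_ID_PATTERN.fullmatch(text.strip()) is not None
```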
Old results are stored on our server for seven days, after which they will be deleted.
The results are split over several pages, which are displayed in more detail here.
The Overview page gives the number of nanopore reads in the query file and the exact LAST parameters that were used to perform the alignments.
The read length distribution page displays the lengths of the nanopore reads that aligned to specific target sequences. The lengths are displayed as histograms, with one histogram per target sequence.
The "Number of reads per gene" page displays the number of nanopore reads that aligned to a specific target sequence. Each read is counted only once. Thus, if one read aligned to more than one target sequence, only its highest scoring alignment (based on the bitscore in the LAST output file) is used.
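The counting rule can be sketched in a few lines. This is an illustration, not the pipeline's own code. It assumes the alignments have already been parsed into `(read_id, target_id, score)` tuples; the function name and the scores in the example are made up. On a tie, this sketch keeps the first alignment seen, which is also an assumption.

```python
from collections import Counter


def reads_per_target(alignments):
    """Count each read once, under the target of its highest-scoring alignment.

    `alignments` is an iterable of (read_id, target_id, score) tuples.
    """
    best = {}
    for read_id, target_id, score in alignments:
        # Keep only the best-scoring alignment per read (first one wins on ties).
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (target_id, score)
    return Counter(target for target, _ in best.values())


example = [
    ("read1", "PfCRT_1", 310),
    ("read1", "PfCRT_2", 150),  # lower score, so read1 is counted for PfCRT_1 only
    ("read2", "DHPS", 420),
    ("read3", "PfCRT_1", 200),
]
print(reads_per_target(example))
```

In this example, `PfCRT_1` is counted twice and `DHPS` once, while `PfCRT_2` receives no reads.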
The "Polymorphisms" page gives tables of polymorphisms found
in the nanopore reads. There is one table per target sequence.
In order to detect polymorphisms in the nanopore reads, the
following approach is used:
For each Nanopore read only the highest scoring pairwise alignment
to a target sequence is chosen, the others are discarded. These
alignments are then analysed to create a consensus sequence and
to get the polymorphisms in the Nanopore reads.
To get the Polymorphisms, the nucleotides in the Nanopore reads
at each position of the target sequences are counted. Then, for
each position in the target sequence the most common nucleotide
in all Nanopore reads is chosen as the consensus nucleotide.
Whenever the consensus nucleotide is not the same as the one
in the reference sequence at this position, it is declared as
polymorphic.
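The approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline's implementation. The input is assumed to be one string of read nucleotides per target position, already extracted from the best alignments. Positions are reported 1-based, positions without coverage are skipped, and ties are resolved in favour of the nucleotide seen first. The function name is hypothetical.

```python
from collections import Counter


def call_polymorphisms(reference, columns):
    """Return (position, reference base, consensus base) for polymorphic sites.

    `columns[i]` holds the read nucleotides aligned to reference position i.
    """
    polymorphisms = []
    for index, (ref_base, column) in enumerate(zip(reference, columns)):
        if not column:
            continue  # no read covers this position
        consensus_base, _ = Counter(column).most_common(1)[0]
        if consensus_base != ref_base:
            polymorphisms.append((index + 1, ref_base, consensus_base))
    return polymorphisms


reference = "ACGT"
columns = ["AAA", "CTT", "GG", ""]
print(call_polymorphisms(reference, columns))  # [(2, 'C', 'T')]
```

At position 2, two of three reads carry T while the reference has C, so the position is reported as polymorphic.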
The "Known polymorphisms" page contains tables with a subset of the
"Polymorphisms" table, namely those that have additional information
in the literature.
You can scroll the table horizontally.
The "Consensus sequences" page gives the consensus sequence for each target sequence. These sequences are generated based on the alignments created by LAST. For each position in the target sequence, the nucleotides in the aligning nanopore reads are counted. The most common nucleotide at each position is then taken as the consensus. If several nucleotides are equally common, IUPAC notation is used. InDels are not taken into consideration during the construction of the consensus sequences. The sequences may contain longer stretches of "N" (undetermined nucleotide) where no nanopore read aligned to that part of the target sequence.
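The consensus rule, including the IUPAC handling of ties, can be sketched as below. This is a minimal illustration and not the pipeline's code. It assumes one string of read nucleotides per target position, ignores any character other than A, C, G and T (so gap symbols are skipped), and emits "N" for positions without coverage. The function name is hypothetical; the ambiguity codes are the standard IUPAC nucleotide codes.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of tied bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(columns):
    """Build a consensus string from per-position strings of read nucleotides."""
    result = []
    for column in columns:
        counts = Counter(base for base in column.upper() if base in "ACGT")
        if not counts:
            result.append("N")  # no read covers this position
            continue
        top = max(counts.values())
        tied = frozenset(base for base, n in counts.items() if n == top)
        result.append(IUPAC[tied])
    return "".join(result)


print(consensus(["AAA", "CT", "", "GGA", "ACGT"]))  # AYNGN
```

The second position is a tie between C and T and is written as Y; the third has no coverage and becomes N.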
The "Alignments" page displays alignments between the consensus sequences generated from the nanopore reads and the respective target sequences. There is one alignment per target sequence. These alignments are created with MUSCLE.
The "Logo plots" page gives a list of links to pdf files that
contain plots depicting the alignments of the nanopore reads
to the target sequences.
Each vertical bar in the plot depicts one position in the target
sequence. The bar's height reflects the number of nanopore sequences
that aligned to this position. The bar's colours represent the
nucleotides found in the aligning nanopore reads. Gray colour indicates
that there was a gap in the nanopore read at this position in the
target sequence.
This option is for internal use only.