Institute of Bioinformatics Münster
Usage of NanoPipe Webtool

To use the NanoPipe Webtool, upload a query file with nanopore data. The results are displayed once the analysis is complete.

Enter Request Data

When you click on "Run the pipeline" in the left menu, you will be directed to a page where you can upload your data:

[Screenshot: input form]

There are two ways to proceed:

  • fill in the input fields for a new request (see "Start a new analysis")
  • enter the Job ID of a previous request to see its results (see "Look at a previous result")

Start a new analysis

To start a new analysis, you need to choose the Discovery Task type ("Plasmodium Polymorphisms", "Dengue Virus Serotype Classification" or "Provide target file") and upload a query file with nanopore data.
You can also run an analysis on test data by clicking "Run with test data". This runs the analysis on a nanopore dataset stored on our server, so that you can see what the result pages look like. See "Run with test data (optional)" for details.

Discovery tasks

Depending on the task you choose, a different analysis is performed. Each task uses its own target file for the alignment and its own set of alignment parameters.
The available tasks are:

  • Plasmodium polymorphisms
  • Dengue virus serotype classification
  • Provide target file

The LAST parameters used to perform the alignments for each discovery task are listed in the table under "Adjust LAST parameters (optional)".

Plasmodium polymorphisms

For the Plasmodium Polymorphisms task, the target file contains specific regions from 13 Plasmodium falciparum genes.
You can download the target sequences here.
The genes are:

Gene sequence                             Length (bp)
Pf3D7_05_v3:467406-468316:-:PfTCTP        911
M76611:3438-4680:+:apocytochrome_b        1243
Pf3D7_07_v3:404757-406466:-:PfCRT_2       1710
Pf3D7_07_v3:403089-404828:+:PfCRT_1       1740
Pf3D7_04_v3:747923-749956:+:DHFR-TS       2034
Pf3D7_01_v3:267134-269239:-:PfATPase6_1   2106
Pf3D7_13_v3:1724572-1727035:-:K13-propel  2464
Pf3D7_01_v3:464622-467289:+:pfmrp1_1      2668
Pf3D7_08_v3:548039-550780:+:DHPS          2742
Pf3D7_01_v3:264641-267470:+:PfATPase6_2   2830
Pf3D7_01_v3:466960-470216:-:pfmrp1_2      3257
Pf3D7_05_v3:957756-962218:+:pfmdr1        4463
Pf3D7_08_v3:670708-675573:-:ABC_transpor  4866
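
The identifiers in the table appear to follow the pattern contig:start-end:strand:gene, and the listed lengths match end - start + 1. The following Python sketch parses an identifier under that assumption; the function name and the interpretation of the fields are illustrative and are not part of NanoPipe.

```python
def parse_target_id(identifier):
    """Split 'contig:start-end:strand:gene' and derive the region length.

    Assumes 1-based, inclusive coordinates (inferred from the table above).
    """
    contig, coords, strand, gene = identifier.split(":")
    start, end = (int(value) for value in coords.split("-"))
    return {
        "contig": contig,
        "start": start,
        "end": end,
        "strand": strand,
        "gene": gene,
        "length": end - start + 1,
    }


record = parse_target_id("Pf3D7_05_v3:467406-468316:-:PfTCTP")
print(record["gene"], record["length"])  # PfTCTP 911
```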

Dengue virus serotype classification

For the classification of dengue virus serotypes, the target file contains specific regions from four different dengue virus serotypes (serotypes 1, 2, 3 and 4).
You can download the target sequences here.

Provide target file

If you want to use your own target file, you can upload a fasta file containing nucleotide sequences. The nanopore reads will then be aligned against these sequences.

Job title (optional)

If you provide a title for the job, it is displayed on the results pages. It also appears in the email notification if you provided your email address.

Query file

You need to upload a query file with the nanopore reads you want to analyse. Currently the maximum file size allowed is 100 MB.
The following formats are supported:

  • a file in fastq or fast5 format with nanopore reads
  • a compressed folder in *.zip or *.tar format containing files in fastq or fast5 format with nanopore reads. Please note that the tool uses the first folder it finds that contains fastq/fast5 files. Only the sequences from that one folder within the archive are analysed!
For example, you could compress the "pass" folder generated by Metrichor into a *.zip file and then use it as a query file.
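
As an illustration of preparing such an archive, the following Python sketch packs the fastq/fast5 files of a single folder into a *.zip file and checks the result against the 100 MB limit. The function names are hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption; the server may count differently.

```python
import zipfile
from pathlib import Path

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def zip_read_folder(folder, archive_path):
    """Pack the fastq/fast5 files of one folder into a zip archive.

    Returns the size of the archive in bytes.
    """
    folder = Path(folder)
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in {".fastq", ".fast5"}:
                # Keep the folder name so that all reads sit in one folder.
                archive.write(path, arcname=f"{folder.name}/{path.name}")
    return archive_path.stat().st_size


def fits_upload_limit(size_in_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True if an archive of this size is within the upload limit."""
    return size_in_bytes <= limit
```

For example, zip_read_folder("pass", "pass.zip") would create pass.zip from a local "pass" folder, and fits_upload_limit can then be applied to the returned size before uploading.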

fast5 format

The fast5 format is the output format of the Metrichor software and the Oxford Nanopore MinION device. It contains not only the sequences but also further details on the sequencing run, so the files can be relatively large.
To reduce the file size, you can convert the fast5 files into fastq format, for example using poretools.

Your email address (optional)

If you provide your email address, you will receive a notification email once the job is finished. This email contains the job title, the Job ID and a link to the results.

Adjust LAST parameters (optional)

You can choose your own set of parameters for the LAST alignment. By default, LAST parameters optimized for the selected discovery task are used. These defaults change when you switch to a different discovery task.
The default LAST parameters are given in the following table:

Parameter                                 "Plasmodium Polymorphisms"     "Dengue Virus Serotype Classification" and "Provide Target File"   LAST default
Match score (-r)                          based on substitution matrix   6                                                                  6
Mismatch cost (-q)                        based on substitution matrix   12                                                                 18
Gap existence cost (-a)                   12                             12                                                                 21
Gap extension cost (-b)                   3                              4                                                                  9
Insertion existence cost (-A)             15                             15                                                                 same as -a
Query letters per random alignment (-D)   1e+07                          1e+07                                                              1e+06
* in NanoPipe, this parameter cannot be changed by the user
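
To make the table concrete, the following Python sketch stores the pipeline defaults per discovery task and turns them into an argument list for lastal. It only illustrates how the flags fit together and is not NanoPipe's actual invocation: the database name and query file name are hypothetical, further options (for example the input format) are omitted, and for "Plasmodium Polymorphisms" the match and mismatch scores would come from the substitution matrix, which is supplied separately.

```python
# Pipeline defaults taken from the table above. For "Plasmodium Polymorphisms",
# -r and -q are omitted because the scores come from the substitution matrix.
PIPELINE_DEFAULTS = {
    "Plasmodium Polymorphisms": {
        "-a": 12, "-b": 3, "-A": 15, "-D": "1e+07",
    },
    "Dengue Virus Serotype Classification": {
        "-r": 6, "-q": 12, "-a": 12, "-b": 4, "-A": 15, "-D": "1e+07",
    },
}
# "Provide Target File" shares the dengue column of the table.
PIPELINE_DEFAULTS["Provide Target File"] = dict(
    PIPELINE_DEFAULTS["Dengue Virus Serotype Classification"]
)


def build_lastal_args(task, database, query, overrides=None):
    """Return a lastal argument list for a task; user overrides replace defaults."""
    params = dict(PIPELINE_DEFAULTS[task])
    params.update(overrides or {})
    args = ["lastal"]
    for flag, value in params.items():
        args += [flag, str(value)]
    return args + [database, query]


# "targets_db" and "reads.fastq" are placeholder names.
print(" ".join(build_lastal_args(
    "Dengue Virus Serotype Classification", "targets_db", "reads.fastq")))
# lastal -r 6 -q 12 -a 12 -b 4 -A 15 -D 1e+07 targets_db reads.fastq
```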

Substitution matrix (optional)

You can provide a DNA substitution matrix that will be used by LAST to perform the alignment. Instead of taking one value for all matches and mismatches, LAST will use the value from the substitution matrix for scoring.
The substitution matrix must be a tab-delimited text file with five rows and five columns. The first row and the first column must contain the nucleotides A, C, G and T. Only integer values are allowed; they may be negative.

By default, the following matrix is used for the plasmodium polymorphisms task:

     A    C    G    T
A    4  -12  -12  -12
C  -12    9  -12  -12
G  -12  -12    9  -12
T  -12  -12  -12    4
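
The format rules above can be checked before uploading a matrix. The following Python sketch parses a tab-delimited matrix and rejects files that do not match the described layout. It assumes that the top-left cell is empty and that the nucleotides appear in the order A, C, G, T; the function name is illustrative and the server-side validation may differ.

```python
NUCLEOTIDES = ["A", "C", "G", "T"]


def parse_substitution_matrix(text):
    """Parse a tab-delimited 5x5 matrix into a nested dict of integer scores.

    Raises ValueError if the layout or the values do not match the rules.
    """
    rows = [line.split("\t") for line in text.strip("\n").splitlines()]
    if len(rows) != 5 or any(len(row) != 5 for row in rows):
        raise ValueError("expected 5 rows with 5 tab-separated columns each")
    if rows[0][1:] != NUCLEOTIDES or [row[0] for row in rows[1:]] != NUCLEOTIDES:
        raise ValueError("first row and first column must list A, C, G, T")
    # int() raises ValueError for non-integer entries such as "1.5".
    return {
        row[0]: {column: int(value) for column, value in zip(NUCLEOTIDES, row[1:])}
        for row in rows[1:]
    }


# The default matrix of the plasmodium polymorphisms task, as listed above.
DEFAULT_PLASMODIUM_MATRIX = (
    "\tA\tC\tG\tT\n"
    "A\t4\t-12\t-12\t-12\n"
    "C\t-12\t9\t-12\t-12\n"
    "G\t-12\t-12\t9\t-12\n"
    "T\t-12\t-12\t-12\t4\n"
)

matrix = parse_substitution_matrix(DEFAULT_PLASMODIUM_MATRIX)
print(matrix["C"]["C"], matrix["A"]["G"])  # 9 -12
```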

Match score (-r)

By default, this pipeline uses a match score of 6 for "Dengue virus serotype classification". For "Plasmodium polymorphisms", the match score is defined by a specific substitution matrix (4 for A/T, 9 for C/G). The LAST default match score is 6.

Mismatch cost (-q)

By default, this pipeline uses a mismatch cost of 12 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The LAST default mismatch cost is 18.

Gap existence cost (-a)

By default, this pipeline uses a gap existence cost of 12 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The LAST default gap existence cost is 21.

Gap extension cost (-b)

By default, this pipeline uses a gap extension cost of 3 for "Plasmodium polymorphisms" and 4 for "Dengue virus serotype classification". The LAST default gap extension cost is 9.

Insertion existence cost (-A)

By default, this pipeline uses an insertion existence cost of 15 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". By default, the LAST insertion existence cost is set to the same value as the gap existence cost (-a).

Query Letters per Random Alignment (-D)

With this option, LAST reports only alignments that are expected to occur by chance at most once per x query letters, where x is the value given for -D. By default, this pipeline uses a value of 1e+07 for "Plasmodium polymorphisms" and for "Dengue virus serotype classification". The default value used by LAST is 1e+06.

Submit request (Run)

You can choose between two ways of submitting a request (Run and Run with test data, see below). While you wait for the analysis to complete, a new screen will appear that refreshes every 60 seconds. Once the analysis is complete, the results screen appears. The pipeline takes around 1.5 minutes for a query file of around 3.5 MB.

Run

When you select "Run", your files will be uploaded to our server and the analysis will be performed.

Run with test data (optional)

Instead of uploading your own query file, you can use one of the test datasets with nanopore reads in fastq format. When you select "Run with test data", the analysis is performed on a test dataset suited to the discovery task you have selected.
This option is not available if you select "provide target file".

Look at a previous result

When you run an analysis, you are given a 15-digit number (e.g. 143271850389848), the Job ID. It serves as a unique identifier for accessing your results. Copy the ID into the Job ID field and you will be redirected to the output of this job.
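
If you store Job IDs in your own scripts or notes, a simple format check can catch copy-and-paste errors. The following Python sketch only tests for the 15-digit form described above; it cannot tell whether a job with that ID actually exists, and the function name is illustrative.

```python
import re

# A Job ID is described above as a 15-digit number.
JOB_ID_PATTERN = re.compile(r"[0-9]{15}")


def looks_like_job_id(text):
    """Return True if the text is exactly 15 digits (surrounding spaces ignored)."""
    return JOB_ID_PATTERN.fullmatch(text.strip()) is not None


print(looks_like_job_id("143271850389848"))  # True
print(looks_like_job_id("14327185038984"))   # False
```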
Old results are stored on our server for seven days, after which they will be deleted.

Results

The results are split over several pages, which are described in more detail below.

Overview

The Overview page gives the number of nanopore reads in the query file and the exact LAST parameters that were used to perform the alignments.

Read length distribution

The read length distribution page displays the lengths of the nanopore reads that aligned to specific target sequences. The lengths are displayed as histograms, with one histogram per target sequence.

Number of reads per gene

The "Number of reads per gene" page displays the number of nanopore reads that aligned to a specific target sequence. Each read is counted only once. Thus, if one read aligned to more than one target sequence, only its highest scoring alignment (based on the bitscore in the LAST output file) is used.
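
The counting rule can be sketched in a few lines of Python. The example assumes that the alignments have already been extracted from the LAST output as (read, target, score) tuples; the parsing step, the function names and the tie-breaking rule (the first alignment seen wins) are assumptions made for illustration.

```python
from collections import Counter


def best_hit_per_read(alignments):
    """Keep only the highest-scoring alignment for each read.

    alignments: iterable of (read_id, target_id, score) tuples.
    Ties keep the alignment that was seen first (an assumption).
    """
    best = {}
    for read_id, target_id, score in alignments:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (target_id, score)
    return best


def reads_per_target(alignments):
    """Count each read once, for the target of its best alignment."""
    return Counter(target for target, _ in best_hit_per_read(alignments).values())


hits = [
    ("read1", "DHPS", 850),
    ("read1", "pfmdr1", 420),  # lower score, so read1 counts for DHPS only
    ("read2", "pfmdr1", 990),
    ("read3", "DHPS", 300),
]
print(reads_per_target(hits))  # Counter({'DHPS': 2, 'pfmdr1': 1})
```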

Polymorphisms

The "Polymorphisms" page gives tables of polymorphisms found in the nanopore reads. There is one table per target sequence.
To detect polymorphisms in the nanopore reads, the following approach is used: for each nanopore read, only the highest scoring pairwise alignment to a target sequence is kept and the others are discarded. These alignments are then analysed to create a consensus sequence and to identify the polymorphisms in the nanopore reads.
To identify the polymorphisms, the nucleotides in the nanopore reads are counted at each position of the target sequences. Then, for each position in the target sequence, the most common nucleotide across all nanopore reads is chosen as the consensus nucleotide. Whenever the consensus nucleotide differs from the nucleotide in the reference sequence at this position, the position is declared polymorphic.
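
The counting step can be illustrated with a small Python sketch. It is a simplification and not NanoPipe's implementation: reads are assumed to be already placed on the target without gaps and are given as (offset, bases) pairs, InDels are ignored, and ties between nucleotides are resolved arbitrarily. All names are illustrative.

```python
from collections import Counter


def call_polymorphisms(reference, placed_reads):
    """Report positions where the most common read nucleotide differs from the reference.

    placed_reads: list of (offset, bases); offset is the 0-based start on the
    reference and bases are assumed to be aligned without gaps.
    Returns tuples (1-based position, reference base, consensus base,
    supporting reads, covering reads).
    """
    columns = [Counter() for _ in reference]
    for offset, bases in placed_reads:
        for index, base in enumerate(bases):
            if 0 <= offset + index < len(reference):
                columns[offset + index][base] += 1

    polymorphisms = []
    for position, (ref_base, counts) in enumerate(zip(reference, columns), start=1):
        if not counts:
            continue  # no read covers this position
        consensus_base, support = counts.most_common(1)[0]
        if consensus_base != ref_base:
            polymorphisms.append(
                (position, ref_base, consensus_base, support, sum(counts.values()))
            )
    return polymorphisms


reference = "ACGTAC"
reads = [(0, "ACTT"), (1, "CTTA"), (2, "TTAC"), (2, "GTAC")]
print(call_polymorphisms(reference, reads))  # [(3, 'G', 'T', 3, 4)]
```

In this toy example, three of the four reads covering position 3 carry a T where the reference has a G, so the position is reported as polymorphic.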

Known polymorphisms (only for Plasmodium Polymorphisms)

The "Known polymorphisms" page contains tables with a subset of the "Polymorphisms" table, namely those that have additional information in the literature.
You can scroll the table horizontally.

Consensus sequences

The "Consensus sequences" page gives the consensus sequence for each target sequence. These sequences are generated from the alignments created by LAST. For each position in the target sequence, the nucleotides in the aligning nanopore reads are counted. The most common nucleotide at each position is then taken as the consensus. If more than one nucleotide is most common, IUPAC notation is used. InDels are not taken into consideration when constructing the consensus sequences. The sequences may contain longer stretches of "N" (undetermined nucleotide) where no nanopore read aligned to that part of the target sequence.
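
The rule for ties and uncovered positions can be sketched in Python as follows. The sketch assumes that the per-position nucleotide counts are already available and contain only A, C, G and T; the IUPAC table is the standard nucleotide ambiguity code, while the function names are illustrative and not taken from NanoPipe.

```python
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of tied bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_base(counts):
    """Return the consensus symbol for one position.

    counts: Counter of A/C/G/T observations; empty if no read covers the position.
    """
    if not counts:
        return "N"  # undetermined: no aligned read at this position
    top = max(counts.values())
    tied = frozenset(base for base, count in counts.items() if count == top)
    return IUPAC[tied]


def consensus_sequence(columns):
    """Join the consensus symbols of all positions into one sequence."""
    return "".join(consensus_base(counts) for counts in columns)


columns = [Counter("AAAC"), Counter("AAGG"), Counter(), Counter("CGT")]
print(consensus_sequence(columns))  # ARNB
```

Here the second position has equally many A and G observations and becomes R, the third has no coverage and becomes N, and the fourth is a three-way tie between C, G and T and becomes B.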

Alignments

The "Alignments" page displays alignments between the consensus sequences generated from the nanopore reads and the respective target sequences. There is one alignment per target sequence. These alignments are created with MUSCLE.

Logo plots

The "Logo plots" page gives a list of links to pdf files that contain plots depicting the alignments of the nanopore reads to the target sequences.
Each vertical bar in the plot depicts one position in the target sequence. The bar's height reflects the number of nanopore sequences that aligned to this position. The bar's colours represent the nucleotides found in the aligning nanopore reads. Gray colour indicates that there was a gap in the nanopore read at this position in the target sequence.

(Extend duration)

This option is for internal use only.

2016-08-04 12:04