MetagenomicsDB documentation

The start page allows you to query the database. The menu on the left gives you access to the functions of the web site, regardless of whether you are on the start page. The most important features that can be accessed from the menu are introduced in the following sections.

Starting with a query

The central entry point of the database is the "Query" page. Here, you can search for your topic of interest by selecting a search topic ("Patients", "Samples", or "Taxa") and then entering data in the search bar(s) and/or ticking the checkbox(es) (Figure 1). It is also possible to browse the database by submitting an empty search. The search topic determines not only the searchable fields, but also which content you will see first after hitting the "Goto" button.

Figure 1: The search interface for "Patients".

The fields for free text search take different data types, as indicated by the letter directly in front of each field: "F" = floating point number, "I" = integer, and "T" = text. Text is automatically removed from fields of type "F" or "I". Integers and floats are allowed in fields of type "T", because the content of these fields is interpreted as text (character varying). For an advanced search, you can enter special query terms: ">" = greater than, "<" = less than, ">=" = greater than or equal to, "<=" = less than or equal to, "," to separate multiple terms (internally "OR"), and "~" to conduct a pattern match. For example, type ">20 <40" in the search bar for "Mother's Age at Delivery" ("Patients") to view all children whose mothers were older than 20 and younger than 40 years at delivery. Type "1,2" in the search bar for "Pregnancy Order" ("Patients") to see all first and second born children. Typing "~Escherichia" in the search bar for "Taxon Name" ("Taxa") searches for all taxa whose names contain the term "Escherichia", e.g. Escherichia coli.
A counter in the bottom right corner of the search interface displays the number of matches for your search as you edit the query form. It dynamically updates when you leave a form field. The "Clear" button resets all fields. Please note that it is not possible to combine searches across search topics. So you cannot search for "Patient alias" ("Patients") and "Timepoint" ("Samples") at once. However, you can first search for a patient and restrict the time point in the results (see below).
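
The query syntax described above can be read as a small grammar. The following Python sketch is one possible interpretation and not the actual MetagenomicsDB implementation. The names parse_term and parse_query and the returned data structure are hypothetical. The sketch assumes that comma-separated terms are alternatives ("OR") and that, in numeric fields, comparisons separated by whitespace must all hold (as the ">20 <40" example suggests).

```python
import re

# Two-character operators are listed first so that ">=" is not read as ">".
_TERM = re.compile(r"^(>=|<=|>|<|~)?\s*(.+)$")


def parse_term(term):
    """Split one query term into (operator, value); "=" is the default operator."""
    match = _TERM.match(term.strip())
    if match is None:
        raise ValueError(f"empty query term: {term!r}")
    operator, value = match.groups()
    return (operator or "=", value.strip())


def parse_query(query, numeric=False):
    """Return a list of OR-alternatives, each a list of AND-ed (operator, value) terms."""
    alternatives = []
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue
        # Assumption: only numeric fields chain comparisons with whitespace;
        # text values such as "Escherichia coli" are kept whole.
        pieces = part.split() if numeric else [part]
        alternatives.append([parse_term(piece) for piece in pieces])
    return alternatives


print(parse_query(">20 <40", numeric=True))
print(parse_query("1,2", numeric=True))
print(parse_query("~Escherichia"))
```

Each inner list could then be translated into one conjunction of conditions, with the outer list joined by "OR".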

Navigate in the search results

If you have chosen to use the search topic "Patients", you will be redirected to the patients view and see those patients that matched your query. We use the term view for a display of related items (patients, samples, taxa). You can still navigate to any other view in the database by using the buttons at the top of the page.

Figure 2: Controls in views, using the patients view as an example.

You can select one or multiple rows by ticking the check boxes in front of the rows and then clicking "Apply" in the top panel. This will remove any other rows from your current view (use "Undo" to restore the display of rows). If you navigate to any other view, all items will be related to your selected rows. You also have designated buttons to tick all check boxes on the current page ("Page+") or on all pages ("All"), to uncheck all boxes ("None") or to uncheck all boxes on the current page ("Page-"). The number of checked boxes is shown next to these buttons.
On the top right, there are the pagination controls ("<<": first, "<": one page backward, ">": one page forward, ">>": last page), followed by the current page and total page count.
In the taxa view, click on the "?" to see all lineages that the current taxon belongs to (see also Column Descriptions) or use the ">>" buttons in the designated fields to query external resources.

Filters

The filters described in the following only affect the current view. Unlike the selection of rows, they do not directly affect the other views. However, you can make a selection based on the filtered rows, which will alter the display in the other views. For example, suppose the goal is to find all patients that have samples containing Salmonella enterica at a "significant" frequency. First, use the search interface to search the taxa view for Salmonella enterica. In the view, you will see some hits with low read counts. Using the filter panel (see below), you can now restrict the view to entries where the read count for this taxon is at least 200. So far, filtering has only affected the taxa view. Click "All" and then "Apply" to make a selection based on the filtered taxa. After switching to the patients view, you will only see those patients that matched your search and filter criteria.
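
The difference between filtering and selecting can be illustrated with a few lines of Python. This is only a conceptual sketch: the rows, the field names, and the functions filter_min_count and select_patients are invented for illustration and do not reflect the database schema or the code of the web interface.

```python
# Made-up rows standing in for the taxa view after searching for one taxon.
taxa_rows = [
    {"patient": "p1", "taxon": "Salmonella enterica", "count": 12},
    {"patient": "p2", "taxon": "Salmonella enterica", "count": 350},
    {"patient": "p3", "taxon": "Salmonella enterica", "count": 200},
]


def filter_min_count(rows, minimum):
    """Filter step: hides rows in the current view only."""
    return [row for row in rows if row["count"] >= minimum]


def select_patients(rows):
    """Selection step ("All", then "Apply"): the items other views relate to."""
    return {row["patient"] for row in rows}


visible = filter_min_count(taxa_rows, 200)  # affects the taxa view only
selected = select_patients(visible)         # now restricts the patients view
print(sorted(selected))
```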

Samples

Samples can be filtered using a minimum and/or maximum sequence count by entering the respective values in the "# Sequences" filter and clicking the "Filter" button. The "Clear" button removes the filter.

Taxa

Taxa can be filtered by rank, average read length and/or quality (minimum and/or maximum), and/or read count (minimum and/or maximum) using the filter panel at the top of the page as described for the samples.

Exports

Tick the check boxes in front of the entries that you wish to export. Select the desired export option from the drop-down menu at the top of the page and hit the "GO" button.

Sequences

Export sequences in FASTQ format for the selected samples (samples view). The current download limit is 800 MB. If you exceed the limit, a warning is displayed. Optionally, a reduced subset of samples matching the limit can be downloaded.

MicrobiomeAnalyst

Export the classifications and metadata for the selected patients (patients view) / samples (samples view) for analysis in the MicrobiomeAnalyst Marker Data Profiling workflow. A ZIP archive will be automatically downloaded to your computer and a tab with the upload page of the MicrobiomeAnalyst will open in your browser. Extract the archive and upload the files as shown in Figure 3. Please note that "Taxonomy labels" needs to be set to "Not Specific / Other".

Figure 3: Upload to the MicrobiomeAnalyst.

Depending on your sample/patient choice, a WARNING file can appear in the ZIP archive. It contains the IDs of samples that were not exported because no classifications were available. Samples with missing or incomplete metadata, however, are exported, and missing values are set to "NA". Please read the instructions on the first output page of the MicrobiomeAnalyst carefully. They explain why not all of the exported metadata is visible inside the tool.
In order to comply with the formatting requirements of the MicrobiomeAnalyst, the taxonomy only includes the ranks "domain", "phylum", "class", "order", "family", "genus", and "species". If a sequence is not classified at the current rank and all following, the taxon is set to "NA" (database: "UNMATCHED") and classifications with no name at the current rank are indicated by the special taxon "NoName" (database: "NA").
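
The renaming of special taxa during the export can be summarized in a short Python sketch. It is an illustration of the rules stated above, not the export code itself. The function export_lineage and the dictionary-based input are hypothetical, and the sketch assumes that a rank missing from the input is treated like "UNMATCHED".

```python
# The seven ranks kept in the MicrobiomeAnalyst export, in order.
EXPORT_RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")

# Database value -> exported value.
SPECIAL_TAXA = {"UNMATCHED": "NA", "NA": "NoName"}


def export_lineage(classification):
    """Map a {rank: taxon name} dict to the list of exported taxon names."""
    lineage = []
    for rank in EXPORT_RANKS:
        # Assumption: ranks absent from the input count as unclassified.
        name = classification.get(rank, "UNMATCHED")
        lineage.append(SPECIAL_TAXA.get(name, name))
    return lineage


print(export_lineage({"domain": "Bacteria", "phylum": "NA", "class": "Gammaproteobacteria"}))
```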

Column descriptions

In the following, you will find an in-depth description of all columns in the MetagenomicsDB web interface. The documentation is split into parts according to the views.

Patients

Patient alias	Pseudonym of the patient in the study.
Sex	Sex of the patient: "m" for male, "f" for female.
Pregnancy Order	The patient is the mother's nth child where n is the pregnancy order.
Birth Mode	How the patient was born: "natural" or "caesarean section".
Mother's Age at Delivery	The age of the mother in years when giving birth to the patient.
Mother's pre-Pregnancy BMI	The body mass index (BMI) of the mother before getting pregnant with the patient. The BMI is calculated as [weight in kg] / [height in m] ^ 2
Mother's pre-Pregnancy BMI Category	Rating of "Mother's pre-Pregnancy BMI": "underweight" (BMI <18.5), "normal weight" (BMI >=18.5 and <25), "overweight" (BMI >=25 and <30), "obesity" (BMI >=30).
Maternal Illness during Pregnancy	Mother's illness while being pregnant with the patient. One of: "diabetes", "thyroid disease", "hypertension", "diabetes + thyroid disease", "diabetes + hypertension", "thyroid disease + hypertension", "diabetes + thyroid disease + hypertension"
Maternal Antibiotics during Pregnancy	Did the mother receive antibiotics while pregnant with the patient: "yes", "no".
Difference in Body Mass at Delivery	The increase/decrease in body weight (kg) of the mother during pregnancy.
Category in Difference in Body Mass at Delivery	Rating of "Difference in Body Mass at Delivery" (weight difference in kg) depending on "Mother's pre-Pregnancy BMI". "not enough": BMI <18.5 and weight difference <12.5; BMI >=18.5 and <25.0 and weight difference <11.5; BMI >=25.0 and <30.0 and weight difference <7.0; BMI >=30.0 and weight difference <5.0. "appropriate": BMI <18.5 and weight difference >=12.5 and <=18.0; BMI >=18.5 and <25.0 and weight difference >=11.5 and <=16.0; BMI >=25.0 and <30.0 and weight difference >=7.0 and <=11.5; BMI >=30.0 and weight difference >=5.0 and <=9.0. "too much": BMI <18.5 and weight difference >18.0; BMI >=18.5 and <25.0 and weight difference >16.0; BMI >=25.0 and <30.0 and weight difference >11.5; BMI >=30.0 and weight difference >9.0.
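
The BMI formula and the two rating columns above can be expressed compactly in Python. The following is a sketch that restates the thresholds from this table. The function names are hypothetical and the code is not taken from MetagenomicsDB.

```python
def bmi(weight_kg, height_m):
    """Body mass index: [weight in kg] / [height in m] ^ 2."""
    return weight_kg / height_m ** 2


def bmi_category(bmi_value):
    """Rating used for "Mother's pre-Pregnancy BMI Category"."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25:
        return "normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obesity"


# (exclusive upper BMI bound, lowest and highest "appropriate" weight difference in kg)
_GAIN_LIMITS = [
    (18.5, 12.5, 18.0),
    (25.0, 11.5, 16.0),
    (30.0, 7.0, 11.5),
    (float("inf"), 5.0, 9.0),
]


def weight_gain_category(bmi_value, gain_kg):
    """Rating used for "Category in Difference in Body Mass at Delivery"."""
    for upper_bmi, low, high in _GAIN_LIMITS:
        if bmi_value < upper_bmi:
            if gain_kg < low:
                return "not enough"
            if gain_kg <= high:
                return "appropriate"
            return "too much"
    raise ValueError("BMI must be a number")


value = bmi(70, 1.75)
print(bmi_category(value), weight_gain_category(value, 14.0))
```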

Samples

Patient alias	See patients view.
Timepoint	The sampling time point relative to the date of birth. Abbreviations are: "d" = day(s), "w" = week(s), "m" = month(s), "y" = year(s).
Control	The sample is a water control sample: "yes", "no". Control samples don't have measurement values.
# Sequences	The number of sequences for a sample.
Weight-for-age category	Based on the sampling date, weight, sex, and the WHO weight-for-age metrics (l,m,s), a z-score is calculated according to the formulas provided in the WHO manual, page 302f. A child is appropriate for gestational age (AGA) if its z-score at meconium is ≥-2. A child is small for gestational age (SGA) if its z-score at meconium is <-2.
Weight-for-age sub-category	Related to the SGA "Weight-for-age category", but indicates whether SGA children caught up in growth with their AGA peers. Catch-up can only occur from the second sample on, so at the first sample, children are classified according to "Weight-for-age category". Catch-up children must have a z-score > -2, and the difference between the current z-score and the minimum z-score at a previous measurement must be ≥ 0.67. If the catch-up event happened within the first 6 months, it is called "early catch-up", otherwise "late catch-up". Prior to the catch-up event (if it happens at all), the sub-category is "no catch-up".
Feeding Mode	How the patient was fed: "breastfed", "formula", "mixed", or "diet extension"
Probiotics	Did the patient receive probiotics: "yes", "no".
Antibiotics	Did the patient receive antibiotics: "yes", "no".
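
The weight-for-age classification described above relies on z-scores derived from LMS parameters. The Python sketch below uses the standard LMS formula and the thresholds from this table. It is an illustration under assumptions, not the code used by MetagenomicsDB: the function names are hypothetical, and any additional adjustments that the WHO manual prescribes for extreme z-scores are omitted.

```python
import math


def lms_z_score(weight_kg, l, m, s):
    """Standard LMS z-score for a measurement and reference values (l, m, s)."""
    if l == 0:
        return math.log(weight_kg / m) / s
    return ((weight_kg / m) ** l - 1) / (l * s)


def weight_for_age_category(z_at_meconium):
    """AGA if the z-score at meconium is >= -2, otherwise SGA."""
    return "AGA" if z_at_meconium >= -2 else "SGA"


def is_catch_up(previous_z_scores, current_z):
    """Catch-up: z-score > -2 and a gain of >= 0.67 over the lowest earlier z-score."""
    return current_z > -2 and current_z - min(previous_z_scores) >= 0.67


# Made-up z-scores of one child at three consecutive samples.
z_scores = [-2.5, -2.2, -1.8]
print(weight_for_age_category(z_scores[0]), is_catch_up(z_scores[:2], z_scores[2]))
```

A measurement equal to the median m yields a z-score of 0 for any l and s, which is a quick sanity check for the formula.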

Taxa

Patient alias	See patients view.
Timepoint	See samples view.
Control	See samples view.
Count	The number of reads in the given sample that support the respective taxon. Read counts are determined by taxon name and rank (irrespective of lineage). In some classification databases, the same taxon name is used multiple times at the same rank, even though the organisms come from different phylogenetic lineages (see also "Taxon Name").
Taxon Name	The name of the taxon. Use the "?" button to see whether taxa from different lineages share the same name at the given rank and thus contribute together to the read count for the taxon (see also "Count"). There are some special taxa: "UNMATCHED" means unclassified at this rank and all following ranks; "FILTERED" reads were removed during the filtering of human contamination; "NA" are taxa with no name.
Rank	The taxonomic rank of the taxon.
Avg. read length	The average length of reads supporting this taxon.
Avg. read quality	The average quality of reads supporting this taxon.
Program	The program that was used to classify the reads.
Database	The classification database that was used with the "Program" to classify the reads.
PubMed	Conduct a search on PubMed with the current taxon and its connection to SGA/birth weight. The exact query is: "(TAXON NAME[Title/Abstract]) AND ((birth weight[Title/Abstract]) OR (SGA[Title/Abstract]) OR (small for gestational age[Title/Abstract])) AND ((newborn[Title/Abstract]) OR (infant[Title/Abstract]) OR (child[Title/Abstract]) OR (baby[Title/Abstract]))". No link for special taxa (see "Taxon Name").
LPSN	Search for the taxon in the "List of Prokaryotic names with Standing in Nomenclature". No link for special taxa (see "Taxon Name").
BV-BRC	Search for the taxon on the homepage of the "Bacterial and Viral Bioinformatics Resource Center". No link for special taxa (see "Taxon Name").
Wikipedia	Search for the taxon on Wikipedia. No link for special taxa (see "Taxon Name").
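
The PubMed query given above can be assembled programmatically, for example to reproduce a search outside the web interface. The following Python sketch fills the taxon name into the documented query. The function pubmed_url is hypothetical, and the URL layout (the PubMed search page with a "term" parameter) is an assumption about how such a link can be built, not a description of the link generated by MetagenomicsDB.

```python
from urllib.parse import urlencode

# Query text as documented above, with a placeholder for the taxon name.
PUBMED_TEMPLATE = (
    "({taxon}[Title/Abstract]) AND ((birth weight[Title/Abstract]) OR "
    "(SGA[Title/Abstract]) OR (small for gestational age[Title/Abstract])) AND "
    "((newborn[Title/Abstract]) OR (infant[Title/Abstract]) OR "
    "(child[Title/Abstract]) OR (baby[Title/Abstract]))"
)

# Special taxa for which no external link is offered.
SPECIAL_TAXA = {"UNMATCHED", "FILTERED", "NA"}


def pubmed_url(taxon_name):
    """Return a PubMed search URL for the taxon, or None for special taxa."""
    if taxon_name in SPECIAL_TAXA:
        return None
    query = PUBMED_TEMPLATE.format(taxon=taxon_name)
    # Assumption: PubMed accepts the search string in the "term" parameter.
    return "https://pubmed.ncbi.nlm.nih.gov/?" + urlencode({"term": query})


print(pubmed_url("Escherichia coli"))
```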