PEMA: a Pipeline for Environmental DNA Metabarcoding Analysis
PEMA is an HPC-centered, containerized assembly of key metabarcoding analysis tools. It supports the downstream analysis of four marker genes (16S/18S rRNA, ITS and COI). Because users can also train the classifiers with custom reference databases, it can be used for other marker genes as well. By combining state-of-the-art technologies and algorithms with a framework that is easy to get, set up and use, PEMA allows researchers to tune each study thoroughly, thanks to roll-back checkpoints and on-demand partial pipeline execution.
- Date (Creation)
- 2020-03-12
- Date (Publication)
- 2021-02-11
- Date (Revision)
- 2021-02-10
- Status
- Completed
- Keywords
- metagenomics
- Keywords
- remote sensing
- Keywords
- modelling
- Keywords
- e-DNA
- Keywords
- Metabarcoding
- Keywords
- 16S
- Keywords
- 18S
- Keywords
- ITS
- Keywords
- COI
- Keywords
- Marker gene analysis
- Keywords
- Taxonomy assignment
- Access constraints
- Copyright
- OnLine resource
- Paper describing the code (WWW:LINK-1.0-http--related)
- OnLine resource
- Home page/GitHub (WWW:LINK-1.0-http--link)
- OnLine resource
- Docker Hub (WWW:LINK-1.0-http--link)
- OnLine resource
- Singularity Hub (WWW:LINK-1.0-http--link)
- Operation name
- Sequence pre-processing
- Description
- FastQC is used to obtain an overall read-quality summary.
- Function
- Sequence pre-processing
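FastQC reports several summaries. One of them is per-base sequence quality, and a rough illustration of it is the mean Phred score at each read position. The Python sketch below computes that value. It is not FastQC's implementation: it assumes well-formed four-line FASTQ records with Phred+33 quality encoding, and the function names are hypothetical.

```python
from statistics import mean


def decode_phred33(quality_line: str) -> list[int]:
    """Convert a Phred+33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_line]


def per_position_mean_quality(fastq_lines: list[str]) -> list[float]:
    """Mean Phred score at each read position across all records (hypothetical helper)."""
    # Each FASTQ record spans four lines: @header, sequence, '+', quality string.
    qualities = [decode_phred33(fastq_lines[i + 3].strip())
                 for i in range(0, len(fastq_lines), 4)]
    longest = max(len(q) for q in qualities)
    # Reads can differ in length, so each position averages only the reads that reach it.
    return [mean(q[pos] for q in qualities if pos < len(q)) for pos in range(longest)]


records = ["@read1", "ACGT", "+", "IIII",
           "@read2", "ACG", "+", "##5"]
print(per_position_mean_quality(records))
```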
- Operation name
- Sequence pre-processing
- Description
- Trimmomatic is used for the read-trimming steps, i.e., removing adapter sequences and low-quality bases.
- Function
- Sequence pre-processing
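Quality trimming of the kind Trimmomatic performs can be sketched with a sliding window. The read is scanned from the 5' end and cut once the mean quality inside the window drops below a threshold. The sketch below is a simplified stand-in for Trimmomatic's SLIDINGWINDOW step, not its exact cut-point rule. The window size of 4 and the threshold of 20 are illustrative assumptions, not PEMA's settings.

```python
def sliding_window_trim(seq: str, quals: list[int], window: int = 4,
                        min_mean_q: float = 20.0) -> tuple[str, list[int]]:
    """Cut the read at the start of the first window whose mean quality is below min_mean_q."""
    # Reads shorter than the window are returned unchanged in this simplified version.
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals


seq, quals = sliding_window_trim("ACGTACGT", [30, 30, 30, 30, 30, 10, 10, 10])
print(seq, quals)
```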
- Operation name
- Sequence pre-processing
- Description
- Cutadapt is used for ITS data, to handle the variable length of this marker gene.
- Function
- Sequence pre-processing
- Operation name
- Sequence pre-processing
- Web site
- Description
- BayesHammer, from the SPAdes assembly toolkit, is used to correct incorrectly called bases.
- Function
- Sequence pre-processing
- Operation name
- Sequence pre-processing
- Web site
- Description
- PANDAseq merges the overlapping paired-end reads into single sequences.
- Function
- Sequence pre-processing
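PANDAseq uses a quality-aware scoring model to choose how the two reads of a pair overlap. The sketch below replaces that model with a plain mismatch count, which is enough to show the core idea of read merging:

- reverse-complement R2;
- find the end overlap with R1 that agrees best;
- stitch the two reads into one sequence.

The thresholds and function names are hypothetical, and mismatches inside the overlap are simply resolved in favour of R1.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge R1 with the reverse complement of R2 using the best-scoring end overlap."""
    r2_rc = reverse_complement(r2)
    best_key, best_overlap = None, None
    for overlap in range(min_overlap, min(len(r1), len(r2_rc)) + 1):
        # Compare the 3' end of R1 with the 5' end of the reverse-complemented R2.
        mismatches = sum(a != b for a, b in zip(r1[-overlap:], r2_rc[:overlap]))
        frac = mismatches / overlap
        # Prefer the lowest mismatch fraction; break ties with the longer overlap.
        key = (frac, -overlap)
        if frac <= max_mismatch_frac and (best_key is None or key < best_key):
            best_key, best_overlap = key, overlap
    if best_overlap is None:
        return None  # no acceptable overlap, so the pair is left unmerged
    # Keep R1's bases across the overlap, then append the rest of R2.
    return r1 + r2_rc[best_overlap:]


amplicon = "ACGTACGGTCAGTTGCAATCGGATCCTAGA"
print(merge_pair(amplicon[:20], reverse_complement(amplicon[10:])))
```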
- Operation name
- Sequence pre-processing
- Description
- The “obiuniq” program of OBITools dereplicates the reads: it groups identical sequences in every sample while keeping track of their abundances.
- Function
- Sequence pre-processing
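The dereplication step described above can be sketched in a few lines. The hypothetical Python function below collapses identical sequences and keeps a count of how often each one occurs in each sample. It is not the OBITools code, and it assumes every read is already labelled with its sample.

```python
from collections import Counter, defaultdict


def dereplicate(reads):
    """Group identical sequences, recording how many times each occurs in each sample.

    reads: iterable of (sample_id, sequence) pairs.
    Returns {sequence: Counter({sample_id: count})}.
    """
    table = defaultdict(Counter)
    for sample_id, seq in reads:
        table[seq.upper()][sample_id] += 1  # case-insensitive grouping
    return dict(table)


reads = [("s1", "ACGT"), ("s1", "acgt"), ("s2", "ACGT"), ("s2", "TTGA")]
uniques = dereplicate(reads)
for seq, per_sample in uniques.items():
    print(seq, dict(per_sample), "total:", sum(per_sample.values()))
```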
- Operation name
- Sequence pre-processing
- Description
- The VSEARCH package is invoked for chimera removal.
- Function
- Sequence pre-processing
- Operation name
- OTU clustering
- Description
- VSEARCH is used for OTU clustering.
- Function
- OTU clustering
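VSEARCH offers abundance-sorted greedy centroid clustering, for example through its `--cluster_size` command. Unique sequences are processed from most to least abundant, and each one either joins an existing centroid within the identity threshold or founds a new OTU. PEMA's exact VSEARCH options are not described here.

The sketch below illustrates that scheme with a Levenshtein-based identity and assigns each sequence to its most similar centroid. VSEARCH itself uses its own alignment, identity definition and search heuristics, so this is a simplification. The 97% threshold is only a commonly used example value.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Simplified identity: 1 - edit distance / length of the longer sequence."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_centroid_clustering(uniques, threshold=0.97):
    """uniques: list of (sequence, abundance). Returns {centroid: [member sequences]}."""
    clusters = {}
    for seq, _abundance in sorted(uniques, key=lambda item: -item[1]):
        best = max(clusters, key=lambda c: identity(seq, c), default=None)
        if best is not None and identity(seq, best) >= threshold:
            clusters[best].append(seq)  # join the closest existing OTU
        else:
            clusters[seq] = [seq]  # found a new OTU with this sequence as centroid
    return clusters


a = "ACGT" * 25
b = "T" + a[1:]   # one substitution away from a (99% identity)
c = "GGTT" * 25   # clearly different sequence
print(greedy_centroid_clustering([(b, 2), (a, 10), (c, 5)]))
```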
- Operation name
- OTU clustering
- Web site
- Description
- For the COI marker gene, an unsupervised probabilistic Bayesian clustering algorithm (CROP) can be selected to perform the OTU clustering step.
- Function
- OTU clustering
- Operation name
- ASV inference
- Web site
- Description
- For all supported marker genes, PEMA invokes the Swarm v2 algorithm to infer ASVs.
- Function
- ASV inference
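Swarm's core idea is iterative single-linkage growth. Starting from the most abundant unclustered amplicon, it repeatedly recruits unclustered amplicons within d differences (d = 1 by default) of any amplicon already in the cluster. Clusters can therefore chain beyond d from the seed.

The sketch below shows only that growth step for d = 1. It leaves out Swarm v2's refinements, such as cluster breaking and the fastidious option, and it is not Swarm's implementation. The function names are hypothetical.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) <= 1
    if len(a) > len(b):
        a, b = b, a  # make b the longer sequence
    # b is one base longer: removing a single base from b must give a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def swarm_like_clustering(uniques):
    """uniques: list of (sequence, abundance). Returns clusters as lists, seed first."""
    ordered = sorted(uniques, key=lambda item: (-item[1], item[0]))
    unassigned = {seq for seq, _ in ordered}
    clusters = []
    for seed, _ in ordered:
        if seed not in unassigned:
            continue  # already recruited into an earlier cluster
        unassigned.discard(seed)
        members, frontier = [seed], [seed]
        while frontier:
            next_frontier = []
            for amplicon in frontier:
                # Recruit every still-unassigned amplicon one difference away.
                hits = [s for s, _ in ordered
                        if s in unassigned and within_one_edit(amplicon, s)]
                for s in hits:
                    unassigned.discard(s)
                    members.append(s)
                    next_frontier.append(s)
            frontier = next_frontier
        clusters.append(members)
    return clusters


print(swarm_like_clustering([("AAAA", 10), ("AAAT", 3), ("AATT", 2), ("GGGG", 5)]))
```

In this toy input, "AATT" is two differences from the seed "AAAA" but still joins its cluster through "AAAT". That chaining is the behaviour that distinguishes Swarm from fixed-threshold clustering.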
- Operation name
- Taxonomy assignment
- Web site
- Description
- For the 16S/18S rRNA and ITS marker genes, the LCAClassifier algorithm of the CREST set of resources and tools is used together with the Silva (16S/18S rRNA) and Unite (ITS) databases. Two versions of Silva are included in PEMA: 128 and 132. Phylogeny-based assignment is also available for 16S rRNA marker gene data, using a custom reference tree of 1,000 Silva-derived consensus sequences.
- Function
- Taxonomy assignment of the OTUs or ASVs returned by the OTU clustering / ASV inference step
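The lowest-common-ancestor idea behind LCAClassifier can be illustrated in a few lines. Take the reference hits whose scores are close to the best one, and report the deepest taxonomic rank on which they all agree. The sketch below assumes hits are given as (bit score, lineage) pairs and uses a 2% relative score range as an illustrative value. It does not reproduce CREST's actual search or parameters.

```python
def lowest_common_ancestor(lineages):
    """Longest rank-by-rank prefix shared by all lineages."""
    shared = []
    for ranks in zip(*lineages):  # zip stops at the shortest lineage
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def lca_assign(hits, relative_range=0.02):
    """hits: list of (bit score, lineage); returns the LCA of hits near the best score."""
    best = max(score for score, _ in hits)
    near_best = [lineage for score, lineage in hits if score >= best * (1 - relative_range)]
    return lowest_common_ancestor(near_best)


hits = [
    (500, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"]),
    (495, ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"]),
    (300, ["Bacteria", "Firmicutes", "Bacilli"]),
]
print(lca_assign(hits))
```

Widening the score range pulls in more divergent hits and pushes the assignment up the taxonomy. That is the usual trade-off between precision and confidence in LCA-style classifiers.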
- Operation name
- Taxonomy assignment
- Description
- For the COI marker gene, PEMA supports the RDPClassifier with the Midori and Midori2 reference databases to assign taxonomy to the MOTUs.
- Function
- Taxonomy assignment of the OTUs or ASVs returned by the OTU clustering / ASV inference step
- Operation name
- Ecological downstream analysis
- Web site
- https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217; http://joey711.github.io/phyloseq/index.html; https://cran.r-project.org/web/packages/vegan/index.html
- Description
- The phyloseq R package can be used for downstream ecological analysis of the taxonomically assigned OTUs or ASVs. This includes α- and β-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations between samples.
- Function
- Ecological downstream analysis of the taxonomic tables
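Two common α-diversity indices and one common β-diversity dissimilarity can be written directly from per-sample counts:

- Shannon H = −Σ pᵢ ln pᵢ;
- Gini–Simpson 1 − Σ pᵢ²;
- Bray–Curtis dissimilarity.

The Python sketch below uses these standard textbook formulas. It is not phyloseq's or vegan's code, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i), ignoring zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples with counts for the same taxa."""
    shared = sum(min(a, b) for a, b in zip(x, y))
    return 1 - 2 * shared / (sum(x) + sum(y))


even = [10, 10, 10, 10]
print(shannon(even), gini_simpson(even), bray_curtis([1, 2, 3], [3, 2, 1]))
```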
- Operation name
- Alignment tool
- Description
- Aligns the consensus sequences returned by the PhAT algorithm, from which the reference tree is built. It is also used in the PEMA framework every time a user asks for a phylogenetic tree of the OTUs/ASVs found.
- Function
- Alignment of consensus sequences
- Operation name
- Alignment tool
- Description
- Aligns short reads to reference phylogenies and alignments. In PEMA, it aligns the OTUs/ASVs against the alignment of the sequences used to build the reference tree, which serves as the core alignment.
- Function
- Alignment of short reads
- Operation name
- Build the reference tree
- Web site
- Description
- Builds the reference tree. As with MAFFT, it is also used to build a phylogenetic tree of the retrieved OTUs/ASVs when the user asks for one.
- Function
- Build the reference tree
- Operation name
- Sequence placement on a phylogenetic tree
- Web site
- Description
- Performs maximum-likelihood-based phylogenetic placement of genetic sequences on a user-supplied reference tree and alignment. In the PEMA framework, it is used to place the OTUs/ASVs retrieved by PEMA on the reference tree.
- Function
- Maximum-likelihood-based placement of genetic sequences on a reference tree
- Operation name
- PhAT algorithm
- Description
- Generates consensus sequences from a sequence database according to the PhAT method, using the "gappa" package.
- Function
- Consensus sequence generator
- Required Services
- https://www.arb-silva.de/no_cache/download/archive/current/Exports/
- Required Services
- Required Services
- Service Category
- data analysis
- Service Category
- data processing
- Service Language
- eng
- Service TRL
- TRL 9 – Actual system proven in operational environment
Metadata
- File identifier
- 55dfbb63-4304-4fa8-a7b6-17988e0b33df
- Metadata language
- en
- Hierarchy level
- Service
- Metadata Schema Version
- 1.0