• LifeWatch ERIC Metadata Catalogue

PEMA: a Pipeline for Environmental DNA Metabarcoding Analysis

PEMA is an HPC-centered, containerized assembly of key metabarcoding analysis tools. It supports the downstream analysis of four marker genes (16S/18S rRNA, ITS and COI), and it can be extended to further marker genes because users can train the classifiers with custom reference databases. By combining state-of-the-art technologies and algorithms with a framework that is easy to get, set up and use, PEMA lets researchers tune each study thoroughly through roll-back checkpoints and on-demand partial pipeline execution.


Identification

Date ( Creation )
2020-03-12
Date ( Publication )
2021-02-11
Date ( Revision )
2021-02-10
Status
Completed
Version
1.0
Keywords
e-DNA
Keywords
Metabarcoding
Keywords
16S
Keywords
18S
Keywords
ITS
Keywords
COI
Keywords
Marker gene analysis
Keywords
Taxonomy assignment
Access constraints
Copyright
Creator
  Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research - Haris Zafeiropoulos
 
OnLine resource
Paper describing the code ( WWW:LINK-1.0-http--related )
OnLine resource
Home page/Github ( WWW:LINK-1.0-http--link )
OnLine resource
Docker Hub ( WWW:LINK-1.0-http--link )
OnLine resource
Singularity Hub ( WWW:LINK-1.0-http--link )
Operation name
Sequence pre-processing
Web site
https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Description
FastQC is used to obtain an overall read-quality summary.
Function
Sequence pre-processing
Operation name
Sequence pre-processing
Web site
http://www.usadellab.org/cms/?page=trimmomatic
Description
Trimmomatic is used for trimming steps.
Function
Sequence pre-processing
Operation name
Sequence pre-processing
Web site
https://cutadapt.readthedocs.io/en/stable/
Description
Cutadapt is used for ITS to address the variability in length of this marker gene.
Function
Sequence pre-processing
Operation name
Sequence pre-processing
Web site
http://cab.spbu.ru/software/spades/
Description
BayesHammer is taken from the SPAdes assembly toolkit to revise incorrectly-called bases.
Function
Sequence pre-processing
Operation name
Sequence pre-processing
Web site
https://github.com/neufeld/pandaseq
Description
PANDAseq assembles the overlapping paired-end reads.
Function
Sequence pre-processing
Operation name
Sequence pre-processing
Web site
https://pythonhosted.org/OBITools/welcome.html
Description
The “obiuniq” program of OBITools groups the identical sequences in every sample, keeping track of their abundances.
Function
Sequence pre-processing
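The dereplication idea behind this step can be illustrated with a minimal Python sketch. It is not the obiuniq implementation and does not use the OBITools API; the function name `dereplicate` and the sample reads are hypothetical, chosen only to show identical sequences being grouped with their abundances.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences and record how often each occurs.

    Hypothetical helper for illustration; not part of OBITools or PEMA.
    """
    counts = Counter(reads)
    # most_common() orders by decreasing abundance; ties keep first-seen order
    return counts.most_common()


# Made-up reads from a single sample
sample_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
unique = dereplicate(sample_reads)

for sequence, abundance in unique:
    print(sequence, abundance)
```

Working on unique sequences with abundance counts keeps later steps such as chimera removal and clustering from processing the same sequence repeatedly.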
Operation name
Sequence pre-processing
Web site
https://github.com/torognes/vsearch/releases/tag/v2.9.1
Description
The VSEARCH package is invoked for chimera removal.
Function
Sequence pre-processing
Operation name
OTU clustering
Web site
https://github.com/torognes/vsearch/releases/tag/v2.9.1
Description
VSEARCH is used for OTU clustering.
Function
OTU clustering
Operation name
OTU clustering
Web site
https://github.com/tingchenlab/CROP
Description
For the COI marker gene, an unsupervised probabilistic Bayesian clustering algorithm (CROP) can be selected to perform the OTU clustering step.
Function
OTU clustering
Operation name
ASVs inference
Web site
https://github.com/torognes/swarm
Description
For all marker genes supported, PEMA invokes the Swarm V2 algorithm to infer ASVs.
Function
ASVs inference
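As a rough illustration of clustering by small local differences, the Python sketch below chains together sequences that are at most one edit apart, seeding each cluster with the most abundant unassigned sequence. It is a simplified sketch under stated assumptions, not the Swarm implementation: the refinement steps of the real tool are omitted, and the names `within_one_edit`, `cluster_d1` and the sample counts are hypothetical.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = j = edits = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        edits += 1
        if edits > 1:
            return False
        if len(a) == len(b):
            i += 1  # substitution: advance in both sequences
        j += 1      # otherwise b has an extra character: advance b only
    # a trailing character left in b counts as one more edit
    return edits + (len(b) - j) <= 1


def cluster_d1(abundances):
    """Group sequences by chaining neighbours that are one edit apart.

    abundances maps sequence -> count (hypothetical input format).
    """
    remaining = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters = []
    while remaining:
        seed = remaining.pop(0)  # most abundant unassigned sequence
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [s for s in remaining if within_one_edit(current, s)]
            for s in linked:
                remaining.remove(s)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(cluster)
    return clusters


counts = {"ACGT": 10, "ACGA": 3, "ACCA": 1, "TTTT": 5}
clusters = cluster_d1(counts)
print(clusters)
```

Here "ACCA" joins the first cluster through "ACGA" even though it is two edits from the seed, which is the chaining behaviour this kind of local-threshold clustering relies on.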
Operation name
Taxonomy assignment
Web site
https://github.com/lanzen/CREST
Description
For the 16S/18S rRNA and ITS marker genes, the LCAClassifier algorithm of the CREST set of resources and tools is used together with the Silva and Unite databases. Two versions of Silva are included in PEMA: 128 and 132. Phylogeny-based assignment is also available for 16S rRNA marker gene data, using a custom reference tree of 1,000 Silva-derived consensus sequences.
Function
Taxonomy assignment of the OTUs or ASVs returned in the OTU clustering / ASVs inference step
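The general lowest-common-ancestor (LCA) idea can be sketched in a few lines of Python. This is an illustration only: it does not reproduce how CREST's LCAClassifier selects or filters database hits, and the function name `lowest_common_ancestor` and the example lineages are assumptions made for the sketch.

```python
def lowest_common_ancestor(lineages):
    """Return the longest rank-by-rank prefix shared by all lineages.

    Each lineage is a list of taxon names ordered from highest to lowest rank.
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so no rank is compared past its end
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Hypothetical lineages of the reference sequences matched by one OTU/ASV
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Alteromonadales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
]
assignment = lowest_common_ancestor(hits)
print(" > ".join(assignment))
```

Assigning at the deepest rank on which all hits agree avoids reporting a more specific taxon than the matches support.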
Operation name
Taxonomy assignment
Web site
https://github.com/rdpstaff/classifier
Description
For the COI marker gene, PEMA supports the RDPClassifier with the Midori and Midori2 reference databases to assign taxonomy to the MOTUs.
Function
Taxonomy assignment of the OTUs or ASVs returned in the OTU clustering / ASVs inference step
Operation name
Ecological downstream analysis
Web site
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217; http://joey711.github.io/phyloseq/index.html; https://cran.r-project.org/web/packages/vegan/index.html
Description
The phyloseq R package can be used for downstream ecological analysis of the taxonomically assigned OTUs or ASVs. This includes α- and β-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations between samples.
Function
Ecological downstream analysis of the taxonomic tables
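To make the α- and β-diversity terms concrete, the following Python sketch computes the Shannon index for one sample and the Bray-Curtis dissimilarity between two samples. PEMA relies on the phyloseq R package for this step; the Python functions `shannon_index` and `bray_curtis` and the count vectors are hypothetical and only illustrate the formulas.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same taxa."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))


# Made-up OTU/ASV counts for two samples (same taxon order in both)
sample_a = [10, 10, 0]
sample_b = [5, 0, 15]

print("alpha (Shannon) of sample_a:", shannon_index(sample_a))
print("beta (Bray-Curtis) a vs b:", bray_curtis(sample_a, sample_b))
```

A sample with two equally abundant taxa has a Shannon index of ln 2, and Bray-Curtis ranges from 0 (identical composition) to 1 (no shared taxa).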
Operation name
Alignment tool
Web site
https://mafft.cbrc.jp/alignment/software/
Description
MAFFT aligns the consensus sequences returned by the PhAT algorithm, which are used to build the reference tree. In the PEMA framework it is also run every time a user asks for a phylogenetic tree of the OTUs/ASVs found.
Function
Alignment of consensus sequences
Operation name
Alignment tool
Web site
https://cme.h-its.org/exelixis/web/software/papara/index.html
Description
PaPaRa aligns short reads to reference phylogenies and alignments. In PEMA it aligns the OTUs/ASVs against the alignment of the sequences used for the reference tree, which serves as the core alignment.
Function
Alignment of short reads
Operation name
Build the reference tree
Web site
https://github.com/amkozlov/raxml-ng
Description
RAxML-NG builds the reference tree. As with MAFFT, it is also used to build a phylogenetic tree from the retrieved OTUs/ASVs when the user asks for one.
Function
Build the reference tree
Operation name
Sequence placement on a phylogenetic tree
Web site
https://github.com/Pbdas/epa-ng
Description
Performs maximum likelihood-based phylogenetic placement of genetic sequences on a user-supplied reference tree and alignment. In the PEMA framework it is used to assign OTUs/ASVs retrieved by PEMA to the reference tree.
Function
Maximum likelihood-based placement of genetic sequences on a reference tree
Operation name
PhAT algorithm
Web site
https://github.com/lczech/gappa/wiki/Subcommand:-phat
Description
Generates consensus sequences from a sequence database according to the PhAT method using the "gappa" package.
Function
Consensus sequences generator
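What a consensus sequence is can be shown with a simple majority-rule sketch in Python. This is not the PhAT method and does not call gappa; PhAT also decides which groups of database sequences to summarize, which is not shown here. The function name `majority_consensus` and the toy alignment are assumptions for illustration.

```python
from collections import Counter


def majority_consensus(aligned):
    """Build a consensus by taking the most frequent character in each column.

    Expects sequences that are already aligned to the same length.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) returns [(character, count)] for the top character
        consensus.append(Counter(column).most_common(1)[0][0])
    return "".join(consensus)


# Toy alignment of three sequences from one hypothetical group
alignment = ["ACGTA", "ACGTT", "ACCTA"]
consensus = majority_consensus(alignment)
print(consensus)
```

Replacing a group of related sequences with one consensus keeps the reference tree small while still representing each group.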
Required Services
https://www.arb-silva.de/no_cache/download/archive/current/Exports/
Required Services
http://reference-midori.info/index.html
Required Services
https://unite.ut.ee
Service Category
data analysis
Service Language
eng
Service TRL
TRL 9 – Actual system proven in operational environment
 
