PEMA: a Pipeline for Environmental DNA Metabarcoding Analysis

PEMA is an HPC-centered, containerized assembly of key metabarcoding analysis tools. It supports the downstream analysis of four marker genes (16S/18S rRNA, ITS and COI), and it can be extended to further marker genes because users can train the classifiers with custom reference databases. By combining state-of-the-art technologies and algorithms with an easy-to-get, easy-to-set, easy-to-use framework, PEMA lets researchers thoroughly tune each study through roll-back checkpoints and on-demand partial pipeline execution.

Date (Creation)
2020-03-12
Date (Publication)
2021-02-11
Date (Revision)
2021-02-10
Status
Completed
Creator
  Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research - Haris Zafeiropoulos

Keywords

metagenomics
remote sensing
modelling
e-DNA
Metabarcoding
16S
18S
ITS
COI
Marker gene analysis
Taxonomy assignment

Access constraints
Copyright
OnLine resource
Paper describing the code (WWW:LINK-1.0-http--related)
OnLine resource
Home page/Github (WWW:LINK-1.0-http--link)
OnLine resource
Docker Hub (WWW:LINK-1.0-http--link)
OnLine resource
Singularity Hub (WWW:LINK-1.0-http--link)
Operation name

Sequence pre-processing

Web site

https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Description

FastQC is used to obtain an overall read-quality summary.

Function

Sequence pre-processing

Operation name

Sequence pre-processing

Web site

http://www.usadellab.org/cms/?page=trimmomatic

Description

Trimmomatic is used for trimming steps.

Function

Sequence pre-processing

Operation name

Sequence pre-processing

Web site

https://cutadapt.readthedocs.io/en/stable/

Description

Cutadapt is used for ITS to address the variability in length of this marker gene.

Function

Sequence pre-processing

Operation name

Sequence pre-processing

Web site

http://cab.spbu.ru/software/spades/

Description

BayesHammer is taken from the SPAdes assembly toolkit to revise incorrectly-called bases.

Function

Sequence pre-processing

Operation name

Sequence pre-processing

Web site

https://github.com/neufeld/pandaseq

Description

PANDAseq assembles the overlapping paired-end reads.

Function

Sequence pre-processing

Operation name

Sequence pre-processing

Web site

https://pythonhosted.org/OBITools/welcome.html

Description

The “obiuniq” program of OBITools groups the identical sequences in every sample, keeping track of their abundances.

Function

Sequence pre-processing
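
The dereplication step described above can be illustrated with a minimal Python sketch. It is not the obiuniq implementation and does not use the OBITools API; the function name `dereplicate` and the sample reads are hypothetical, and the sketch only shows the idea of collapsing identical sequences while keeping their abundances.

```python
from collections import Counter


def dereplicate(reads):
    """Group identical sequences and count how often each one occurs.

    Hypothetical helper for illustration only; case is normalised so that
    'acgt' and 'ACGT' are treated as the same sequence.
    """
    counts = Counter(seq.upper() for seq in reads)
    # Most abundant first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up reads from a single sample.
sample_reads = ["ACGT", "acgt", "TTGA", "ACGT", "TTGA", "GGCC"]
for sequence, abundance in dereplicate(sample_reads):
    print(sequence, abundance)
```

Keeping the abundance next to each unique sequence matters because later steps, such as chimera removal and clustering, typically process sequences in order of decreasing abundance.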

Operation name

Sequence pre-processing

Web site

https://github.com/torognes/vsearch/releases/tag/v2.9.1

Description

The VSEARCH package is invoked for chimera removal.

Function

Sequence pre-processing

Operation name

OTU clustering

Web site

https://github.com/torognes/vsearch/releases/tag/v2.9.1

Description

VSEARCH is used for OTU clustering.

Function

OTU clustering
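
As a rough illustration of what OTU clustering does, the sketch below implements a simple greedy, centroid-based clustering in Python. It is a simplification under stated assumptions and not VSEARCH's algorithm or interface: the names `identity` and `greedy_cluster` are hypothetical, identity is computed position by position on equal-length sequences instead of through pairwise alignment, and the 0.97 default threshold is an assumed, commonly used value, not a setting taken from PEMA.

```python
def identity(a, b):
    """Fraction of matching positions.

    Assumes equal-length, already aligned sequences (a simplification);
    sequences of different length are treated as having zero identity.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs_by_abundance, threshold=0.97):
    """Assign each sequence to the first centroid it matches, else start a new OTU.

    The input is expected to be sorted by decreasing abundance, so the most
    abundant sequences become centroids first.
    """
    clusters = []  # list of (centroid, members) pairs
    for seq in seqs_by_abundance:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No existing centroid is similar enough: seq founds a new cluster.
            clusters.append((seq, [seq]))
    return clusters


# Made-up sequences, most abundant first, clustered at 90% identity.
otus = greedy_cluster(["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC"], threshold=0.9)
for centroid, members in otus:
    print(centroid, len(members))
```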

Operation name

OTU clustering

Web site

https://github.com/tingchenlab/CROP

Description

For the COI marker gene, an unsupervised probabilistic Bayesian clustering algorithm (CROP) can be selected to perform the OTU clustering step.

Function

OTU clustering

Operation name

ASVs inference

Web site

https://github.com/torognes/swarm

Description

For all marker genes supported, PEMA invokes the Swarm V2 algorithm to infer ASVs.

Function

ASVs inference

Operation name

Taxonomy assignment

Web site

https://github.com/lanzen/CREST

Description

For the 16S/18S rRNA and ITS marker genes, the LCAClassifier algorithm of the CREST set of resources and tools is used together with the Silva and Unite databases. Two versions of Silva are included in PEMA: 128 and 132. Phylogeny-based assignment is also available for 16S rRNA marker gene data, using a custom reference tree of 1,000 Silva-derived consensus sequences.

Function

Taxonomy assignment of the OTUs or ASVs returned in OTU clustering / ASVs inference step
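
The general lowest-common-ancestor idea behind this kind of classifier can be sketched in a few lines of Python. This is not CREST's LCAClassifier code: the function name `lowest_common_ancestor` and the example lineages are hypothetical, and the step that selects which reference hits to consider is left out. The sketch only shows how several candidate lineages are reduced to the deepest taxon they all share.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages.

    Each lineage is a list of taxon names ordered from the root downwards.
    Hypothetical helper for illustration only.
    """
    if not lineages:
        return []
    shared = []
    # zip stops at the shortest lineage, so ranks missing from any hit are dropped.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Made-up lineages of the best reference hits for one OTU.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Alteromonadales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
]
print(" > ".join(lowest_common_ancestor(hits)))
```

Assigning at the shared ancestor trades resolution for caution: when the hits disagree at a lower rank, the OTU is reported at the last rank on which they agree.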

Operation name

Taxonomy assignment

Web site

https://github.com/rdpstaff/classifier

Description

For the COI marker gene, PEMA supports the RDPClassifier with the Midori and Midori2 reference databases to assign taxonomy to the MOTUs.

Function

Taxonomy assignment of the OTUs or ASVs returned in OTU clustering / ASVs inference step

Operation name

Ecological downstream analysis

Web site

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0061217; http://joey711.github.io/phyloseq/index.html; https://cran.r-project.org/web/packages/vegan/index.html

Description

The phyloseq R package can be used for downstream ecological analysis of the taxonomically assigned OTUs or ASVs. This includes α- and β-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations between samples.

Function

Ecological downstream analysis of the taxonomic tables
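
To make the α- and β-diversity terms concrete, the sketch below computes one common measure of each in plain Python: the Shannon index for a single sample and the Bray-Curtis dissimilarity between two samples. phyloseq is an R package and this is not its API; the function names and the abundance values are hypothetical and serve only to show the arithmetic.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors over the same taxa."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must cover the same taxa")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Made-up OTU abundances for two samples (same taxa, same order).
sample_a = [10, 10, 0]
sample_b = [0, 10, 10]
print(round(shannon_index(sample_a), 4))
print(bray_curtis(sample_a, sample_b))
```

The Shannon index summarises diversity within one sample, while Bray-Curtis ranges from 0 (identical composition) to 1 (no shared abundance) and is used to compare samples.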

Operation name

Alignment tool

Web site

https://mafft.cbrc.jp/alignment/software/

Description

MAFFT aligns the consensus sequences returned by the PhAT algorithm so that the reference tree can be built. PEMA also uses it whenever a user requests a phylogenetic tree of the OTUs/ASVs found.

Function

Alignment of consensus sequences

Operation name

Alignment tool

Web site

https://cme.h-its.org/exelixis/web/software/papara/index.html

Description

PaPaRa aligns short reads to reference phylogenies and alignments. In PEMA, it aligns the OTUs/ASVs against the alignment of the sequences used for the reference tree.

Function

Alignment of short reads

Operation name

Build the reference tree

Web site

https://github.com/amkozlov/raxml-ng

Description

RAxML-NG builds the reference tree. As with MAFFT, it is also used to build a phylogenetic tree from the retrieved OTUs/ASVs if the user requests one.

Function

Build the reference tree

Operation name

Sequence placement on a phylogenetic tree

Web site

https://github.com/Pbdas/epa-ng

Description

EPA-ng performs maximum likelihood-based phylogenetic placement of genetic sequences on a user-supplied reference tree and alignment. In the PEMA framework, it places the OTUs/ASVs retrieved by PEMA on the reference tree.

Function

Maximum likelihood-based placement of genetic sequences on a reference tree

Operation name

PhAT algorithm

Web site

https://github.com/lczech/gappa/wiki/Subcommand:-phat

Description

Generates consensus sequences from a sequence database according to the PhAT method using the "gappa" package.

Function

Consensus sequences generator

Required Services

https://www.arb-silva.de/no_cache/download/archive/current/Exports/

Required Services

http://reference-midori.info/index.html

Required Services

https://unite.ut.ee

Service Category

data analysis

Service Category

data processing

Service Language
eng
Service TRL
TRL 9 – Actual system proven in operational environment

Metadata

File identifier
55dfbb63-4304-4fa8-a7b6-17988e0b33df
Metadata language
en
Hierarchy level
Service
Metadata Schema Version

1.0

 
 
