data quality
-
Automated assessment of accuracy and geographical status of georeferenced biological data. The methods rely on reference regions, namely checklists and range maps. The package includes functions to obtain data from the Global Biodiversity Information Facility (https://www.gbif.org) and from the Global Inventory of Floras and Traits (https://gift.uni-goettingen.de/home). Alternatively, the user can input their own data. Furthermore, it provides easy visualisation of the data and the results through the plotting functions. It is especially suited for large datasets. The reference for the methodology is: Arlé et al. https://doi.org/10.1111/2041-210X.13629
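As a rough illustration of checking georeferenced records against a reference region, the sketch below tests whether each point falls inside a polygon using ray casting. It is not the package's method or API: the function names, the rectangular region, and the two example records are all hypothetical, and the published methodology is more elaborate than a plain inside/outside test.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def flag_records(records: List[Point], region: List[Point]):
    """Pair each record with True/False for 'inside the reference region'."""
    return [(p, point_in_polygon(p, region)) for p in records]


# Hypothetical reference region (a simple rectangle) and two example records.
region = [(-10.0, 35.0), (5.0, 35.0), (5.0, 45.0), (-10.0, 45.0)]
records = [(-3.7, 40.4), (13.4, 52.5)]
flags = flag_records(records, region)
```

Real checklists and range maps are irregular, often multi-part polygons, so a production workflow would normally rely on a geospatial library rather than a hand-written test like this one.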
-
The package includes algorithms for presence-background (Maxent) and presence-absence (GAM, GLM, GBM, SVM, RF, ANN) data. It also contains functions for sampling bias correction; sampling pseudo-absences and background points; data partitioning; reducing collinearity in predictors; fitting and evaluating models, ensembles of small models, and ensemble models; and model prediction, interpolation, and overprediction correction.
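To make "reducing collinearity in predictors" concrete, here is a minimal sketch of one common approach: a greedy filter that keeps a predictor only if its absolute Pearson correlation with every predictor already kept stays at or below a threshold. This is an assumption chosen for illustration, not the package's own procedure. The function names, the 0.7 threshold, and the predictor values are hypothetical.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_collinear(predictors: Dict[str, List[float]],
                   threshold: float = 0.7) -> List[str]:
    """Greedily keep predictors that are not strongly correlated
    with any predictor kept so far (order of the dict matters)."""
    kept: List[str] = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictor values sampled at five sites.
predictors = {
    "temp_mean": [10.0, 12.0, 15.0, 18.0, 21.0],
    "temp_max": [15.0, 17.5, 20.0, 23.5, 26.0],
    "precip": [800.0, 300.0, 950.0, 400.0, 700.0],
}
kept = drop_collinear(predictors)
```

In this example `temp_max` is dropped because it is almost perfectly correlated with `temp_mean`, while `precip` is retained. The result depends on the order in which predictors are listed, which is a known limitation of greedy filters.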
-
The package brings together several aspects of biodiversity data cleaning in one place. 'bdc' is organized in thematic modules related to different biodiversity dimensions, including: 1) Merge datasets: standardization and integration of different datasets; 2) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; 3) Taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against locally stored taxonomic databases, using exact and partial matching algorithms; 4) Space: flagging of erroneous, suspect, and low-precision geographic coordinates; and 5) Time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, it contains features to visualize, document, and report data quality, which is essential for making data quality assessment transparent and reproducible. The reference for the methodology is Bruno et al. (2022) https://doi.org/10.1111/2041-210X.13868
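The kind of check described under "Space" can be sketched as follows. This is an illustrative assumption, not the 'bdc' API: the function name, the flag labels, and the two-decimal precision threshold are hypothetical.

```python
import math
from decimal import Decimal
from typing import List, Optional


def decimal_places(value: float) -> int:
    """Number of significant decimal places in a coordinate value."""
    exponent = Decimal(str(value)).normalize().as_tuple().exponent
    return max(0, -exponent)


def flag_coordinate(lat: Optional[float], lon: Optional[float],
                    min_decimals: int = 2) -> List[str]:
    """Return a list of quality flags for one latitude/longitude pair."""
    if lat is None or lon is None or math.isnan(lat) or math.isnan(lon):
        return ["missing"]
    flags: List[str] = []
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("out_of_range")
    if lat == 0 and lon == 0:
        # (0, 0) is a frequent placeholder for unknown coordinates.
        flags.append("zero_zero")
    if max(decimal_places(lat), decimal_places(lon)) < min_decimals:
        flags.append("low_precision")
    return flags


# Hypothetical records: (latitude, longitude)
examples = [(40.4168, -3.7038), (95.0, 10.0), (0.0, 0.0), (None, 5.0)]
results = [flag_coordinate(lat, lon) for lat, lon in examples]
```

Records are flagged rather than deleted here, which mirrors the flag-then-decide workflow the description outlines and leaves the removal decision to the user.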