Data cleaning
-
The package provides features to manage the complete workflow for biodiversity data cleaning: uploading data, gathering user input to adjust the cleaning procedures, cleaning the data, and finally generating reports and several cleaned versions of the data. It facilitates user-level data cleaning and is designed for inexperienced R users.
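A workflow of this shape can be sketched as a user-configurable pipeline. This is an illustrative sketch only, not the package's API: the check names, record fields, and function names below are assumptions.

```python
# Illustrative sketch (NOT the package's API) of a user-configurable cleaning
# pipeline: user input selects which named checks run, failing records are
# excluded, and a per-check failure count serves as a simple report.

def run_pipeline(records, checks, enabled):
    """Apply the enabled checks to each record; return kept records and a report."""
    kept, report = [], {name: 0 for name in enabled}
    for rec in records:
        failed = [name for name in enabled if not checks[name](rec)]
        for name in failed:
            report[name] += 1          # count how often each check rejected a record
        if not failed:
            kept.append(rec)           # record passed every enabled check
    return kept, report

# Hypothetical checks and records for demonstration.
checks = {
    "has_name": lambda r: bool(r.get("scientificName")),
    "has_coords": lambda r: r.get("lat") is not None and r.get("lon") is not None,
}
records = [
    {"scientificName": "Puma concolor", "lat": 1.0, "lon": 2.0},
    {"scientificName": "", "lat": None, "lon": None},
]
kept, report = run_pipeline(records, checks, ["has_name", "has_coords"])
```

Here the second record fails both checks, so `kept` holds only the first record and `report` counts one failure per check.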
-
Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology, and paleontology. The package includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas, or the locations of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore, it identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude pairs, and invalid coordinates. It also implements an algorithm to identify datasets with a significant proportion of rounded coordinates, and it is especially suited to large datasets. The reference for the methodology is Zizka et al. (2019), https://doi.org/10.1111%2F2041-210X.13152
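Three of the record-level checks named above (invalid coordinates, zero coordinates, identical latitude/longitude) are simple enough to sketch directly. This is a minimal illustration of the idea, not the package's implementation; the function name and tuple layout are assumptions.

```python
# Minimal sketch (NOT the package's implementation) of three record-level
# coordinate checks: out-of-range values, the (0, 0) placeholder point,
# and identical latitude/longitude.

def flag_record(lat, lon):
    """Return the list of flag names raised for one occurrence record."""
    flags = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("invalid")   # outside the valid coordinate range
    if lat == 0 and lon == 0:
        flags.append("zero")      # (0, 0) is a common data-entry placeholder
    if lat == lon:
        flags.append("equal")     # identical latitude and longitude is suspect
    return flags

# Hypothetical (lat, lon) records for demonstration.
records = [(51.5, -0.1), (0.0, 0.0), (12.3, 12.3), (95.0, 10.0)]
flagged = {rec: flag_record(*rec) for rec in records}
```

A clean record such as `(51.5, -0.1)` raises no flags, while `(0.0, 0.0)` raises both the zero and equal flags.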
-
The package brings together several aspects of biodiversity data cleaning in one place. 'bdc' is organized in thematic modules related to different biodiversity dimensions, including: 1) Merge datasets: standardization and integration of different datasets; 2) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; 3) Taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against locally stored taxonomic databases, using exact and partial matching algorithms; 4) Space: flagging of erroneous, suspect, and low-precision geographic coordinates; and 5) Time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, it contains features to visualize, document, and report data quality, which is essential for making data-quality assessment transparent and reproducible. The reference for the methodology is Bruno et al. (2022), https://doi.org/10.1111%2F2041-210X.13868
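The Time module's kind of check can be illustrated with a small sketch: flag a collection date that is missing, unparsable, or outside a plausible range. This is a hypothetical example, not the package's API; the function name, the ISO-format assumption, and the 1600 lower bound are assumptions.

```python
# Hypothetical sketch (NOT the package's API) of a temporal-consistency check:
# a collection date is flagged when it is missing, cannot be parsed, or falls
# outside a plausible range (assumed here: 1600-01-01 up to today).
from datetime import date

def flag_date(value, earliest=date(1600, 1, 1)):
    """Return None for a usable ISO date string, or the name of the problem."""
    if not value:
        return "missing"            # empty or absent date field
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return "unparsable"         # not a valid ISO 8601 date
    if parsed < earliest or parsed > date.today():
        return "out_of_range"       # implausibly old or in the future
    return None
```

For instance, `flag_date("1500-01-01")` is flagged as out of range, while a recent well-formed date passes.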