<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:gmx="http://www.isotc211.org/2005/gmx" xmlns:gts="http://www.isotc211.org/2005/gts" xmlns:gsr="http://www.isotc211.org/2005/gsr" xmlns:gmi="http://www.isotc211.org/2005/gmi" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://schemas.opengis.net/csw/2.0.2/profiles/apiso/1.0.0/apiso.xsd">
  <gmd:fileIdentifier>
    <gco:CharacterString>8970bd19-d5e0-4beb-b079-355aa8a2236b</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:language>
    <gmd:LanguageCode codeList="http://www.loc.gov/standards/iso639-2/" codeListValue="eng" />
  </gmd:language>
  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_ScopeCode" codeListValue="workflow" />
  </gmd:hierarchyLevel>
  <gmd:metadataStandardVersion>
    <gco:CharacterString>1.0</gco:CharacterString>
  </gmd:metadataStandardVersion>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Exploring soil sample variability through Principal Component Analysis (PCA) using database-stored data</gco:CharacterString>
          </gmd:title>
          <gmd:date>
            <gmd:CI_Date>
              <gmd:date>
                <gco:DateTime>2023-12-31T00:00:00</gco:DateTime>
              </gmd:date>
              <gmd:dateType>
                <gmd:CI_DateTypeCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication" />
              </gmd:dateType>
            </gmd:CI_Date>
          </gmd:date>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>This workflow analyzes diverse soil datasets with Principal Component Analysis (PCA) to characterize their physicochemical properties. It connects to a MongoDB database and retrieves soil samples according to user-defined filters. Key objectives include variable selection, data quality improvement, standardization, and PCA to analyze data variance and patterns. The workflow generates graphical representations, such as covariance and correlation matrices, scree plots, and scatter plots, to make the data easier to interpret. These outputs help identify significant variables, explore the data structure, and determine the optimal number of components for soil analysis.&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;Background&lt;/div&gt;&lt;div&gt;Understanding the relationships and patterns within soil samples is crucial for many environmental and agricultural applications. PCA is a powerful tool for unraveling the complexity of multivariate soil datasets. Because soil datasets often contain numerous variables representing diverse physicochemical properties, PCA is valuable for: &lt;div&gt;∙Dimensionality Reduction: Simplifying the analysis without compromising data integrity by reducing the dimensionality of large soil datasets.
∙Identification of Dominant Patterns: Revealing dominant patterns or trends within the data, providing insights into key factors contributing to overall variability.
∙Exploration of Variable Interactions: Enabling the exploration of complex interactions between different soil attributes, enhancing understanding of their relationships.
∙Interpretability of Data Variance: Clarifying how much variance is explained by each principal component, aiding in discerning the significance of different components and variables.
∙Visualization of Data Structure: Facilitating intuitive comprehension of data structure through plots such as scatter plots of principal components, helping identify clusters, trends, and outliers.
∙Decision Support for Subsequent Analyses: Providing a foundation for subsequent analyses by guiding decision-making, whether in identifying influential variables, understanding data patterns, or selecting components for further modeling. &lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;Introduction&lt;/div&gt;&lt;div&gt;This workflow is motivated by the need for a thorough analysis of a diverse soil dataset characterized by an array of physicochemical variables. Each row of the dataset represents a distinct soil sample, with variables such as percentage of coarse sands, percentage of organic matter, hydrophobicity, and others. A dataset of this kind calls for a deliberate approach to preprocessing, analysis, and visualization. The workflow connects to MongoDB, a scalable NoSQL database, to retrieve soil samples based on user-defined filters. These filters can range from the natural site where the samples were collected to the specific date of collection.&lt;/div&gt;&lt;div&gt;The workflow also lets users select the relevant variables through user-defined parameters, producing a focused dataset tailored to the analysis. To handle missing data, it offers two data quality options: interpolating missing values or removing the rows that contain them. It then standardizes the dataset and designates the target variable, laying the foundation for the subsequent statistical analyses.&lt;/div&gt;&lt;div&gt;PCA then lets users explore inherent patterns and structures within the data. Users can customize the analysis by specifying either the number of components or the desired explained variance.
The workflow concludes with graphical representations, including covariance and correlation matrices, a scree plot, and a scatter plot, which give users visual insight into the soil dataset. &lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;Aims&lt;/div&gt;&lt;div&gt;The primary objectives of this workflow address specific challenges in the analysis of diverse soil samples: &lt;/div&gt;&lt;div&gt;∙Connect to MongoDB and retrieve data: Connect to a MongoDB database dynamically, allowing users to download soil samples based on user-defined filters.
∙Variable selection: Empower users to extract relevant variables based on user-defined parameters, facilitating a focused and tailored dataset.
∙Data quality improvement: Provide options for interpolation or removal of missing values to ensure dataset integrity for downstream analyses.
∙Standardization and target specification: Standardize the dataset values and designate the target variable, laying the groundwork for subsequent statistical analyses.
∙PCA: Conduct PCA with flexibility, allowing users to specify the number of components or desired variance for a comprehensive understanding of data variance and patterns. 
∙Graphical representations: Generate visual outputs, including covariance and correlation matrices, a scree plot, and a scatter plot, enhancing the interpretability of the soil dataset. &lt;/div&gt;&lt;div&gt;&lt;br&gt;&lt;/div&gt;&lt;div&gt;Scientific questions&lt;/div&gt;&lt;div&gt;This workflow addresses the following scientific questions related to soil analysis: &lt;/div&gt;&lt;div&gt;∙Facilitate data access: Streamline the retrieval of systematically stored soil sample data from the MongoDB database, giving researchers access to organized, previously stored data.
∙Variable importance: Identify variables contributing significantly to principal components through the covariance matrix and PCA. 
∙Data structure: Explore correlations between variables and gain insights from the correlation matrix. 
∙Optimal component number: Determine the optimal number of principal components using the scree plot for effective representation of data variance. 
∙Target-related patterns: Analyze how selected principal components correlate with the target variable in the scatter plot, revealing patterns based on target variable values.&lt;/div&gt;&lt;/div&gt;</gco:CharacterString>
      </gmd:abstract>
      <gmd:status>
        <gmd:MD_ProgressCode codeListValue="onGoing" codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_ProgressCode" />
      </gmd:status>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:individualName>
            <gco:CharacterString>José Francisco Aldana Montes</gco:CharacterString>
          </gmd:individualName>
          <gmd:organisationName>
            <gco:CharacterString>University of Malaga</gco:CharacterString>
          </gmd:organisationName>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString>jfaldana@uma.es</gco:CharacterString>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
          <gmd:role>
            <gmd:CI_RoleCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_RoleCode" codeListValue="principalInvestigator" />
            <!-- Lifewatch - removed author value-->
          </gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:individualName>
            <gco:CharacterString>Francisco Manuel SÁNCHEZ-CANO</gco:CharacterString>
          </gmd:individualName>
          <gmd:organisationName>
            <gco:CharacterString>LifeWatch ERIC ICT Core</gco:CharacterString>
          </gmd:organisationName>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString>franciscom.sanchez@lifewatch.eu</gco:CharacterString>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
          <gmd:role>
            <gmd:CI_RoleCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_RoleCode" codeListValue="publisher" />
            <!-- Lifewatch - removed author-->
          </gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:individualName>
            <gco:CharacterString>Antonio José SÁENZ-ALBANÉS</gco:CharacterString>
          </gmd:individualName>
          <gmd:organisationName>
            <gco:CharacterString>LifeWatch ERIC ICT Core</gco:CharacterString>
          </gmd:organisationName>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString>aj.saenz@lifewatch.eu</gco:CharacterString>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
          <gmd:role>
            <gmd:CI_RoleCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_RoleCode" codeListValue="custodian" />
            <!-- Lifewatch - removed author-->
          </gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:individualName>
            <gco:CharacterString>ICT Core Group</gco:CharacterString>
          </gmd:individualName>
          <gmd:organisationName>
            <gco:CharacterString>LifeWatch ERIC ICT Core</gco:CharacterString>
          </gmd:organisationName>
          <gmd:contactInfo>
            <gmd:CI_Contact>
              <gmd:address>
                <gmd:CI_Address>
                  <gmd:electronicMailAddress>
                    <gco:CharacterString>ict.coordination@lifewatch.eu</gco:CharacterString>
                  </gmd:electronicMailAddress>
                </gmd:CI_Address>
              </gmd:address>
            </gmd:CI_Contact>
          </gmd:contactInfo>
          <gmd:role>
            <gmd:CI_RoleCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_RoleCode" codeListValue="principalInvestigator" />
            <!-- Lifewatch - removed author-->
          </gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword>
            <gco:CharacterString>Soil sample variability</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Principal Component Analysis (PCA)</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Dimensionality reduction</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Data variance</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Soil datasets</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Physicochemical properties</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Data quality improvement</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>MongoDB database</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Covariance and correlation matrix</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Scree plot</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Scatter plot</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Multivariate analysis</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Standardization</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Target-related patterns</gco:CharacterString>
          </gmd:keyword>
          <gmd:keyword>
            <gco:CharacterString>Data structure exploration</gco:CharacterString>
          </gmd:keyword>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
      <gmd:resourceConstraints>
        <gmd:MD_LegalConstraints>
          <gmd:accessConstraints>
            <gmd:MD_RestrictionCode codeListValue="copyright" codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#MD_RestrictionCode" />
          </gmd:accessConstraints>
          <gmd:useLimitation gco:nilReason="missing">
            <gco:CharacterString />
          </gmd:useLimitation>
          <gmd:otherConstraints>
            <gco:CharacterString>Copyright 2023 Khaos Research Group</gco:CharacterString>
          </gmd:otherConstraints>
        </gmd:MD_LegalConstraints>
      </gmd:resourceConstraints>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:workflow>
    <gmd:LW_Workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Load from DB</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Download selected samples from DB</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-collection/LoadDBSoil/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Split dataset</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Select the columns indicated by the user as parameters</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-processing/SplitDataset/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Interpolation</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Interpolation of missing values</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-processing/DataframeInterpolation/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Delete NA</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Drop rows with missing values</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-processing/DeleteNA/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Data Normalization</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Standardization and normalization of data</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-processing/DataNormalization/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>PCA Soil</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>PCA of soil samples</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-analysing/PCAsoil/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>CSV to HTML</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Transform a CSV table to HTML</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-processing/CSV2HTML/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>PCA Plot</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Generates a plot from the PCA output CSV</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-sink/PCAplot/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Scree Plot</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Plots the variance array of the dataset variables as a scree plot</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-sink/ScreePlot/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:containServices_workflow>
        <gmd:LW_WorkflowContainServices>
          <gmd:serviceName_workflow>
            <gco:CharacterString>Correlation Matrix Heatmap</gco:CharacterString>
          </gmd:serviceName_workflow>
          <gmd:serviceDescription_workflow>
            <gco:CharacterString>Renders the correlation matrix of the dataset variables as a heatmap</gco:CharacterString>
          </gmd:serviceDescription_workflow>
          <gmd:serviceReference_workflow>
            <gco:CharacterString>https://gitlab.lifewatch.dev/lfw002-khaos/wrapper-library/-/tree/develop/data-sink/CorrelationMatrixHeatmap/1.0.0</gco:CharacterString>
          </gmd:serviceReference_workflow>
        </gmd:LW_WorkflowContainServices>
      </gmd:containServices_workflow>
      <gmd:workflowOtherInformation_workflow>
        <gmd:LW_workflowOtherInformation>
          <gmd:workflowHelpdesk_workflow>
            <gco:CharacterString>https://helpdesk.lifewatch.eu</gco:CharacterString>
          </gmd:workflowHelpdesk_workflow>
        </gmd:LW_workflowOtherInformation>
      </gmd:workflowOtherInformation_workflow>
    </gmd:LW_Workflow>
  </gmd:workflow>
  <gmd:distributionInfo>
    <gmd:MD_Distribution>
      <gmd:transferOptions>
        <gmd:MD_DigitalTransferOptions>
          <gmd:onLine>
            <gmd:CI_OnlineResource>
              <gmd:linkage>
                <gmd:URL />
              </gmd:linkage>
              <gmd:protocol>
                <gco:CharacterString>DOI</gco:CharacterString>
              </gmd:protocol>
              <gmd:applicationProfile gco:nilReason="missing">
                <gco:CharacterString />
              </gmd:applicationProfile>
              <gmd:name gco:nilReason="missing">
                <gco:CharacterString />
              </gmd:name>
              <gmd:description gco:nilReason="missing">
                <gco:CharacterString />
              </gmd:description>
              <gmd:function>
                <gmd:CI_OnLineFunctionCode codeList="http://standards.iso.org/iso/19139/resources/gmxCodelists.xml#CI_OnLineFunctionCode" codeListValue="" />
              </gmd:function>
            </gmd:CI_OnlineResource>
          </gmd:onLine>
        </gmd:MD_DigitalTransferOptions>
      </gmd:transferOptions>
    </gmd:MD_Distribution>
  </gmd:distributionInfo>
</gmd:MD_Metadata>

