diff --git a/DESCRIPTION b/DESCRIPTION
index 27186dd..b6f2d19 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -53,6 +53,6 @@ Suggests:
     vdiffr,
     withr
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.3.3
 VignetteBuilder: knitr
 Config/testthat/edition: 3
+Config/roxygen2/version: 8.0.0
diff --git a/R/datasets.R b/R/datasets.R
index 3ba1478..d4f9ba7 100644
--- a/R/datasets.R
+++ b/R/datasets.R
@@ -3,9 +3,9 @@
 #' @description
 #' This dataset is derived from the WOODIV database (available at:
 #' \url{https://www.nature.com/articles/s41597-021-00873-3}). It contains the
-#' grid cells of sites (10 km x 10 km horizontal
+#' grid cells of sites (10 km x 10 km horizontal
 #' resolution) sampled in Portugal, Spain, France, and Italy (Mediterranean
-#' part) for which at least one of the 24 Conifer tree species occurs.
+#' part) for which at least one of the 24 conifer tree species occurs.
 #'
 #' This dataset exemplifies the argument `site_locations` used in
 #' several functions of `funbiogeo`. The variable `site` corresponds to the
diff --git a/R/fb_aggregate_site_data.R b/R/fb_aggregate_site_data.R
index 309d129..d89e3b7 100644
--- a/R/fb_aggregate_site_data.R
+++ b/R/fb_aggregate_site_data.R
@@ -8,24 +8,24 @@
 #'
 #' @inheritParams fb_get_environment
-#' @param site_data a `matrix` or `data.frame` containing values per sites to
-#' aggregate along the provided grid. Can have one or several columns
-#' (variables to aggregate). The first column must contain sites names as
+#' @param site_data a `matrix` or `data.frame` containing values per site to
+#' aggregate along the provided grid. It can contain one or several columns
+#' (variables to aggregate). The first column must contain site names as
 #' provided in the first argument `site_locations`.
 #'
 #' @param agg_geom a `terra::SpatRaster` or an `sf` object. This defines the
 #' geometry along which to aggregate the initial data. See more in the Details
 #' section.
 #'
-#' @param fun the function used to aggregate points values when there are
+#' @param fun the function used to aggregate point values when there are
 #' multiple points in one cell. Default is `mean`.
 #'
 #' @param ... additional argument(s) passed to the provided function `fun`
 #'
 #' @details
-#' The `agg_geom` object will condition the type of object ouput by the
-#' function. It can be of any sort as an `SpatRaster` or `sf` object. Depending
-#' on the need, it could be a regular square grid or hexagonal grid, it could
+#' The `agg_geom` object will condition the type of object output by the
+#' function. It can be of any sort of a `SpatRaster` or a `sf` object. Depending
+#' on the need, it could be a regular square grid or hexagonal grid; it could
 #' also be irregular polygons like biomes or ecoregions, or points, and even
 #' lines (such as when aggregating across transects or trajectories).
 #'
diff --git a/R/fb_count_sites_by_species.R b/R/fb_count_sites_by_species.R
index c1e5c74..d0ebfa4 100644
--- a/R/fb_count_sites_by_species.R
+++ b/R/fb_count_sites_by_species.R
@@ -1,8 +1,8 @@
 #' Count Number of Sites Occupied by Species
 #'
 #' @description
-#' For each species computes the percentage of sites where the species is
-#' present (distribution value higher than 0 and non-NA).
+#' This function computes the number and proportion of sites occupied by each
+#' species (distribution value higher than 0 and non-NA).
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
diff --git a/R/fb_count_species_by_site.R b/R/fb_count_species_by_site.R
index 6953676..f00180f 100644
--- a/R/fb_count_species_by_site.R
+++ b/R/fb_count_species_by_site.R
@@ -1,9 +1,9 @@
 #' Count Number of Species per Site
 #'
 #' @description
-#' For each site computes the proportion of species present (distribution value
-#' higher than 0 and non-NA) compared to all species provided.
-#' For example, a site could contain only 20% of all species provided.
+#' This function computes for each site the number and proportion of species
+#' present (distribution value higher than 0 and non-NA) compared to all species
+#' provided. For example, a site could contain only 20% of all species provided.
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
diff --git a/R/fb_count_species_by_trait.R b/R/fb_count_species_by_trait.R
index 89dab7a..8903551 100644
--- a/R/fb_count_species_by_trait.R
+++ b/R/fb_count_species_by_trait.R
@@ -1,8 +1,8 @@
 #' Count Number of Species for Each Trait
 #'
 #' @description
-#' For each trait computes the percentage of species without `NA` (missing
-#' trait values).
+#' This function computes, for each trait, the number and proportion of species
+#' without missing trait value (`NA`).
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
diff --git a/R/fb_count_traits_by_species.R b/R/fb_count_traits_by_species.R
index a1c24f9..bb70a85 100644
--- a/R/fb_count_traits_by_species.R
+++ b/R/fb_count_traits_by_species.R
@@ -1,8 +1,8 @@
 #' Compute Number of Known Trait(s) per Species
 #'
 #' @description
-#' For each species computes the percentage of traits without `NA` (missing
-#' trait values).
+#' For each species, this function computes the number and proportion of traits
+#' without `NA` (missing trait values).
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
diff --git a/R/fb_cwm.R b/R/fb_cwm.R
index 17d521c..1532b11 100644
--- a/R/fb_cwm.R
+++ b/R/fb_cwm.R
@@ -3,7 +3,18 @@
 #' This function returns the community-weighted mean of provided trait values.
 #' It only works with quantitative traits and will warn you otherwise.
 #' It will remove species that either have `NA` values in the `site_species`
-#' input or `NA` values as their trait.
+#' input or `NA` values as their trait in the provided trait object.
+#'
+#' The community-weighted mean is a site-based trait mean weighted by the
+#' abundance of the species. It can be written with the following equation:
+#'
+#' \deqn{
+#' \text{CWM}_k = \sum_{i = 1}^S p_{ik} \times t_{ik}
+#' }
+#'
+#' with \eqn{\text{CWM}_k} the CWM of site k, \eqn{p_{ik}} the relative
+#' abundance of species \eqn{i} in site \eqn{k}, and \eqn{t_{ik}} the trait of
+#' species \eqn{i} in site \eqn{k}.
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
diff --git a/R/fb_filter_sites_by_species_coverage.R b/R/fb_filter_sites_by_species_coverage.R
index 1f548a9..647689d 100644
--- a/R/fb_filter_sites_by_species_coverage.R
+++ b/R/fb_filter_sites_by_species_coverage.R
@@ -1,15 +1,17 @@
 #' Filter sites with a given species coverage threshold
 #'
 #' @description
-#' Selects sites (rows) for which the percentage of present species
-#' (distribution value higher than 0 and non-NA) is higher than a threshold.
+#' Selects sites (rows) for which the proportion of species present
+#' (distribution value higher than 0 and non-NA) is higher than the user-defined
+#' threshold.
 #'
 #' @param threshold_species_proportion a numeric of length 1 between 0 and 1.
-#' The percentage of species coverage threshold.
+#' The threshold of species coverage under which to exclude the sites.
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #'
-#' @return A subset of `site_species` with sites covered by X% of species.
+#' @return A subset of `site_species` with sites with at least
+#' `threshold_species_proportion`% species present.
 #'
 #' @export
 #'
diff --git a/R/fb_filter_sites_by_trait_coverage.R b/R/fb_filter_sites_by_trait_coverage.R
index 34a767f..1bb82ce 100644
--- a/R/fb_filter_sites_by_trait_coverage.R
+++ b/R/fb_filter_sites_by_trait_coverage.R
@@ -1,7 +1,11 @@
 #' Filter sites with a given trait coverage threshold
 #'
 #' @description
-#' ...
+#' Select sites (rows of the `site_species` object) with all given traits
+#' available for at least the user-defined proportion of species
+#' `threshold_traits_proportion`. If a single trait is given, then the threshold
+#' applies to a single trait, if more than one trait is provided, then the
+#' function considers a threshold across all traits taken together.
 #'
 #' @param threshold_traits_proportion a numeric of length 1 between 0 and 1.
 #' The percentage trait coverage threshold
diff --git a/R/fb_filter_species_by_site_coverage.R b/R/fb_filter_species_by_site_coverage.R
index df8c312..baf4b09 100644
--- a/R/fb_filter_species_by_site_coverage.R
+++ b/R/fb_filter_species_by_site_coverage.R
@@ -3,7 +3,7 @@
 #' @description
 #' Selects species (columns) for which the percentage of sites where the
 #' species is present (distribution value higher than 0 and non-NA) is higher
-#' than a threshold.
+#' than a user-defined threshold.
 #'
 #' @param threshold_sites_proportion a numeric of length 1 between 0 and 1.
 #' The percentage of sites coverage threshold.
diff --git a/R/fb_filter_species_by_trait_coverage.R b/R/fb_filter_species_by_trait_coverage.R
index 986417f..59d4e8d 100644
--- a/R/fb_filter_species_by_trait_coverage.R
+++ b/R/fb_filter_species_by_trait_coverage.R
@@ -1,8 +1,10 @@
 #' Filter species with a given traits coverage threshold
 #'
 #' @description
-#' Selects species (rows) for which the percentage of traits without
-#' `NA` (missing trait values) is higher than a threshold.
+#' Selects species (rows) for which the proportion of traits without
+#' `NA` (missing trait values) is higher than the user-defined threshold.
+#' It considers as many traits as the ones provided to filter given
+#' the threshold.
 #'
 #' @param threshold_traits_proportion a numeric of length 1 between 0 and 1.
 #' The percentage of traits coverage threshold.
diff --git a/R/fb_filter_traits_by_species_coverage.R b/R/fb_filter_traits_by_species_coverage.R
index 3378260..fc7691a 100644
--- a/R/fb_filter_traits_by_species_coverage.R
+++ b/R/fb_filter_traits_by_species_coverage.R
@@ -2,7 +2,7 @@
 #'
 #' @description
 #' Selects traits (columns) for which the percentage of species without
-#' `NA` (missing trait values) is higher than a threshold.
+#' `NA` (missing trait values) is higher than a user-defined threshold.
 #'
 #' @param threshold_species_proportion `numeric(1)` \[default = `NULL`\]\cr{}
 #' between 0 and 1. The percentage of species coverage threshold.
diff --git a/R/fb_format_site_locations.R b/R/fb_format_site_locations.R
index 0af8cf6..e62a3dc 100644
--- a/R/fb_format_site_locations.R
+++ b/R/fb_format_site_locations.R
@@ -1,9 +1,10 @@
 #' Extract site x locations information from long format data
 #'
 #' Convert a flat `data.frame` with site coordinates into a proper `sf` object
-#' that can then be used by other functions. This function assumes that
-#' the coordinates are given in WGS84 (longitude vs. latitude). The function
-#' automatically removes repeated coordinates from the input dataset.
+#' that can then be used by other functions. This function assumes by default
+#' that the coordinates are given in WGS84 (longitude vs. latitude), but this
+#' can be changed through the `crs` argument. The function automatically removes
+#' repeated coordinates from the input dataset.
 #'
 #' @param data a `data.frame` in a long format (see example).
 #'
diff --git a/R/fb_format_species_categories.R b/R/fb_format_species_categories.R
index 2960e60..52062dc 100644
--- a/R/fb_format_species_categories.R
+++ b/R/fb_format_species_categories.R
@@ -1,8 +1,8 @@
 #' Extract species x categories information from long format data
 #'
 #' Convert a flat `data.frame` with species names and species (supra-)category
-#' (e.g. family, order, endemism status, etc.) into a proper `data.frame` object
-#' that can then be used by other functions.
+#' (e.g. family, order, endemism status, etc.) into a proper
+#' `species_categories` object that can then be used by other functions.
 #' The final output contains species in rows and two columns (species name and
 #' species category).
 #'
diff --git a/R/fb_format_species_traits.R b/R/fb_format_species_traits.R
index 3b0f5a6..2311907 100644
--- a/R/fb_format_species_traits.R
+++ b/R/fb_format_species_traits.R
@@ -1,8 +1,8 @@
 #' Extract species x traits information from long format data
 #'
 #' Convert a flat `data.frame` with traits values for different species
-#' into a proper `data.frame` object that can then be used by other functions.
-#' The final output contains species in rows and traits in columns.
+#' into a proper `species_traits` object that can then be used by other
+#' functions. The final output contains species in rows and traits in columns.
 #'
 #' @param data a `data.frame` in a long format (see example).
 #'
diff --git a/R/fb_get_all_trait_coverages_by_site.R b/R/fb_get_all_trait_coverages_by_site.R
index b9d2a20..c9da867 100644
--- a/R/fb_get_all_trait_coverages_by_site.R
+++ b/R/fb_get_all_trait_coverages_by_site.R
@@ -1,6 +1,6 @@
 #' Compute Trait Coverage per Site for Each Trait
 #'
-#' Compute trait coverage for all sites, i.e., the percentage of total
+#' Compute trait coverage for all sites, i.e., the proportion of total
 #' abundance/presence of species that have traits data compared to total
 #' species. This function assumes that all species provided in the traits
 #' dataset have all their traits specified (meaning that all species have either
diff --git a/R/fb_get_environment.R b/R/fb_get_environment.R
index cbbe5bd..d68db32 100644
--- a/R/fb_get_environment.R
+++ b/R/fb_get_environment.R
@@ -1,6 +1,9 @@
 #' Extract Raster Values at Location of Sites
+#'
+#' This function uses a `site_locations` object and a given `SpatRaster`.
+#' It extracts the mean value of the provided raster for each site.
 #'
-#' @param site_locations an `sf` object with the spatial geometries of sites.
+#' @param site_locations a `sf` object with the spatial geometries of sites.
 #' **NOTE**: the first column should be named **`"site"`**
 #' and indicate site names.
 #'
@@ -8,7 +11,7 @@
 #' A single or multi-layers environmental raster.
 #'
 #' @return A `data.frame` with average environmental values (columns) per site
-#' (rows), with the first column being `"site"` indicating site names.
+#' (rows), with the first column being `"site"`, indicating site names.
 #'
 #' @export
 #'
diff --git a/R/fb_get_trait_combination_coverage.R b/R/fb_get_trait_combination_coverage.R
index 31b4f42..54bd3d0 100644
--- a/R/fb_get_trait_combination_coverage.R
+++ b/R/fb_get_trait_combination_coverage.R
@@ -1,13 +1,12 @@
 #' Compute site trait coverage for each trait combination
 #'
 #' This function computes trait coverage for each site for different trait
-#' combinations. If not provided, consider all possible trait combinations.
 #' The function will not run if the total number of combinations given is over
 #' 10,000.
 #'
 #' @inheritParams fb_get_trait_coverage_by_site
 #' @param comb_size an integer vector defining one or more sizes of combinations
-#' (default: `NULL`)
+#' over which to compute trait coverage. (default: `NULL`)
 #'
 #' @return
 #' a data.frame with the following columns:
diff --git a/R/fb_get_trait_coverage_by_site.R b/R/fb_get_trait_coverage_by_site.R
index 8bc0dfb..83d6f31 100644
--- a/R/fb_get_trait_coverage_by_site.R
+++ b/R/fb_get_trait_coverage_by_site.R
@@ -1,9 +1,8 @@
 #' Compute Trait Coverage For Each Site Weighted by Abundance
 #'
-#' Compute trait coverage for all sites, i.e., the percentage of total
-#' abundance/presence of species that have traits data compared to total
-#' species.
-#' This function assumes that all species provided in the traits dataset have
+#' Compute trait coverage for all sites, i.e., the proportion of total
+#' abundance/presence of species that have traits data compared to all species.
+#' This function assumes that all species provided in the trait dataset have
 #' all their traits specified (meaning that all species have either known or
 #' `NA` values reported as their traits).
 #' **NB**: this function returns trait coverage using all traits
diff --git a/R/fb_make_report.R b/R/fb_make_report.R
index 3fdc760..8fb47f2 100644
--- a/R/fb_make_report.R
+++ b/R/fb_make_report.R
@@ -1,8 +1,10 @@
 #' Create an Rmarkdown Report to Explore User Data
 #'
 #' Creates an R Markdown (`.Rmd`) report from a template to explore and
-#' summarize user data. User can modify this report and use the function
-#' [rmarkdown::render()] (or click the _Render_ of the RStudio IDE) to convert
+#' summarize user data in (functional) biogeography through the use of the
+#' site-species, the species-traits, and the site-locations objects. Users can
+#' modify this report and use the function [rmarkdown::render()]
+#' (or click the _Render_ of the RStudio IDE) to convert
 #' this `.Rmd` in different formats:
 #' - HTML document (`output_format = "bookdown::html_document2"`);
 #' - PDF document (`output_format = "bookdown::pdf_document2"`);
diff --git a/R/fb_map_raster.R b/R/fb_map_raster.R
index 2dce170..2519641 100644
--- a/R/fb_map_raster.R
+++ b/R/fb_map_raster.R
@@ -1,4 +1,9 @@
 #' Map a Single Raster Layer
+#'
+#' This is a helper function to plot a map of an environmental raster.
+#' The raster is plotted as is, with its given coordinate reference system.
+#' The function can provide a background map if the `background` argument is
+#' toggled.
 #'
 #' @param x a `SpatRaster` object (package `terra`). A raster of one single
 #' layer
diff --git a/R/fb_map_site_data.R b/R/fb_map_site_data.R
index 77d458c..faf7bfa 100644
--- a/R/fb_map_site_data.R
+++ b/R/fb_map_site_data.R
@@ -1,9 +1,9 @@
 #' Map Arbitrary Site Data
 #'
-#' From the site-locations data and a dataset organized by site, plot a map of
-#' this information.
-#' The returned plot is as little customized as possible to let the user do
-#' the customization.
+#' This function helps to map arbitrary site data using the site-locations
+#' object and a dataset organized by site. The returned plot is as little
+#' customized as possible to let the user choose. The function can provide a
+#' basic background map if the `background` argument is toggled.
 #'
 #' @param site_data `data.frame()` of additional site information containing
 #' the column `"site"` to merge with the `site_locations` argument
diff --git a/R/fb_map_site_traits_completeness.R b/R/fb_map_site_traits_completeness.R
index 3272cae..cdd48b3 100644
--- a/R/fb_map_site_traits_completeness.R
+++ b/R/fb_map_site_traits_completeness.R
@@ -3,7 +3,7 @@
 #' Returns a `ggplot2` map of sites colored by trait coverage (proportion
 #' of species having a known trait value). By default shows one plot for each
 #' trait and add an additional facet named `"all_traits"` considering the
-#' trait coverage with all traits taken together.
+#' trait coverage with all provided traits taken together.
 #'
 #' @inheritParams fb_get_environment
 #' @inheritParams fb_get_all_trait_coverages_by_site
diff --git a/man/fb_aggregate_site_data.Rd b/man/fb_aggregate_site_data.Rd
index 8b34ff9..95ec544 100644
--- a/man/fb_aggregate_site_data.Rd
+++ b/man/fb_aggregate_site_data.Rd
@@ -7,20 +7,20 @@
 fb_aggregate_site_data(site_locations, site_data, agg_geom, fun = mean, ...)
 }
 \arguments{
-\item{site_locations}{an \code{sf} object with the spatial geometries of sites.
+\item{site_locations}{a \code{sf} object with the spatial geometries of sites.
 \strong{NOTE}: the first column should be named \strong{\code{"site"}}
 and indicate site names.}

-\item{site_data}{a \code{matrix} or \code{data.frame} containing values per sites to
-aggregate along the provided grid. Can have one or several columns
-(variables to aggregate). The first column must contain sites names as
+\item{site_data}{a \code{matrix} or \code{data.frame} containing values per site to
+aggregate along the provided grid. It can contain one or several columns
+(variables to aggregate). The first column must contain site names as
 provided in the first argument \code{site_locations}.}

 \item{agg_geom}{a \code{terra::SpatRaster} or an \code{sf} object. This defines the
 geometry along which to aggregate the initial data. See more in the Details
 section.}

-\item{fun}{the function used to aggregate points values when there are
+\item{fun}{the function used to aggregate point values when there are
 multiple points in one cell. Default is \code{mean}.}

 \item{...}{additional argument(s) passed to the provided function \code{fun}}
@@ -38,9 +38,9 @@ on it at a coarser scale, or you want to visualize it at that scale.
 This function helps you do exactly that.
 }
 \details{
-The \code{agg_geom} object will condition the type of object ouput by the
-function. It can be of any sort as an \code{SpatRaster} or \code{sf} object. Depending
-on the need, it could be a regular square grid or hexagonal grid, it could
+The \code{agg_geom} object will condition the type of object output by the
+function. It can be of any sort of a \code{SpatRaster} or a \code{sf} object. Depending
+on the need, it could be a regular square grid or hexagonal grid; it could
 also be irregular polygons like biomes or ecoregions, or points, and even
 lines (such as when aggregating across transects or trajectories).
 }
diff --git a/man/fb_count_sites_by_species.Rd b/man/fb_count_sites_by_species.Rd
index 74a6f92..8cdb96a 100644
--- a/man/fb_count_sites_by_species.Rd
+++ b/man/fb_count_sites_by_species.Rd
@@ -20,8 +20,8 @@ A three-column \code{data.frame} with:
 }
 }
 \description{
-For each species computes the percentage of sites where the species is
-present (distribution value higher than 0 and non-NA).
+This function computes the number and proportion of sites occupied by each
+species (distribution value higher than 0 and non-NA).
 }
 \examples{
 site_coverage_by_species <- fb_count_sites_by_species(woodiv_site_species)
diff --git a/man/fb_count_species_by_site.Rd b/man/fb_count_species_by_site.Rd
index d4909d2..420f729 100644
--- a/man/fb_count_species_by_site.Rd
+++ b/man/fb_count_species_by_site.Rd
@@ -20,9 +20,9 @@ A three-column \code{data.frame} with:
 }
 }
 \description{
-For each site computes the proportion of species present (distribution value
-higher than 0 and non-NA) compared to all species provided.
-For example, a site could contain only 20\% of all species provided.
+This function computes for each site the number and proportion of species
+present (distribution value higher than 0 and non-NA) compared to all species
+provided. For example, a site could contain only 20\% of all species provided.
 }
 \examples{
 species_coverage_by_site <- fb_count_species_by_site(woodiv_site_species)
diff --git a/man/fb_count_species_by_trait.Rd b/man/fb_count_species_by_trait.Rd
index bf0bbd0..a439ad7 100644
--- a/man/fb_count_species_by_trait.Rd
+++ b/man/fb_count_species_by_trait.Rd
@@ -21,8 +21,8 @@ A three-column \code{data.frame} with:
 }
 }
 \description{
-For each trait computes the percentage of species without \code{NA} (missing
-trait values).
+This function computes, for each trait, the number and proportion of species
+without missing trait value (\code{NA}).
 }
 \examples{
 species_coverage_by_trait <- fb_count_species_by_trait(woodiv_traits)
diff --git a/man/fb_count_traits_by_species.Rd b/man/fb_count_traits_by_species.Rd
index 64da336..97f4431 100644
--- a/man/fb_count_traits_by_species.Rd
+++ b/man/fb_count_traits_by_species.Rd
@@ -22,8 +22,8 @@ species.
 }
 }
 \description{
-For each species computes the percentage of traits without \code{NA} (missing
-trait values).
+For each species, this function computes the number and proportion of traits
+without \code{NA} (missing trait values).
 }
 \examples{
 trait_coverage_by_species <- fb_count_traits_by_species(woodiv_traits)
diff --git a/man/fb_cwm.Rd b/man/fb_cwm.Rd
index 3245f05..4328092 100644
--- a/man/fb_cwm.Rd
+++ b/man/fb_cwm.Rd
@@ -28,7 +28,19 @@ A \code{data.frame} with sites in rows and the following variables:
 This function returns the community-weighted mean of provided trait values.
 It only works with quantitative traits and will warn you otherwise.
 It will remove species that either have \code{NA} values in the \code{site_species}
-input or \code{NA} values as their trait.
+input or \code{NA} values as their trait in the provided trait object.
+}
+\details{
+The community-weighted mean is a site-based trait mean weighted by the
+abundance of the species. It can be written with the following equation:
+
+\deqn{
+ \text{CWM}_k = \sum_{i = 1}^S p_{ik} \times t_{ik}
+}
+
+with \eqn{\text{CWM}_k} the CWM of site k, \eqn{p_{ik}} the relative
+abundance of species \eqn{i} in site \eqn{k}, and \eqn{t_{ik}} the trait of
+species \eqn{i} in site \eqn{k}.
 }
 \examples{
 site_cwm <- fb_cwm(head(woodiv_site_species), woodiv_traits)
diff --git a/man/fb_filter_sites_by_species_coverage.Rd b/man/fb_filter_sites_by_species_coverage.Rd
index 5a64672..01f44d6 100644
--- a/man/fb_filter_sites_by_species_coverage.Rd
+++ b/man/fb_filter_sites_by_species_coverage.Rd
@@ -15,14 +15,16 @@ fb_filter_sites_by_species_coverage(
 names. The other columns should be named according to species names.}

 \item{threshold_species_proportion}{a numeric of length 1 between 0 and 1.
-The percentage of species coverage threshold.}
+The threshold of species coverage under which to exclude the sites.}
 }
 \value{
-A subset of \code{site_species} with sites covered by X\% of species.
+A subset of \code{site_species} with sites with at least
+\code{threshold_species_proportion}\% species present.
 }
 \description{
-Selects sites (rows) for which the percentage of present species
-(distribution value higher than 0 and non-NA) is higher than a threshold.
+Selects sites (rows) for which the proportion of species present
+(distribution value higher than 0 and non-NA) is higher than the user-defined
+threshold.
 }
 \examples{
 # Get sites with more than 40\% of the species
diff --git a/man/fb_filter_sites_by_trait_coverage.Rd b/man/fb_filter_sites_by_trait_coverage.Rd
index 6eabf63..4671925 100644
--- a/man/fb_filter_sites_by_trait_coverage.Rd
+++ b/man/fb_filter_sites_by_trait_coverage.Rd
@@ -28,7 +28,11 @@ A subset of \code{site_species} with sites covered by X\% of abundance/coverage
 considering all provided traits.
 }
 \description{
-...
+Select sites (rows of the \code{site_species} object) with all given traits
+available for at least the user-defined proportion of species
+\code{threshold_traits_proportion}. If a single trait is given, then the threshold
+applies to a single trait, if more than one trait is provided, then the
+function considers a threshold across all traits taken together.
 }
 \examples{
 # Filter all the sites where all species have known traits
diff --git a/man/fb_filter_species_by_site_coverage.Rd b/man/fb_filter_species_by_site_coverage.Rd
index 8282dbe..e0b477f 100644
--- a/man/fb_filter_species_by_site_coverage.Rd
+++ b/man/fb_filter_species_by_site_coverage.Rd
@@ -24,7 +24,7 @@ than \code{threshold_sites_proportion}.
 \description{
 Selects species (columns) for which the percentage of sites where the
 species is present (distribution value higher than 0 and non-NA) is higher
-than a threshold.
+than a user-defined threshold.
 }
 \examples{
 # Filter species present in at least 10\% of the sites
diff --git a/man/fb_filter_species_by_trait_coverage.Rd b/man/fb_filter_species_by_trait_coverage.Rd
index 8591313..5423f7e 100644
--- a/man/fb_filter_species_by_trait_coverage.Rd
+++ b/man/fb_filter_species_by_trait_coverage.Rd
@@ -22,8 +22,10 @@ The percentage of traits coverage threshold.}
 A subset of \code{species_traits} with species covered by X\% of traits.
 }
 \description{
-Selects species (rows) for which the percentage of traits without
-\code{NA} (missing trait values) is higher than a threshold.
+Selects species (rows) for which the proportion of traits without
+\code{NA} (missing trait values) is higher than the user-defined threshold.
+It considers as many traits as the ones provided to filter given
+the threshold.
 }
 \examples{
 # Filter species that have at least 60\% of the traits described
diff --git a/man/fb_filter_traits_by_species_coverage.Rd b/man/fb_filter_traits_by_species_coverage.Rd
index 2f3f852..f08013f 100644
--- a/man/fb_filter_traits_by_species_coverage.Rd
+++ b/man/fb_filter_traits_by_species_coverage.Rd
@@ -24,7 +24,7 @@ proportion of species.
 }
 \description{
 Selects traits (columns) for which the percentage of species without
-\code{NA} (missing trait values) is higher than a threshold.
+\code{NA} (missing trait values) is higher than a user-defined threshold.
 }
 \examples{
 # Filter traits that have at least 60\% non-missing values
diff --git a/man/fb_format_site_locations.Rd b/man/fb_format_site_locations.Rd
index 8c3300f..12f68cf 100644
--- a/man/fb_format_site_locations.Rd
+++ b/man/fb_format_site_locations.Rd
@@ -39,9 +39,10 @@ An \code{sf} object with a \code{site} column specifying site coordinates.
 }
 \description{
 Convert a flat \code{data.frame} with site coordinates into a proper \code{sf} object
-that can then be used by other functions. This function assumes that
-the coordinates are given in WGS84 (longitude vs. latitude). The function
-automatically removes repeated coordinates from the input dataset.
+that can then be used by other functions. This function assumes by default
+that the coordinates are given in WGS84 (longitude vs. latitude), but this
+can be changed through the \code{crs} argument. The function automatically removes
+repeated coordinates from the input dataset.
 }
 \examples{
 filename <- system.file(
diff --git a/man/fb_format_species_categories.Rd b/man/fb_format_species_categories.Rd
index dcff91d..bd66f9c 100644
--- a/man/fb_format_species_categories.Rd
+++ b/man/fb_format_species_categories.Rd
@@ -21,8 +21,8 @@ species category).
 }
 \description{
 Convert a flat \code{data.frame} with species names and species (supra-)category
-(e.g. family, order, endemism status, etc.) into a proper \code{data.frame} object
-that can then be used by other functions.
+(e.g. family, order, endemism status, etc.) into a proper
+\code{species_categories} object that can then be used by other functions.
 The final output contains species in rows and two columns (species name and
 species category).
 }
diff --git a/man/fb_format_species_traits.Rd b/man/fb_format_species_traits.Rd
index 2fcf04e..4829c2b 100644
--- a/man/fb_format_species_traits.Rd
+++ b/man/fb_format_species_traits.Rd
@@ -20,8 +20,8 @@ first column names \code{"species"} containing the species names.
 }
 \description{
 Convert a flat \code{data.frame} with traits values for different species
-into a proper \code{data.frame} object that can then be used by other functions.
-The final output contains species in rows and traits in columns.
+into a proper \code{species_traits} object that can then be used by other
+functions. The final output contains species in rows and traits in columns.
 }
 \examples{
 filename <- system.file(
diff --git a/man/fb_get_all_trait_coverages_by_site.Rd b/man/fb_get_all_trait_coverages_by_site.Rd
index 5cf9e4d..8446fca 100644
--- a/man/fb_get_all_trait_coverages_by_site.Rd
+++ b/man/fb_get_all_trait_coverages_by_site.Rd
@@ -32,7 +32,7 @@ an additional column named \code{all_traits} considering the coverage of all
 traits taken together.
 }
 \description{
-Compute trait coverage for all sites, i.e., the percentage of total
+Compute trait coverage for all sites, i.e., the proportion of total
 abundance/presence of species that have traits data compared to total
 species. This function assumes that all species provided in the traits
 dataset have all their traits specified (meaning that all species have either
diff --git a/man/fb_get_environment.Rd b/man/fb_get_environment.Rd
index d12e977..24019c8 100644
--- a/man/fb_get_environment.Rd
+++ b/man/fb_get_environment.Rd
@@ -7,7 +7,7 @@
 fb_get_environment(site_locations, environment_raster)
 }
 \arguments{
-\item{site_locations}{an \code{sf} object with the spatial geometries of sites.
+\item{site_locations}{a \code{sf} object with the spatial geometries of sites.
 \strong{NOTE}: the first column should be named \strong{\code{"site"}}
 and indicate site names.}

@@ -16,10 +16,11 @@ A single or multi-layers environmental raster.}
 }
 \value{
 A \code{data.frame} with average environmental values (columns) per site
-(rows), with the first column being \code{"site"} indicating site names.
+(rows), with the first column being \code{"site"}, indicating site names.
 }
 \description{
-Extract Raster Values at Location of Sites
+This function uses a \code{site_locations} object and a given \code{SpatRaster}.
+It extracts the mean value of the provided raster for each site.
} \examples{ ## Import climate rasters ---- diff --git a/man/fb_get_trait_combination_coverage.Rd b/man/fb_get_trait_combination_coverage.Rd index e95a38f..5b7cd54 100644 --- a/man/fb_get_trait_combination_coverage.Rd +++ b/man/fb_get_trait_combination_coverage.Rd @@ -21,7 +21,7 @@ and contain species names. The other columns should be named according to trait names.} \item{comb_size}{an integer vector defining one or more sizes of combinations -(default: \code{NULL})} +over which to compute trait coverage. (default: \code{NULL})} } \value{ a data.frame with the following columns: @@ -36,7 +36,6 @@ trait combination and site. } \description{ This function computes trait coverage for each site for different trait -combinations. If not provided, consider all possible trait combinations. The function will not run if the total number of combinations given is over 10,000. } diff --git a/man/fb_get_trait_coverage_by_site.Rd b/man/fb_get_trait_coverage_by_site.Rd index d030f5f..62a82fd 100644 --- a/man/fb_get_trait_coverage_by_site.Rd +++ b/man/fb_get_trait_coverage_by_site.Rd @@ -22,10 +22,9 @@ two columns: \code{site}, the site label, and \code{trait_coverage}, the percent total abundance/presence of species that have traits data. } \description{ -Compute trait coverage for all sites, i.e., the percentage of total -abundance/presence of species that have traits data compared to total -species. -This function assumes that all species provided in the traits dataset have +Compute trait coverage for all sites, i.e., the proportion of total +abundance/presence of species that have traits data compared to all species. +This function assumes that all species provided in the trait dataset have all their traits specified (meaning that all species have either known or \code{NA} values reported as their traits). 
\strong{NB}: this function returns trait coverage using all traits diff --git a/man/fb_make_report.Rd b/man/fb_make_report.Rd index 1b1347c..ab4e515 100644 --- a/man/fb_make_report.Rd +++ b/man/fb_make_report.Rd @@ -43,7 +43,7 @@ to trait names.} \strong{NOTE}: the first column should be named \strong{\code{"site"}} and indicate site names. The other columns should be named according to species names.} -\item{site_locations}{an \code{sf} object with the spatial geometries of sites. +\item{site_locations}{a \code{sf} object with the spatial geometries of sites. \strong{NOTE}: the first column should be named \strong{\code{"site"}} and indicate site names.} @@ -63,8 +63,10 @@ No return value. } \description{ Creates an R Markdown (\code{.Rmd}) report from a template to explore and -summarize user data. User can modify this report and use the function -\code{\link[rmarkdown:render]{rmarkdown::render()}} (or click the \emph{Render} of the RStudio IDE) to convert +summarize user data in (functional) biogeography through the use of the +site-species, the species-traits, and the site-locations objects. Users can +modify this report and use the function \code{\link[rmarkdown:render]{rmarkdown::render()}} +(or click the \emph{Render} of the RStudio IDE) to convert this \code{.Rmd} in different formats: \itemize{ \item HTML document (\code{output_format = "bookdown::html_document2"}); diff --git a/man/fb_map_raster.Rd b/man/fb_map_raster.Rd index 6f469b3..3d21ee9 100644 --- a/man/fb_map_raster.Rd +++ b/man/fb_map_raster.Rd @@ -19,7 +19,10 @@ from Natural Earth.} A \code{ggplot} object. } \description{ -Map a Single Raster Layer +This is a helper function to plot a map of an environmental raster. +The raster is plotted as is, with its given coordinate reference system. +The function can provide a background map if the \code{background} argument is +toggled. 
} \examples{ library(ggplot2) diff --git a/man/fb_map_site_data.Rd b/man/fb_map_site_data.Rd index c8cc93e..75ad75d 100644 --- a/man/fb_map_site_data.Rd +++ b/man/fb_map_site_data.Rd @@ -7,7 +7,7 @@ fb_map_site_data(site_locations, site_data, selected_col, background = FALSE) } \arguments{ -\item{site_locations}{an \code{sf} object with the spatial geometries of sites. +\item{site_locations}{a \code{sf} object with the spatial geometries of sites. \strong{NOTE}: the first column should be named \strong{\code{"site"}} and indicate site names.} @@ -23,10 +23,10 @@ from Natural Earth.} a \code{ggplot} object. } \description{ -From the site-locations data and a dataset organized by site, plot a map of -this information. -The returned plot is as little customized as possible to let the user do -the customization. +This function helps to map arbitrary site data using the site-locations +object and a dataset organized by site. The returned plot is as little +customized as possible to let the user choose. The function can provide a +basic background map if the \code{background} argument is toggled. } \examples{ site_rich <- fb_count_species_by_site(woodiv_site_species) diff --git a/man/fb_map_site_traits_completeness.Rd b/man/fb_map_site_traits_completeness.Rd index a21ae37..97ed877 100644 --- a/man/fb_map_site_traits_completeness.Rd +++ b/man/fb_map_site_traits_completeness.Rd @@ -13,7 +13,7 @@ fb_map_site_traits_completeness( ) } \arguments{ -\item{site_locations}{an \code{sf} object with the spatial geometries of sites. +\item{site_locations}{a \code{sf} object with the spatial geometries of sites. \strong{NOTE}: the first column should be named \strong{\code{"site"}} and indicate site names.} @@ -40,7 +40,7 @@ a 'ggplot2' object Returns a \code{ggplot2} map of sites colored by trait coverage (proportion of species having a known trait value). 
By default shows one plot for each trait and add an additional facet named \code{"all_traits"} considering the -trait coverage with all traits taken together. +trait coverage with all provided traits taken together. } \examples{ # Map without a background diff --git a/man/fb_plot_site_environment.Rd b/man/fb_plot_site_environment.Rd index 3d7fd96..97db93a 100644 --- a/man/fb_plot_site_environment.Rd +++ b/man/fb_plot_site_environment.Rd @@ -12,7 +12,7 @@ fb_plot_site_environment( ) } \arguments{ -\item{site_locations}{an \code{sf} object with the spatial geometries of sites. +\item{site_locations}{a \code{sf} object with the spatial geometries of sites. \strong{NOTE}: the first column should be named \strong{\code{"site"}} and indicate site names.} diff --git a/man/funbiogeo-package.Rd b/man/funbiogeo-package.Rd index 98ecdf9..ac326ff 100644 --- a/man/funbiogeo-package.Rd +++ b/man/funbiogeo-package.Rd @@ -24,6 +24,7 @@ Useful links: Authors: \itemize{ + \item Nicolas Casajus \email{nicolas.casajus@fondationbiodiversite.fr} (\href{https://orcid.org/0000-0002-5537-5294}{ORCID}) [copyright holder] \item Matthias Grenié \email{matthias.grenie@univ-grenoble-alpes.fr} (\href{https://orcid.org/0000-0002-4659-7522}{ORCID}) } diff --git a/man/woodiv_locations.Rd b/man/woodiv_locations.Rd index 8c74b3d..23c87ea 100644 --- a/man/woodiv_locations.Rd +++ b/man/woodiv_locations.Rd @@ -14,9 +14,9 @@ woodiv_locations \description{ This dataset is derived from the WOODIV database (available at: \url{https://www.nature.com/articles/s41597-021-00873-3}). It contains the -grid cells of sites (10 km x 10 km horizontal +grid cells of sites (10 km x 10 km horizontal resolution) sampled in Portugal, Spain, France, and Italy (Mediterranean -part) for which at least one of the 24 Conifer tree species occurs. +part) for which at least one of the 24 conifer tree species occurs. This dataset exemplifies the argument \code{site_locations} used in several functions of \code{funbiogeo}. 
The variable \code{site} corresponds to the diff --git a/vignettes/diagnostic-plots.Rmd b/vignettes/diagnostic-plots.Rmd index 30c8833..cfdc001 100644 --- a/vignettes/diagnostic-plots.Rmd +++ b/vignettes/diagnostic-plots.Rmd @@ -22,8 +22,8 @@ metrics in other packages. This vignette explains in detail all plotting functions available in `funbiogeo`, how to use them and how to interpret them. -Some of these plotting functions use different data inputs and some of them can deal with species categorization, i.e. -displaying information by each category of species (family, order, endemism status, etc.). We detail the standard needed inputs in the table below, the "Additional input" column represents needed input that are not the other standard tables specified by the other columns. +Some of these plotting functions use different data inputs and some of them can deal with species categorization, i.e., +displaying information by each category of species (family, order, endemism status, etc.). We detail the standard needed inputs in the table below. The "Additional input" column represents needed input that is not the other standard tables specified by the other columns. ```{r child="../man/rmdchunks/_table-plot-functions.Rmd"} ``` @@ -46,30 +46,30 @@ data("woodiv_traits") ## Naming Convention Like in the rest of the `funbiogeo` package, the functions are named following -certain conventions. For one, to avoid any collision with other packages all the +certain conventions. For one, to avoid any collision with other packages, all the functions are prefixed with `fb_`. Second all plotting functions begin with either `fb_plot_*()`, when they are regular plots, or `fb_map_*()` when they plot maps. The function names in `funbiogeo` are generally long to be as specific and clear -as possible. So in case of doubt re-read the function name, and what it should +as possible. So in case of doubt, reread the function name, and what it should represent should be clear from the name. 
## Regular Plots -In this section we will describe what we call "regular plots", i.e. +In this section we will describe what we call "regular plots," i.e., plots of non-spatial objects (density plots, bivariate plots, lollipop charts, heatmaps, etc.). We made this distinction because maps have their own specific challenges. All -the default regular plots proposed in `funbiogeo` are quite specific to the -data. In each of the section below we'll summarize what the plots is about, what +the default regular plots included in `funbiogeo` are quite specific to the +data. In each of the sections below, we'll summarize what the plot is about, what are the needed arguments, and how to interpret the output. -The function are all described in their specific subsections in alphabetical +The functions are all described in their specific subsections in alphabetical order. -### Distribution of trait coverages across sites +### Distribution of trait coverage across sites Visualizing the trait coverage of all sites can help isolate which traits may show consistently low coverage. Also this can help notice if some groups of @@ -78,7 +78,7 @@ sites have higher coverage for certain traits than for others. You can plot this using the `fb_plot_distribution_site_trait_coverage()` function. It takes two arguments: `site_species` the site by species `data.frame` and `species_traits` the species by traits `data.frame`. -The function leverages stacked density plots of trait coverages across sites +The function leverages stacked density plots of trait coverage across sites (nicknamed 'ridges'). They represent the distribution of trait coverage across all sites for each trait separately. @@ -95,35 +95,35 @@ fb_plot_distribution_site_trait_coverage(woodiv_site_species, woodiv_traits) We see the distribution of species coverage per site (along the x-axis) for each trait (along the y-axis) and with all traits taken together (shown at the top of -the top on the line `all_traits`). 
The proportions on the y-axis labels are the +the plot, on the line `all_traits`). The proportions on the y-axis labels are the average coverage observed for this trait. We see that we have very similar average coverage for all sites (average number of species for which we have not NA trait data), whatever the trait we consider, which is about 100% of species. This plot considers the distribution of species across sites instead of focusing -only on the species by traits `data.frame`. This maybe useful to realize, for -example, that the traits of a species occurring very rarely are missing this -wouldn't necessarily translate in low site-level trait coverage. +only on the species by traits `data.frame`. This may be useful to realize, for +example, that when the traits of a species occurring very rarely are missing, this +wouldn't necessarily translate into low site-level trait coverage. -### Plotting the number of sites by species +### Plotting the Number of Sites by Species The function `fb_plot_number_sites_by_species()` allows to explore the site by species `data.frame`. It shows the number (and proportion) of sites occupied by -each species. Its first argument `site_species`, the site by species `data.frame` -is necessary while the second one `threshold_sites_proportion`, a target -proportion of sites coverage, is optional. +each species. Its first argument `site_species`, the site by species `data.frame`, +is necessary, while the second one `threshold_sites_proportion`, a target +proportion of site coverage, is optional. ```{r plot-sites-coverage-species} fb_plot_number_sites_by_species(woodiv_site_species) ``` The function outputs a dotchart. The number of occupied sites per species is -indicated at the bottom x-axis, while the top x-axis represents the proportion +indicated at the bottom x-axis, while the top x-axis represents the proportion of occupied sites. -The left y-axis label species names (in parenthesis) and their rank by -increasing prevalence.
Note that for readability constraints, only a limited +The left y-axis labels species names (in parentheses) and their rank by +increasing prevalence. Note that, for readability constraints, only a limited number of species are labeled on the y-axis, but they are all displayed as dots on the plot. @@ -142,12 +142,12 @@ fb_plot_number_sites_by_species( The threshold bar helps us get a sense of how many species are present in at least 40% of the sites. It also displays the corresponding number of sites. -### Plotting the number of species per trait +### Plotting the Number of Species per Trait One way to look at the species by traits `data.frame` is to look at number of species with non-missing trait values for each trait. The `fb_plot_number_species_by_trait()` function does exactly that. Its first -argument `species_traits`, the species by traits `data.frame`, is necessary +argument `species_traits`, the species by traits `data.frame`, is necessary, while the second one `threshold_species_proportion`, a target proportion of species coverage, is optional. @@ -155,7 +155,7 @@ species coverage, is optional. fb_plot_number_species_by_trait(woodiv_traits) ``` -The function outputs a lollipop chart. On the bottom x-axis there is the number +The function outputs a lollipop chart. On the bottom x-axis, there is the number of species covered by the given trait (the top x-axis represents the proportion of species, which is directly proportional). The y-axis represents each trait. The dot represents the actual coverage observed with the corresponding @@ -182,16 +182,16 @@ labels indicating the proportion and corresponding number of species (n = 22). You can also display the number of traits available per species. Showing it for each species would be quite difficult to read, so we decided instead to -represent the number of species having given number of known traits. Also, +represent the number of species having a given number of known traits.
Also, because most trait ecologists are interested in multiple traits, we considered nested proportions of traits: considering the number of species covered by at least one trait, at least two, etc. -To represent such a plot you can use the function +To represent such a plot, you can use the function `fb_plot_number_traits_by_species()`, it uses two arguments. The first `species_traits` is the species by traits `data.frame`. The second argument is `threshold_species_proportion` which is optional and corresponds to a certain -threshold proportion of species, so that a line can be added to the plot. +threshold proportion of species so that a line can be added to the plot. Using it on the included dataset gives: @@ -204,16 +204,16 @@ the actual number of species, while top x-axis displays the corresponding proportion of species). The y-axis shows the number of each traits. Note that the categories are nested: the set of species with at least 1 trait contains the set of species with at least 2 traits, and so on and so forth. By definition all -species have 0 or more known traits, but we show that in the plot as a reference -to see how the proportion decreases with increasing number of traits. The -proportion is shown as text above each dot representing it. So for example, -there are 79.2% species with at least 3 non-missing traits. +species have 0 or more known traits, but we show that in the plot as a reference +to see how the proportion decreases with an increasing number of traits. The +proportion is shown as text above each dot representing it. So, for example, +there are 79.2% of species with at least 3 non-missing traits. This plot does not tell us if all species with at least 3 traits share the same trait combination (there are multiple 3 trait combinations), but it's a first indication. -Let's say we're interested in combinations of traits with at least 50% species +Let's say we're interested in combinations of traits with at least 50% of species covered.
We can use the second argument to show it: ```{r plot-n-traits-species-thresh} @@ -223,23 +223,23 @@ fb_plot_number_traits_by_species( ``` Adding this argument displays a vertical bar at the target proportion of species -to easily target number of traits covering a certain proportion of species. The +to easily target the number of traits covering a certain proportion of species. The red dashed vertical line shows the corresponding species coverage with labels indicating the proportion and corresponding number of species (n = 12) -### Plotting environmental position of sites +### Plotting Environmental Position of Sites -To see if our sites are biased environmentally it can be nice to locate them -along environmental variables compared to a full region. For the sake of -simplicity we can focus on two variables against which to compare our sites to a +To see if our sites are biased environmentally, it can be nice to locate them +along environmental variables compared to a full region. For the sake of +simplicity, we can focus on two variables against which to compare our sites to a region. That is what the `fb_plot_site_environment()` function does. It has four arguments: the first one, `site_locations`, provides the locations of sites as -an `sf` object, `environment_raster` is a -[`terra`](https://cran.r-project.org/package=terra) raster object, the next two +a `sf` object, `environment_raster` is a +[`terra`](https://cran.r-project.org/package=terra) raster object. The next two arguments are `first_layer` and `second_layer` which are the names of the two variables to be extracted from `environment_raster` to make our plot.
-From the included dataset we can represent the first 6 sites along total annual +From the included dataset, we can represent the first 6 sites along total annual precipitation and mean annual temperature over the full region: ```{r} @@ -254,7 +254,7 @@ fb_plot_site_environment(head(woodiv_locations), layers) ``` The plot shows the first selected layer as the x-axis and the second one as the -y-axis. The environmental position of the sites are displayed using the big blue +y-axis. The environmental position of the sites is displayed using the big blue dots, while the light gray pixels are all the environmental variables extracted from the provided environmental raster. @@ -277,12 +277,12 @@ We can use it with the included dataset as an example: fb_plot_site_traits_completeness(woodiv_site_species, woodiv_traits) ``` -The plot shows the trait along the x-axis (and their average coverage across all +The plot shows the traits along the x-axis (and their average coverage across all sites in their labels) and sites along the y-axis. Each thin horizontal line represents a site. The color indicates the coverage for the trait in each -column. Note that for readability reasons the color scale has been discretized +column. Note that, for readability reasons, the color scale has been discretized from 0 to 100% coverage. Traits are ranked in decreasing average coverage. The -last column `all_traits` contains the coverage for all traits taken together. +last column `all_traits` contains the coverage of all traits taken together. With this figure we can see that all sites have over 95% coverage for all traits. @@ -303,8 +303,8 @@ fb_plot_species_traits_completeness(woodiv_traits) This figure visualizes directly the species by trait `data.frame`. The x-axis displays the different traits, ranked from left to right in decreasing coverage order (as indicated in the x-axis labels). The last column `all_traits` -considers all traits taken together. 
The y-axis represents species in decreasing -coverage order from bottom to top. Each cell thus represents the trait for a +considers all traits taken together. The y-axis represents species with decreasing +coverage order from bottom to top. Each cell thus represents the trait of a species: blue if the trait is known and red if it is missing. From this plot we see that a small proportion of species have missing @@ -320,23 +320,23 @@ from the previous section. It can be done through `fb_plot_species_traits_missingness()` which takes `species_traits`, the species by traits `data.frame`, for argument as well as `all_traits` to know if an additional row should be used to display a summary -for all traits taken together. +of all traits taken together. ```{r plot-species-traits-missingness} fb_plot_species_traits_missingness(woodiv_traits, all_traits = TRUE) ``` The plot displays the number of species with known and missing traits. It shows -each trait in separate line as a proportional bar chart with the total numbers +each trait in separate lines as a proportional bar chart with the total numbers included within each bar. -### Displaying traits combinations +### Displaying traits Combinations Looking at the traits coverage for each species continuously maybe impractical or difficult. For example, when trying to display thousands of species or when the trait coverage varies widely across species. One way to reduce the size of -the analyzed (and thus visualized) dataset is instead to count at which +the analyzed (and thus visualized) dataset is, instead, to count at which frequency appear the combinations of present/missing traits. This is done by `fb_plot_trait_combination_frequencies()` which takes two @@ -353,7 +353,7 @@ fb_plot_trait_combination_frequencies(woodiv_traits) The x-axis represents individual traits ranked in alphabetical order from left to right. The y-axis represents different combinations. 
The labels on the y-axis show both the number and the frequency of each combination. When the cell is -blue it means that the trait is present, when it is red it means the trait is +blue, it means that the trait is present; when it is red, it means the trait is missing. By default the combinations are ordered by increasing number from top to bottom, with the most numerous combinations of trait presences at the very bottom of the graph. @@ -363,7 +363,7 @@ number of species provided) with all their traits that are non-missing, while 1 species has a missing seed mass and a missing wood density. If we change `order_by` to `"complete"`, the combinations are ordered instead by -the number of trait presents among them: +the number of traits present among them: ```{r plot-trait-comb-freq-order} fb_plot_trait_combination_frequencies(woodiv_traits, order_by = "complete") @@ -376,9 +376,9 @@ with most missing traits are at the top. ### Displaying traits correlations Most of the plot functions in `funbiogeo` show traits independently of one -another. However, for functional biogeography analyses, trait correlations maybe +another. However, for functional biogeography analyses, trait correlations may be very relevant. This is exactly what is done by `fb_plot_trait_correlation()`. It -takes as first needed argument `species_traits`, the species by traits +takes as first required argument `species_traits`, the species by traits `data.frame`. The other arguments are optional and will be passed to `stats::cor()`. @@ -392,21 +392,21 @@ fb_plot_trait_correlation(woodiv_traits) ``` Both x- and y-axes represent the different traits.
At their intersection are -shown squared that are colored in function of their correlation coefficient -(purple means close to -1 correlation while brown means close to 1, white means +shown squares that are colored in function of their correlation coefficient +(purple means close to -1 correlation, while brown means close to 1 and white means close to 0). The correlation coefficients are also displayed in the middle of this square. -With this visualization we can see that our traits are mostly unorrelated, there +With this visualization we can see that our traits are mostly uncorrelated, there is a slight negative correlation of plant and wood density (cor = -0.2). ## Maps Map functions in `funbiogeo` are here to provide good default visualization -leveraging the spatial information of sites. We know that producing map in R +leveraging the spatial information of sites. We know that producing maps in R is challenging. That's why we provide these helper functions. -Of course, these functions are all basics and you either have to customize them +Of course, these functions are all basic, and you either have to customize them by adding `ggplot2` commands to the returned plots, or to look at their code to produce similar plots in the way you want. @@ -454,11 +454,11 @@ arbitrary site data whether quantitative or qualitative with the `fb_map_site_data()` function. The first argument it takes is `site_locations`, the spatial locations of sites as an `sf` object, the second argument is `site_data` which is a `data.frame` containing a `"site"` column and data in -additional columns, the third and last required argument is `selected_col` which +additional columns. The third and last required argument is `selected_col` which should be the name of the column provided in the `site_data` `data.frame` that is going to be used as a variable to display. All three arguments are required. 
-For example, if we want to get a map of species richness of the included dataset +For example, if we want to get a map of the species richness of the included dataset, we can do the following: ```{r map-site-data-sr} @@ -469,10 +469,10 @@ site_rich <- fb_count_species_by_site(woodiv_site_species) fb_map_site_data(woodiv_locations, site_rich, "n_species") ``` -From this map, we can for example see that the French part of our dataset is +From this map, we can, for example, see that the French part of our dataset is the most species-rich compared to the other places. -Now imagine we want to display the category of sites which be "Testing" or +Now imagine we want to display the category of sites, which can be "Testing" or "Training", depending on which set they belong to. We can use the following to display the map: @@ -494,7 +494,7 @@ fb_map_site_data(woodiv_locations, site_cat, "set") Mapping rasters can be quite cumbersome with R. `funbiogeo` provides a generic function to map them with `fb_map_raster()`. It only needs as first argument a `terra` `SpatRaster` object. It can be very useful when visualizing -environmental layers for example. The function will display the raster in the +environmental layers, for example. The function will display the raster in the provided projection. For example: diff --git a/vignettes/funbiogeo.Rmd b/vignettes/funbiogeo.Rmd index 8c035b2..1b89202 100644 --- a/vignettes/funbiogeo.Rmd +++ b/vignettes/funbiogeo.Rmd @@ -23,12 +23,12 @@ knitr::opts_chunk$set( The aim of the `funbiogeo` package is to help users streamline the workflows in **fun**ctional **biogeo**graphy [@Violle_emergence_2014]. -It helps filter sites, species, and traits based on their trait coverages. +It helps filter sites, species, and traits based on their trait coverage. It also provides default diagnostic plots and standard tables summarizing input data. This vignette aims to be an introduction to the most commonly used functions.
-This vignette is a worked through real world example of a functional biogeography +This vignette is a worked-through real-world example of a functional biogeography workflow using the internal dataset provided in `funbiogeo` and derived from the WOODIV database [@Monnet_2021]. @@ -48,13 +48,13 @@ of the species (in rows) (`species_traits` argument in `funbiogeo` functions); the abundance, or the cover of species (in columns) across sites (in rows) (`site_species` argument in `funbiogeo` functions); - the **site x locations** object, which describes the physical location of -sites through an `sf` object (`site_locations` argument in `funbiogeo` functions). +sites through a `sf` object (`site_locations` argument in `funbiogeo` functions). -*Note: the `site_locations` object must be an `sf` object, as it is now the +*Note: the `site_locations` object must be a `sf` object, as it is now the standard package for spatial data in R. To have more information about the sf package, refer to its [website](https://r-spatial.github.io/sf/articles/sf1.html). If you want to -learn how to convert your data to an `sf` object, check the [formatting vignette](long-format.html).* +learn how to convert your data to a `sf` object, check the [formatting vignette](long-format.html).* Optionally, an additional dataset can be provided: @@ -89,7 +89,7 @@ names should be named **"species"** and the other columns should only contain trait values. Note that we'll be talking about **species** throughout this vignette -and in the arguments of `funbiogeo`, but the package doesn't make any assumption +and in the arguments of `funbiogeo`, but the package doesn't make any assumptions on the biological level. It can be individuals, populations, strains, species, genera, families, etc. The important fact is that you should have trait data for the level at which you want to work. 
This biological level should correspond @@ -116,13 +116,13 @@ summary(woodiv_traits) ``` Note that to use your own species by traits `data.frame`, it should follow -a similar structure with the first column being named **`"species"`** and the +a similar structure, with the first column being named **`"species"`** and the other ones containing traits. ## Site x Species -This object contains species occurrences/abundance/coverage at sites +This object contains species occurrences/abundance/coverage at the sites of the study area. It is a `data.frame`. The first column, **`"site"`**, contains site names, while the other columns contain the distribution of each species across sites. @@ -148,17 +148,17 @@ The example dataset contains the occurrence of the 24 Conifer tree species across 5,366 sites (grid cells of 10 km x 10 km resolution). To use your own site by species `data.frame`, you should follow a similar -structure with the first column being named **`"site"`** and the other ones +structure, with the first column being named **`"site"`** and the other ones containing presence information of species across sites. ## Site x Locations -This object contains the geographical location of the sites. It should be an +This object contains the geographical location of the sites. It should be a `sf` object from the [`sf` package](https://cran.r-project.org/package=sf). These are spatial R objects that describe geographical locations. The sites can have arbitrary shapes: points, regular polygons, irregular polygons, or even -line transects! To make sure that your data is well plotted you should specify +line transects! To make sure that your data is well plotted, you should specify the Coordinate Reference System (CRS) of this object. 
The package `funbiogeo` comes with the example dataset `woodiv_locations` @@ -214,7 +214,7 @@ this section (see the full list and how to interpret them in the *diagnostic plots* because they help us to have an overview of our dataset **prior** to the analyses. -## Trait completeness per species +## Trait Completeness per Species A first way to visualize our `data.frame` is to look at the proportion of species with non-missing traits using the `fb_plot_number_species_by_trait()` @@ -226,10 +226,10 @@ fb_plot_number_species_by_trait(woodiv_traits) This plot shows us the number of species (along the x-axis) in function of the trait name (along the y-axis). The number of concerned species is shown at the -bottom of the plot while the corresponding proportion of species (compared to +bottom of the plot, while the corresponding proportion of species (compared to all the species included in the trait dataset) is indicated as a secondary x-axis at the top. The proportion of species concerned is shown at the right of -each point. For example, in our example dataset, 70.8% species have a value for +each point. For example, in our example dataset, 70.8% of species have a value for SLA. The function also includes a way to provide a target proportion of species as @@ -237,8 +237,8 @@ the second argument. It will display the proportion as a dark red dashed line. For example, if we want to visualize which traits are available for more than 75% of the species (we can also say which traits *cover* more than 75% of the -species), we can use function `fb_plot_number_species_by_trait()`. It takes as -first argument, the species by traits table, then, as second argument the +species), we can use the function `fb_plot_number_species_by_trait()`. It takes, as +first argument, the species by traits table, then, as second argument, the proportion of species to consider (as a number between 0 and 1): ```{r fig-plot-sp-by-trait-prop} @@ -263,7 +263,7 @@ species: plant height and seed mass. 
Another way to filter the data would be to select certain species that have at least a certain number of traits. This can be visualized using the -`fb_plot_number_traits_by_species()` function. Similarly to the above-mentioned +`fb_plot_number_traits_by_species()` function. Similar to the above-mentioned function, it takes the species x traits `data.frame` as the first argument: ```{r fig-plot-trait-by-sp} @@ -275,41 +275,41 @@ species covered by a specific number of traits (0 to 4 in our example). We can read it as the fact that 58.3% of the species show non-missing values for the four traits in the dataset. However, all species (100%) have non-missing values for two or more traits. This doesn't mean that these two traits are the same, -but that all species of datasets have non-missing values for a combination two +but that all species of datasets have non-missing values for a combination of two traits among the four provided traits. To identify further which combinations of traits are most frequently available together, we can use the `fb_plot_trait_combination_frequencies()` function. -This function takes the species-traits table as first argument: +This function takes the species-traits table as the first argument: ```{r trait-combinations} fb_plot_trait_combination_frequencies(woodiv_traits) ``` -In this plot, each **observed** combination of missing and non-missing trait is +In this plot, each **observed** combination of missing and non-missing traits is represented as a row. Each column represents a different trait. The y-axis details how many species (as well as the corresponding proportion) harvest this particular combination of missing/non-missing trait values. The cells are blue -to represent non-missing trait, and red when they are missing. At the bottom of +to represent non-missing traits, and red when they are missing. 
At the bottom of the graph, we can indeed see that 14 species (representing 58.3% of the -species), have all traits that are non-missing. At the very top we see that only -one species has the a non-missing plant height and SLA with a missing seed mass +species) have all traits that are non-missing. At the very top, we see that only +one species has a non-missing plant height and SLA with a missing seed mass and wood density. This graph allows us to better examine the missingness patterns among our trait dataset. To see all the available options in -`funbiogeo` as well as the details of the arguments of each function refer to +`funbiogeo` as well as the details of the arguments of each function, refer to the [diagnostic plots vignette](diagnostic-plots.html). -# Filtering the data +# Filtering the Data We performed simple visualizations of our dataset to know identify our patterns of trait completeness. Based on these plots, we can choose thresholds in trait completeness to filter our data for our following analyses. -## Filter trait by species coverage +## Filter Trait by Species Coverage We want to select the traits that are available for at least 75% of the species. -To do so we can use the `fb_filter_traits_by_species_coverage()` function. +To do so, we can use the `fb_filter_traits_by_species_coverage()` function. The function takes the species by traits `data.frame` and outputs the same dataset but with the traits filtered (so with fewer columns). The second argument `threshold_species_proportion` is the threshold proportion of species @@ -331,13 +331,13 @@ head(selected_traits) ``` The function outputs a filtered species-traits dataset retaining only traits -covering at least 75% of the species. In the end, this keep only two traits: +covering at least 75% of the species. In the end, this keeps only two traits: plant height and seed mass.
-## Filter species by trait coverage +## Filter Species by Trait Coverage -Now that we obtained a reduced trait dataset, selecting only two traits, this +Now that we have obtained a reduced trait dataset, selecting only two traits, this doesn't mean that these traits are available for all species. We could filter species that have only non-missing traits through the function `fb_filter_species_by_trait_coverage()` with the species x traits `data.frame` @@ -357,18 +357,18 @@ head(selected_species) ``` -In the end, by filtering the traits available for at least 75% of the species, +In the end, by filtering the traits available for at least 75% of the species have filtering the species that have only non-missing traits for these traits, -we ended with a list of 23 species and 2 traits. This is the dataset we'll +we ended up with a list of 23 species and 2 traits. This is the dataset we'll continue using in the rest of the vignette. -## Filter sites by trait coverage +## Filter Sites by Trait Coverage Now that we have filtered our traits and species of interest, we need to filter the sites that contain enough species for which the traits are available. -Similarly to above the function is `fb_filter_sites_by_trait_coverage()` it -takes as first two arguments the site x species `data.frame` and the species x +Similar to above, the function is `fb_filter_sites_by_trait_coverage()`; it +takes as the first two arguments the site x species `data.frame` and the species x traits `data.frame`. The third argument is `threshold_traits_proportion` that indicates the percent coverage of traits to filter each site. Note that this coverage is weighted by the occurrence, abundance, or cover depending on the @@ -393,29 +393,29 @@ filt_sites[1:4, 1:4] ``` The output of the function is a site x species `data.frame` with selected sites -and species. Now we selected 5,364 sites out of 5,366, for our 2 traits and 23 +and species.
Now we have selected 5,364 sites out of 5,366 for our 2 traits and 23 species. -# Computing Functional Diversity metrics +# Computing Functional Diversity Metrics The `funbiogeo` functions helped us filter our data appropriately with enough available trait information for species and sites. The goal of `funbiogeo` is to -help you analyzing functional trait data and computing functional diversity +help you analyze functional trait data and compute functional diversity indices. These indices capture the diversity of trait values in a set of species, if you're interested in an introduction to functional diversity indices and how -to analyze them, it's out of scope of this vignette, but you can refer to +to analyze them, it's out of the scope of this vignette, but you can refer to Mammola et al. [-@Mammola_Concepts_2021]. The paper provides a general workflow to work with trait data. -`funbiogeo` doesn't aim to substitute to all these amazing tools that compute a -diversity of indices with different properties and formulas., however, we can -use the filtered datasets to proceed with our analyses This is where you should +`funbiogeo` doesn't aim to substitute for all these amazing tools that compute a +diversity of indices with different properties and formulas. However, we can +use the filtered datasets to proceed with our analyses. This is where you should use your preferred packages to compute functional diversity indices like [`mFD`](https://cran.r-project.org/package=mFD) or [`funrar`](https://cran.r-project.org/package=funrar). If you're new to the world of functional diversity analyses, we suggest reading -[`mFD` introductory vignette](https://cmlmagneville.github.io/mFD/articles/mFD_general_workflow.html). It underlines the different steps to compute functional diversity metrics. If you're interested in computing functional rarity indices with `funrar` you canalso refer to [its tutorial](https://rekyt.github.io/funrar/articles/funrar.html).
+[`mFD` introductory vignette](https://cmlmagneville.github.io/mFD/articles/mFD_general_workflow.html). It underlines the different steps to compute functional diversity metrics. If you're interested in computing functional rarity indices with `funrar` you can also refer to [its tutorial](https://rekyt.github.io/funrar/articles/funrar.html). For the sake of the example, we included a function in `funbiogeo` to compute Community-Weighted Mean [CWM, @Garnier_Plant_2004] named `fb_cwm()`. @@ -426,7 +426,7 @@ diversity indices using the `mFD` package. ## Community-Weighted Mean (CWM) -We're interested to look at the spatial distribution of the average plant height +We're interested in looking at the spatial distribution of the average plant height and seed mass of Conifer tree species. To do so, we can compute the community-weighted mean of both traits. We'll use the `fb_cwm()` function to do so, it takes the site x species `data.frame` and species x traits `data.frame` @@ -448,7 +448,7 @@ value of the CWM at this site for this trait. ## Compute functional diversity indices -We can also integrate our filtered datasets in other functional diversity computation pipeline. We'll show an example by computing functional richness +We can also integrate our filtered datasets into other functional diversity computation pipelines. We'll show an example of computing functional richness with the `mFD` package. Before computing any functional diversity index, we need to scale and center our @@ -457,7 +457,7 @@ trait values so that they are on comparable scales. For this we can use the are available as row names instead of having a separate `species` column. We first give row names and then use the function to scale our traits. 
-*Note: the following chunk of code is executed only if you have `mFD` installed* +*Note: the following chunk of code is executed only if you have `mFD` installed.* ```{r mfd-scaling-traits, eval = require("mFD")} ## To install 'mFD' uncomment the following line @@ -486,7 +486,7 @@ head(scaled_traits) We get a data.frame with the two scaled traits and species names as row names. To compute functional diversity indices with `mFD` we further need a site by -species data.frame, with sites names as row names, and only for the species for +species data.frame, with site names as row names, and only for the species for which we have the traits in the species by trait table. We thus transform the site by species object: @@ -504,7 +504,7 @@ formatted_site_species <- formatted_site_species[, rownames(scaled_traits)] formatted_site_species[1:5, 1:5] ``` -`mFD` furthermore requires that the site by species object is a `matrix`, we +`mFD`, furthermore, requires that the site by species object is a `matrix`. We thus convert it to a matrix: ```{r mfd-site-species-matrix, eval = require("mFD")} @@ -512,17 +512,17 @@ formatted_site_species <- as.matrix(formatted_site_species) ``` We can now compute functional diversity metrics using the `alpha.fd.multidim()` -function from the `mFD` package. We'll be using our two formatted objects +function from the `mFD` package. We'll be using our two formatted objects: `scaled_traits` as our species by trait table, and `formatted_site_species` as our site by species table. We'll be computing Functional Dispersion (noted FDis) as a functional diversity index. It's out of the scope of this vignette to explain the differences between functional diversity indices, but we recommend reading Mammola et al. [-@Mammola_Concepts_2021] for a general introduction -about them. +to them. 
Going back to computing Functional Dispersion with the `alpha.fd.multidim()` -function, we need to use the species-traits table as first argument then the -site-species table as second argument, then the name of the functional diversity +function, we need to use the species-traits table as the first argument, then the +site-species table as the second argument, then the name of the functional diversity index as a third argument: ```{r mfd-fdis, eval = require("mFD")} @@ -536,7 +536,7 @@ head(woodiv_fdis$functional_diversity_indices) ``` We now have a table with several diversity indices computed for each site. This -table contains site names as row names an four columns: +table contains site names as row names and four columns: - `sp_richn`, which is the species richness, - `fdis`, the Functional Dispersion index, @@ -553,7 +553,7 @@ section of this introductory vignette. ## Mapping diversity indices As we're interested in putting functional dispersion on a map, we should -slightly transform our site-indices table to include an explicit `site` column +slightly transform our site-index table to include an explicit `site` column as required by `funbiogeo`: ```{r site-indices, eval = require("mFD")} @@ -585,9 +585,9 @@ fb_map_site_data(woodiv_locations, woodiv_fdis, "fdis") ``` We get a map of our sites, colored by Functional Dispersion. The map uses the default `ggplot2` color scheme. We see that we have higher functional dispersion -in Spain and in the French-Spanish border than in Italy. -Because all of the plotting functions of `funbiogeo` output `ggplot2` objects -they can be customized using usual `ggplot2` syntax: +in Spain and on the French-Spanish border than in Italy.
+Because all of the plotting functions of `funbiogeo` output `ggplot2` objects, +they can be customized using the usual `ggplot2` syntax: ```{r map-fdis-custom, eval = require("mFD")} fb_map_site_data(woodiv_locations, woodiv_fdis, "fdis") + @@ -609,33 +609,33 @@ fb_map_site_data(woodiv_locations, woodiv_fdis, "fide_plant_height") + theme(legend.position = "bottom") ``` -We see that the tallest assemblages are in Italy, while central Spain and North -of Portugal shows the lowest average plant height. You can adapt the above code -on the other available diversity indices. +We see that the tallest assemblages are in Italy, while central Spain and the north +of Portugal show the lowest average plant heights. You can adapt the above code +to the other available diversity indices. -## Mapping an environmental raster +## Mapping an Environmental Raster A common case of analysis in biogeography is to be interested in mapping environmental variables. If we want to display the environment associated with -our sites of interest, we can leverage environmental raster layers as provided, +our sites of interest, we can leverage environmental raster layers, as provided, for example, by WorldClim [@Fick_WorldClim_2017] or CHELSA [@Karger_Climatologies_2017]. Fortunately, we have access to an example raster of mean annual temperature through `funbiogeo`. -We ar first going to read the raster using the `terra` package, which is the +We are first going to read the raster using the `terra` package, which is the reference package to read spatial raster data. If you want to know more about raster data, we recommend reading the [dedicated chapter](https://r.geocompx.org/attr#manipulating-raster-objects) in the -[*Geocomputations with R*](https://r.geocompx.org/) book +[*Geocomputations With R*](https://r.geocompx.org/) book [@Lovelace_Geocomputation_2025]. Then, we'll use the `fb_map_raster()` function that displays a map for the first layer of the raster data. 
-It takes the actual raster object as first argument. The other arguments are +It takes the actual raster object as the first argument. The other arguments are passed to the `theme()` function of ggplot2 to customize the plot. So first, let's read the mean annual temperature raster provided by `funbiogeo`. -For this, we'll use the `system.file()` function which allows accessing specific -files from packages, then we'll read the raster with the `rast()` function from +For this, we'll use the `system.file()` function, which allows accessing specific +files from packages; then we'll read the raster with the `rast()` function from the `terra` package: ```{r map-tavg-load-raster} @@ -651,7 +651,7 @@ tavg <- terra::rast(tavg) tavg ``` -This raster represents mean annual temperature in Europe. We can see that the +This raster represents the mean annual temperature in Europe. We can see that the temperature goes between -5.9°C and 19.8°C. The raster isn't projected (as given by its [EPSG](https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset) code). @@ -665,7 +665,7 @@ fb_map_raster(tavg) ``` As with all plots provided by `funbiogeo`, the `fb_map_raster()` returns a -ggplot2 object which can be customized providing additional functions: +ggplot2 object, which can be customized providing additional functions: ```{r map-tavg-custom, dev = 'png'} # Map raster @@ -675,12 +675,12 @@ fb_map_raster(tavg) + theme(legend.position = "bottom") ``` -To learn about the ggplot2 syntax and functions check the +To learn about the ggplot2 syntax and functions, check the [`ggplot2` introductory vignette](https://ggplot2.tidyverse.org/articles/ggplot2.html). 
-Leveraging the `patchwork` package, that allows to combine different `ggplot2` +Leveraging the `patchwork` package, which allows to combine different `ggplot2` objects, we can have side-by-side, the map of mean annual temperature and the -one on annual total precipitation: +one on total annual precipitation: ```{r composition-map, dev = 'png', eval = require("patchwork")} library("patchwork") @@ -707,19 +707,19 @@ map_precipitation <- fb_map_raster(prec) + theme(text = element_text(family = "mono")) ``` -This function allows the visualization of a raster in a simple fashion, but it -doesn't tell us anything about the environmental variable at the sites we're -interested in. In the next section we will map an environmental variable at each +This function allows for the visualization of a raster in a simple fashion, but it +doesn't tell us anything about the environmental variables at the sites we're +interested in. In the next section, we will map an environmental variable at each site of our study. -## Map of average environmental variable in site +## Map of Average Environmental Variables in Site We want to make a map of the average environmental conditions of the sites. For this we're using the above-mentioned `terra` raster of the mean annual temperature, named `tavg`. To automatically extract the average mean annual -temperature per site, we use the `fb_get_environment()` function, it takes as -first argumen the site-locations object and a second argument the environmental +temperature per site, we use the `fb_get_environment()` function. It takes as +first argument the site-locations object and a second argument the environmental raster. It takes the average of the raster values per site. ```{r get-environment} @@ -728,13 +728,13 @@ site_env <- fb_get_environment(woodiv_locations, tavg) head(site_env) ``` -The variable names in the columns are based on the names of the provided raster. 
+The names of the variables in the columns come from the names of the rasters provided. Here, because the raster has a single layer named `annual_mean_temp`, it's the name of the column. Note that the `fb_get_environment()` function also works -with multi-layered rasters to extract multiple average conditions at observed +with multilayered rasters to extract multiple average conditions at observed sites. -To put these values on the map we can use the `fb_map_site_data()` function, +To put these values on the map, we can use the `fb_map_site_data()` function, which allows mapping arbitrary site-level variables. It takes three needed arguments: the first of which, `site_locations`, which is the `sf` object describing sites' geographic locations; the second argument is `site_data`, @@ -756,11 +756,11 @@ fb_map_site_data(woodiv_locations, site_env, "annual_mean_temp") + This concludes our tutorial to introduce `funbiogeo`. The package contains many more features, especially several diagnostic plots that allow to identify which -trait are missing or not, at species or site scales. All of the plots are -explained in details in [a dedicated vignette](diagnostic-plots.Rmd). There is +traits are missing or not, at species or site scales. All of the plots are +explained in detail in [a dedicated vignette](diagnostic-plots.Rmd). There is also a specific vignette about -[transforming raw data from long to wide format](long-format.Rmd). -Finally, if you're interested in learning about up-scaling your sites, +[transforming raw data from a long to wide format](long-format.Rmd). +Finally, if you're interested in learning about upscaling your sites, which means aggregating your sites at coarser scales,you can refer to [the specific vignette](upscaling.Rmd). 
diff --git a/vignettes/long-format.Rmd b/vignettes/long-format.Rmd index 6ffe814..d13bbee 100644 --- a/vignettes/long-format.Rmd +++ b/vignettes/long-format.Rmd @@ -15,18 +15,18 @@ knitr::opts_chunk$set( ) ``` -This vignette will explain how to process a dataset that is aggregated in long +This vignette will explain how to process a dataset that is aggregated in a long format to work with `funbiogeo` Most functions in `funbiogeo` need three different datasets to work: -- the **species x traits** `data.frame` (example dataset:`woodiv_traits` in +- The **species x traits** `data.frame` (example dataset:`woodiv_traits` in `funbiogeo`), which contains trait values for several traits (in columns) for several species (in rows). -- the **site x species** `data.frame` (example dataset:`woodiv_site_species` in +- The **site x species** `data.frame` (example dataset:`woodiv_site_species` in `funbiogeo`), which contains the presence/absence, abundance, or cover information for species (in columns) by sites (in rows). -- the **site x locations** `sf` object (example dataset:`woodiv_locations` in +- The **site x locations** `sf` object (example dataset:`woodiv_locations` in `funbiogeo`), which contains the physical locations of the sites of interest. Optionally, an additional dataset can be provided: @@ -40,7 +40,7 @@ library(funbiogeo) ``` -## Wide vs long format +## Wide vs. long format In `funbiogeo` these datasets **must be** in a wide format (where one row hosts several variables across columns), but sometimes information is structured in a @@ -105,7 +105,7 @@ dataset. All these functions take a long dataset as input (argument `data`), where one row corresponds to the occurrence/abundance/coverage of one species at one site -and output a wider object. +and outputs a wider object. 
## Usage @@ -158,11 +158,11 @@ head(species_traits, 10) -### Extracting site x species data +### Extracting Site x Species Data The function `fb_format_site_species()` extracts species occurrence/abundance/coverage from this long table to create the -site x species dataset. Note that one species must have been observed one time +site x species dataset. Note that one species must have been observed once at one site (the package `funbiogeo` does not yet consider temporal resurveys). ```{r 'format-sites-species'} @@ -181,9 +181,9 @@ head(site_species[ , 1:8], 10) -### Extracting site x locations data +### Extracting Site x Locations Data -The function `fb_format_site_locations()` extracts sites coordinates from this +The function `fb_format_site_locations()` extracts site coordinates from this long table to create the site x locations dataset. Note that one site must have one unique longitude x latitude value. @@ -203,7 +203,7 @@ head(site_locations) ``` -### Extracting species x categories data +### Extracting Species x Categories Data The function `fb_format_species_categories()` extracts species values for one supra-category (optional) from this long table to create the species x diff --git a/vignettes/special_cases.Rmd b/vignettes/special_cases.Rmd index ba420a0..acb78f4 100644 --- a/vignettes/special_cases.Rmd +++ b/vignettes/special_cases.Rmd @@ -21,11 +21,11 @@ library(funbiogeo) This vignette aims to describe several specific cases in the use of `funbiogeo`. It provides detailed examples of these uses. If you find your case is missing or if you have additional questions, please [open an issue](https://github.com/FRBCesab/funbiogeo/issues/new/choose). -## Working with Categorical Traits +## Working With Categorical Traits Traits are not always continuous. While `funbiogeo` has been thought mainly to work with continuous trait data, it can also work with categorical trait data. This section describes how to use `funbiogeo` to work with categorical traits.
-The default dataset provided in `funbiogeo` is an extract of the [WOODIV database](https://doi.org/10.1038/s41597-021-00873-3), describing the diversity of Mediterrannean trees. It contains data for 28 species. To focus on categorical traits, we here propose to add three more traits for each species: its leaf habit (whether is deciduous or not?), its seed dispersal mode, and its shade tolerance. The next chunk gives these traits for the 24 species. We coded seed dispersal as a categorical trait with two modalities `"anemochory"` and `"endozoochory"`. We coded shade tolerance as a categorical traits with five ordered levels `"very_intolerant"`, `"intolerant"`, `"moderately_tolerant"`, `"tolerant"`, and `"very_tolerant"`. We first give the complete dataset, and then randomly remove data points to show the abilities of `funbiogeo` to display missing categorical traits. +The default dataset provided in `funbiogeo` is an extract from the [WOODIV database](https://doi.org/10.1038/s41597-021-00873-3), describing the diversity of Mediterranean trees. It contains data for 28 species. To focus on categorical traits, we here propose to add three more traits for each species: its leaf habit (whether it is deciduous or not), its seed dispersal mode, and its shade tolerance. The next chunk gives these traits for the 24 species. We coded seed dispersal as a categorical trait with two modalities: `"anemochory"` and `"endozoochory"`. We coded shade tolerance as a categorical trait with five ordered levels: `"very_intolerant"`, `"intolerant"`, `"moderately_tolerant"`, `"tolerant"`, and `"very_tolerant"`. We first give the complete dataset, and then randomly remove data points to show the abilities of `funbiogeo` to display missing categorical traits.
```{r woodiv_cat} woodiv_cat <- data.frame( @@ -56,7 +56,7 @@ woodiv_cat <- data.frame( head(woodiv_cat) ``` -Then to simulate missing trait data, we randomly remove 20% of the values: +Then, to simulate missing trait data, we randomly remove 20% of the values: ```{r woodiv_cat_na} # Randomly removes 20% of the values @@ -101,9 +101,9 @@ fb_map_site_traits_completeness( ## Considering Intraspecific Variation -Trait-based ecology tends to present its frameworks and analyses with species average traits, most of its concepts can, however, apply to intraspecific trait variation, `funbiogeo` is no different. All of the examples, including the dataset provided with the package, show species average traits. In this section, we detail how to work with data that include intraspecific variation within `funbiogeo`. This should be fairly similar to what's possible across other functional diversity R packages. +Trait-based ecology tends to present its frameworks and analyses with species average traits. Most of its concepts can, however, apply to intraspecific trait variation; `funbiogeo` is no different. All of the examples, including the dataset provided with the package, show species average traits. In this section, we detail how to work with data that include intraspecific variation within `funbiogeo`. This should be fairly similar to what's possible across other functional diversity R packages. -To include intraspecific variation, the user has to index species within specific sites. For example, if they are three individuals of *Abies alba* in site *A*, then the user has to provide different names to the different individuals like `Abies_alba_1`, `Abies_alba_2`, and `Abies_alba_3`. These names have to be reused consistently across objects `site_species`, `species_traits`, and `species_categories`. As such, the user can define as fine as possible intraspecific variation. 
It is also possible to provide individual trait value for one or several sites and species average trait for the rest of the sites, following the same idea as long as the naming of species and invidivuals is consistent across objects. In this case, the specified individuals will be counfounded as distinct species in trait completeness plots. +To include intraspecific variation, the user has to index species within specific sites. For example, if there are three individuals of *Abies alba* in site *A*, then the user has to provide different names to the different individuals, like `Abies_alba_1`, `Abies_alba_2`, and `Abies_alba_3`. These names have to be reused consistently across objects `site_species`, `species_traits`, and `species_categories`. As such, the user can define as fine as possible intraspecific variation. It is also possible to provide individual trait values for one or several sites and species average traits for the rest of the sites, following the same idea as long as the naming of species and individuals is consistent across objects. In this case, the specified individuals will be confounded as distinct species in trait completeness plots. ## Sites of Arbitrary Shapes @@ -120,7 +120,7 @@ fb_map_site_traits_completeness( ) ``` -We will now convert the sites to points by taking the centroid of sites and use `fb_map_*()` functions to see how it will affect their outputs: +We will now convert the sites to points by taking the centroid of sites and using `fb_map_*()` functions to see how it will affect their outputs: ```{r convert_to_points} # Convert all the sites into 'POINT' geometry @@ -136,7 +136,7 @@ fb_map_site_traits_completeness( As seen above, the sites are now actual points instead of the original squares. The function will adapt to the geometry of the sites provided by the user. -But `funbiogeo` can accommodate sites of any geometry, to show sites that represent lines, we will group sites into lines of sites and use the same function.
+However, `funbiogeo` can handle sites of any shape. To display line-shaped sites, we will cluster them into groups of site lines and use the same function. ```{r convert_to_lines} lines_sites <- points_sites @@ -175,7 +175,7 @@ fb_map_site_traits_completeness( The geometry now displays the lines, even though they are not the most perfect representation of the actual sites, but it shows the capabilities of funbiogeo. -Similarly to the [upscaling vignette](vignettes/upscaling.Rmd), the map functions can also accommodate larger polygons, for example by aggregating sites per country. +Similar to the [upscaling vignette](vignettes/upscaling.Rmd), the map functions can also accommodate larger polygons, for example by aggregating sites per country. ```{r convert_to_polygons} # Convert all sites to a single polygon diff --git a/vignettes/upscaling.Rmd b/vignettes/upscaling.Rmd index e7fdb2b..09ff27f 100644 --- a/vignettes/upscaling.Rmd +++ b/vignettes/upscaling.Rmd @@ -15,12 +15,12 @@ knitr::opts_chunk$set( ``` -`funbiogeo` provides an easy way to upscale your site data to a coarser +`funbiogeo` provides an easy way to upscale your site’s data to a coarser resolution. The idea is that you have any type of data at the site level (diversity metrics, environmental data, as well as site-species data) that you -would like to work on or visualize at a coarser scale. The aggregation process +would like to work on or visualize on a coarser scale. The aggregation process can look daunting at first and be quite difficult to run. -We explain in details, throughout this vignette, how to do so with the +We explain in detail, throughout this vignette, how to do so with the `fb_aggregate_site_data()` function. We'll detail three use cases: 1. Aggregating through a regular square grid @@ -52,7 +52,7 @@ woodiv_locations ``` These sites are a collection of regular spatial polygons at a resolution of -10 km x 10 km over South Western Europe. 
Our site by locations object is an `sf` +10 km x 10 km over South Western Europe. Our site by locations object is a `sf` object, which means it's a spatial object coupled with a data.frame. For each site, we want to compute the species richness. We can do so by counting @@ -71,7 +71,7 @@ head(species_richness) Unfortunately, the `fb_count_species_by_site()` doesn't output a spatial object but a data.frame with three columns: the identifier of the site, the number of species, and the proportion of species present at a given site. Before going any -further let's put species richness on the map with the function +further, let's put species richness on the map with the function `fb_map_site_data()`. That function allows us to represent any arbitrary data at the site level by combining it with the site by locations object, and it shows a map of that data: @@ -81,13 +81,13 @@ shows a map of that data: fb_map_site_data(woodiv_locations, species_richness, "n_species") ``` -Now, from this site-level species richness, we would like to get the species +Now, from this site-level species richness, we would like to get species richness at different scales. First, we'll show how to aggregate on a coarser square grid, then at the country level, and then with a grid from a `SpatRaster`. -# Aggregating through a square grid +# Aggregating Through a Square Grid Our initial sites are 10km by 10km, we want to aggregate using a grid with pixels of 300km by 300km. When aggregating with a such a grid, there are two @@ -97,8 +97,8 @@ function `st_make_grid()` from the `sf` package. Because our data are using a [map projection](https://en.wikipedia.org/wiki/Map_projection) with units in meters, we can directly specify the `cellsize` argument of `st_make_grid()`. -The function the locations object as first argument, then a vector of two -nubmers as the second argument defining the cell sizes. 
By default, the newly +The function the locations object as the first argument, then a vector of two +numbers as the second argument defining the cell sizes. By default, the newly created grid will cover the same extent as in the input spatial data. ```{r make-grid} @@ -107,7 +107,7 @@ coarser_grid <- sf::st_make_grid(woodiv_locations, cellsize = c(300e3, 300e3)) head(coarser_grid) ``` -Because the coarser grid is an `sfc` object, we need to transform it to an `sf` +Because the coarser grid is a `sfc` object, we need to transform it into a `sf` object to be fully compatible with `fb_aggregate_site_data()`. We do so in the next chunk: @@ -136,16 +136,16 @@ In orange, you see the original data grid of the data, while in purple is the coarser grid we defined over which we'll aggregate the data. We're interested in knowing the occurrence of our Mediterranean plant species -over each pixel of the coarser grid. To do this we'll use the +over each pixel of the coarser grid. To do this, we'll use the `fb_aggregate_site_data()` function from `funbiogeo`. It takes as first argument the site-locations object (including the column with the site ids), then the data that needs to be aggregated (with the site ids column), then the grid over -which to aggregate the data, finally the last argument is the function to use +which to aggregate the data, finally, the last argument is the function to use to aggregate the data. By default, uses the `mean()` function and assumes quantitative site data only. Because we're interested in species richness, we'll use the dataset we -computed in the first section, that defines species richness at each site. We'll +computed in the first section, which defines species richness at each site. We'll use the `fb_aggregate_site_data()` function on our original site-locations object, with the species richness data, on the coarser grid, using the `mean()` function to get back the average species richness across sites per larger pixel. 
@@ -158,7 +158,7 @@ coarser_richness <- fb_aggregate_site_data( head(coarser_richness) ``` -Note that the object we obtain is also an `sf` object of the same type as the +Note that the object we obtain is also a `sf` object of the same type as the `grid` we provided. We can plot it using `ggplot2`: @@ -184,7 +184,7 @@ country scale. Initially, our data may be on a regular grid, but we want to aggregate at the country scale. This is easily doable with the `fb_aggregate_site_data()` function. -The function actually works with any arbitrary `sf` object as aggregation grid +The function actually works with any arbitrary `sf` object as an aggregation grid whether it's polygons regular or not, lines, points, or a mix of any geometries. To show you how to do it, we'll use the included spatial object for the four @@ -204,7 +204,7 @@ ggplot(countries) + theme_bw() ``` -As we already have our aggregation object, here the countries delimitation, +As we already have our aggregation object, here the countries' delimitation, we can directly use the `fb_aggregate_site_data()` function on: the original locations, the species richness data, and the countries object, with the `mean()` function. As such, we'll obtain the average species richness in @@ -224,13 +224,13 @@ ggplot(countries_richness) + theme_bw() ``` -We observe that France has a higher species richness than the other countries, +We observe that France has a higher species richness than other countries, with more than 5 species on average in the sites it harbors. # Aggregating through a SpatRaster grid -In the two previous sections we saw how to aggregate on arbitrary shaped +In the two previous sections, we saw how to aggregate on arbitrary shaped polygons. However, another quite common use case, especially when working with species distribution model, is to want to aggregate data follow an environmental raster or a raster grid. @@ -240,11 +240,11 @@ for our environmental data such as mean annual temperature. 
We need to aggregate site data on a `SpatRaster` spatial grid (from the [`terra`](https://cran.r-project.org/package=terra) package). A nice property of `fb_aggregate_site_data()` function is that it outputs -a matching type of object as the provided grid. If it's an `sf` object than the +a matching type of object as the provided grid. If it's a `sf` object, then the function will output the same type of `sf` object, while if it's a `SpatRaster` then it will give back a `SpatRaster` object. -First of all, we can create a coarser grid based on our locations object. +First of all, we can create a coarser grid based on our location object. ```{r import-grid} @@ -260,15 +260,15 @@ coarser raster (resolution of 0.83°, close to 92 km by 92km) with the function function requires the following arguments: - `site_locations`: the site x locations object -- `site_data`: a `matrix` or `data.frame` containing values per sites to +- `site_data`: a `matrix` or `data.frame` containing values per site to aggregate on the provided grid `agg_geom`. Can have one or several columns -(variables to aggregate). The first column must contain sites names as provided +(variables to aggregate). The first column must contain site names as provided in the object `species_richness` - `agg_geom`: a `SpatRaster` object (package `terra`). A raster of one single -layer, that defines the grid along which to aggregate -- `fun`: the function used to aggregate sites values when there are multiple +layer that defines the grid along which to aggregate. +- `fun`: the function used to aggregate site values when there are multiple sites in one cell (do we want to get the minimum value? the maximum? the sum? -or the mean?) +Or the mean?) Let's compute our average species richness values across our grid. 
@@ -285,7 +285,7 @@ upscaled_richness <- fb_aggregate_site_data( upscaled_richness ``` -We get a `SpatRaster` object that is of the same resolution of our provided +We get a `SpatRaster` object that is of the same resolution as our provided `agg_geom` raster grid. The cells of this raster contain the averaged values of species richness of our sites aggregated on the coarser grid. We can plot these values through a call to `fb_map_raster()` which allows to plot rasters. @@ -299,7 +299,7 @@ fb_map_raster(upscaled_richness) ## Coarsening site-species data -Through the `fb_aggregate_site_data()` function we can also coarsen our +Through the `fb_aggregate_site_data()` function, we can also coarsen our site-species grid by selecting the appropriate function as the `fun` argument, we detail how in this section. @@ -314,10 +314,10 @@ defining the coarser grid. We'll use the previously defined object to run our example. To aggregate the presence-absence of species within each pixel of the new grid, we'll use -the `max()` function (as the `fun` argument). As such, coarser pixels which -contains a mix of presence and absence of certain species, we'll be considered +the `max()` function (as the `fun` argument). As such, coarser pixels that +contain a mix of presence and absence of certain species will be considered as having the species present. Only when the species is absent from all of the -finer scale sites will the coarser pixel show the species as absent. +finer scale sites will the coarser pixels show the species as absent. ```{r upscale-site-species} site_species_agg <- fb_aggregate_site_data( @@ -328,9 +328,9 @@ site_species_agg <- fb_aggregate_site_data( ) ``` -The return object is a `SpatRaster` as well but can be transformed easily in a +The returned object is a `SpatRaster` as well but can be easily transformed into a data.frame to follow back with the regular analyses provided in `funbiogeo`.
-The new object contains one layer for each aggregated variable, i.e. here, one +The new object contains one layer for each aggregated variable, i.e., here, one per species. ```{r show-upscale-site-species} @@ -360,7 +360,7 @@ patchwork::wrap_plots(finer_map, coarser_map, nrow = 1) ### Obtaining back a site x species `data.frame` -Now we obtained a raster of aggregated site-species presences. However, the +Now we have obtained a raster of aggregated site-species presences. However, the other functions of `funbiogeo` don't play well with raster data. They need data.frames to work well. We can do this through the specific function `as.data.frame()` in `terra` (make sure to check the dedicated help page that @@ -381,12 +381,12 @@ coarser data. You can proceed similarly to aggregate the ancillary site-related data, to use them in the rest of the analyses. -## Upscaling functional diversity data +## Upscaling Functional Diversity Data Because `funbiogeo` focuses on the functional biogeography workflow, we'll explore in this section how to aggregate the results for a functional biogeography function. First, we'll detail an example aggregating the -community-weighted mean (CWM) of plant height, that is the abundance-weighted +community-weighted mean (CWM) of plant height, which is the abundance-weighted trait average of the assemblage. Second, we'll show an example of coarsing functional diversity metrics computed through the [`fundiversity`](https://funecology.github.io/fundiversity/) package. 
@@ -401,9 +401,9 @@ site_cwm <- fb_cwm(woodiv_site_species, woodiv_traits[, 1:2]) head(site_cwm) ``` -Now we can aggregate the CWM of plant hieght at coarser scale using +Now we can aggregate the CWM of plant height at a coarser scale using `fb_aggregate_site_data()` as done in the previous sections, this time using -the default `fun` argument as we want to compute the average CWM: +the default `fun` argument, as we want to compute the average CWM: ```{r upscaled-cwm} colnames(site_cwm)[3] <- "plant_height" @@ -428,7 +428,7 @@ fb_map_raster(upscaled_cwm) + ### Coarser FRic through `fundiversity` In a similar fashion as in the -[introduction vignette to `funbiogeo`](funbiogeo.Rmd) in this section we'll +[introduction vignette to `funbiogeo`](funbiogeo.Rmd) in this section, we'll compute the Functional Richness using two traits across our example dataset. ```{r compute-fric, eval = require("fundiversity")} @@ -472,7 +472,7 @@ head(site_fric) ``` We can now follow a similar upscaling process as in the previous sections to -compute the average functional richness at a coarser spatial scale: +compute the average functional richness on a coarser spatial scale: ```{r upscale-fric, eval = require("fundiversity")} agg_fric <- fb_aggregate_site_data(