All species are hierarchically related to one another, and we use taxonomic names to label the nodes in this hierarchy. Taxonomic data is becoming increasingly available on the web, but scientists need a way to access it in a programmatic fashion that’s easy and reproducible. We have developed taxize, an open-source software package (freely available from http://cran.r-project.org/web/packages/taxize/index.html) for the R language. taxize provides simple, programmatic access to taxonomic data for 13 data sources around the web. We discuss the need for a taxonomic toolbelt in R, and outline a suite of use cases for which taxize is ideally suited (including a full workflow as an appendix). The taxize package facilitates open and reproducible science by allowing taxonomic data collection to be done in the open-source R platform.
Small streams are important refuges for biodiversity. In agricultural areas, they may be at risk from pesticide pollution. However, most related studies have been limited to a few streams at the regional level, hampering extrapolation to larger scales. We quantified risks as exceedances of regulatory acceptable concentrations (RACs) and used German monitoring data to quantify the drivers thereof and to assess current risks in small streams on a large scale. The data set comprised 1 766 104 measurements of 478 pesticides (including metabolites) related to 24 743 samples from 2301 sampling sites. We investigated the influence of agricultural land use, catchment size, precipitation, and seasonal dynamics on pesticide risk, also taking concentrations below the limit of quantification into account. The exceedances of risk thresholds dropped 3.7-fold at sites with no agriculture. Precipitation increased detection probability by 43%, and concentrations were highest from April to June. Overall, this indicates that agricultural land use is a major contributor of pesticides to streams. RACs were exceeded in 26% of streams, with the highest exceedances found for neonicotinoid insecticides. We conclude that pesticides from agricultural land use are a major threat to small streams and their biodiversity. To reflect peak concentrations, current pesticide monitoring needs refinement.
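The exceedance measure described above can be illustrated with a minimal sketch. It assumes that a risk quotient is the measured concentration divided by the RAC of the compound, and that a site counts as exceeding when any of its quotients is above 1. The compound names, RAC values, measurements, and function names are hypothetical placeholders, not data or code from the study, and the sketch does not handle concentrations below the limit of quantification.

```python
from collections import defaultdict

# Hypothetical RACs in ug/L (placeholders, not regulatory values).
RAC_UG_PER_L = {"compound_a": 0.1, "compound_b": 2.0}

# Hypothetical measurements: (site, compound, concentration in ug/L).
MEASUREMENTS = [
    ("site_1", "compound_a", 0.05),
    ("site_1", "compound_b", 3.0),
    ("site_2", "compound_a", 0.02),
    ("site_2", "compound_b", 0.5),
    ("site_3", "compound_a", 0.4),
    ("site_4", "compound_b", 1.0),
]


def max_risk_quotient_per_site(measurements, racs):
    """Return the largest risk quotient (concentration / RAC) seen at each site."""
    worst = defaultdict(float)
    for site, compound, concentration in measurements:
        risk_quotient = concentration / racs[compound]
        worst[site] = max(worst[site], risk_quotient)
    return dict(worst)


def share_of_sites_exceeding(measurements, racs):
    """Return the fraction of sites where at least one risk quotient is above 1."""
    worst = max_risk_quotient_per_site(measurements, racs)
    exceeding = sum(1 for rq in worst.values() if rq > 1.0)
    return exceeding / len(worst)


if __name__ == "__main__":
    print(max_risk_quotient_per_site(MEASUREMENTS, RAC_UG_PER_L))
    print(share_of_sites_exceeding(MEASUREMENTS, RAC_UG_PER_L))
```

Taking the maximum per site, rather than a mean, is one way to reflect the peak concentrations that the abstract identifies as the quantity of interest.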
A wide range of chemical information is freely available online, including identifiers and experimental and predicted chemical properties. However, these data are scattered over various data sources and not easily accessible to researchers. Manual searching and downloading of such data is time-consuming and error-prone. We developed the open-source R package webchem, which allows users to automatically query chemical data from 14 web sources at present. These cover a broad spectrum of information. The data are automatically imported into an R object and can be used directly in subsequent analyses. webchem enables easy, structured and reproducible data retrieval and usage from publicly available web sources. In addition, it facilitates data cleaning, identification and reporting of substances. Consequently, it reduces the time researchers need to spend on chemical data compilation.
Ecotoxicologists often encounter count and proportion data that are rarely normally distributed. To meet the assumptions of the linear model, such data are usually transformed, or non-parametric methods are used if the transformed data still violate the assumptions. Generalized linear models (GLMs) allow such data to be modeled directly, without the need for transformation. Here, we compare the performance of two parametric approaches, (1) the linear model (assuming normality of transformed data) and (2) GLMs (assuming a Poisson, negative binomial, or binomially distributed response), with that of (3) non-parametric methods. We simulated typical data mimicking ecotoxicological experiments with low replication for two common data types (counts and proportions from counts). We compared the performance of the different methods in terms of statistical power and Type I error for detecting a general treatment effect and determining the lowest observed effect concentration (LOEC). In addition, we illustrated the differences using a real-world mesocosm data set. For count data, we found that the quasi-Poisson model yielded the highest power. The negative binomial GLM resulted in increased Type I errors, which could be fixed using the parametric bootstrap. For proportions, binomial GLMs performed better than the linear model, except when determining the LOEC at extremely low sample sizes. The compared non-parametric methods generally had lower power. We recommend that counts in one-factorial experiments be analyzed using quasi-Poisson models and proportions from counts using binomial GLMs. These methods should become standard in ecotoxicology.
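To make the recommended quasi-Poisson analysis of a one-factorial count experiment more concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' code. It relies on the fact that, for a Poisson-type GLM with a single categorical factor, the fitted means equal the group means, so no iterative fitting is needed. The function name and the example counts are hypothetical, and the sketch assumes every treatment group contains at least one non-zero count.

```python
import math


def quasi_poisson_one_factor(groups):
    """F statistic for a treatment effect in a one-factor quasi-Poisson model.

    `groups` is a list of lists of counts, one inner list per treatment.
    Assumes every group has a mean greater than zero.
    """
    all_counts = [y for group in groups for y in group]
    n, k = len(all_counts), len(groups)
    grand_mean = sum(all_counts) / n

    def unit_deviance(y, mu):
        # Poisson unit deviance; the y * log(y / mu) term is 0 when y == 0.
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (log_term - (y - mu))

    # Null model: one common mean for all observations.
    null_deviance = sum(unit_deviance(y, grand_mean) for y in all_counts)

    # Treatment model: fitted value of each observation is its group mean.
    residual_deviance = 0.0
    pearson_chi2 = 0.0
    for group in groups:
        mu = sum(group) / len(group)
        for y in group:
            residual_deviance += unit_deviance(y, mu)
            pearson_chi2 += (y - mu) ** 2 / mu

    # Dispersion estimated as Pearson chi-square over residual degrees of freedom.
    dispersion = pearson_chi2 / (n - k)
    f_statistic = ((null_deviance - residual_deviance) / (k - 1)) / dispersion
    return {
        "dispersion": dispersion,
        "null_deviance": null_deviance,
        "residual_deviance": residual_deviance,
        "f_statistic": f_statistic,
        "df": (k - 1, n - k),
    }


if __name__ == "__main__":
    # Hypothetical counts: a control and one treatment, three replicates each.
    print(quasi_poisson_one_factor([[10, 12, 8], [4, 6, 2]]))
```

A p-value would follow from comparing the F statistic with an F distribution on the returned degrees of freedom; that step is left out here to avoid a dependency beyond the standard library.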