Relative to target species, priority conservation species occur rarely in fishery interactions, resulting in imbalanced, overdispersed data. We present Ensemble Random Forests (ERFs) as an intuitive extension of the Random Forest algorithm to handle rare event bias. Each Random Forest receives its own stratified, randomly sampled training/test sets, then down-samples the majority class for each decision tree. Results are averaged across Random Forests to generate an ensemble prediction. Through simulation, we show that ERFs outperform Random Forest with and without down-sampling, as well as with the synthetic minority over-sampling technique, for highly class-imbalanced to balanced datasets. Spatial covariance greatly impacts ERFs’ perceived performance, as shown through simulation and case studies. In case studies from the Hawaii deep-set longline fishery, giant manta ray Mobula birostris syn. Manta birostris and scalloped hammerhead Sphyrna lewini presence had high spatial covariance and high model test performance, while false killer whale Pseudorca crassidens had low spatial covariance and low model test performance. Overall, we find ERFs have 4 advantages: (1) reduced successive partitioning effects; (2) prediction uncertainty propagation; (3) better accounting for interacting covariates through balancing; and (4) minimization of false positives, as the majority of Random Forests within the ensemble vote correctly. As ERFs can readily mitigate rare event bias without requiring large presence sample sizes or imparting considerable balancing bias, they are likely to be a valuable tool in bycatch and species distribution modeling, as well as spatial conservation planning, especially for protected species where presence can be rare.
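The three steps named in the abstract (a stratified train/test split per forest, majority-class down-sampling per tree, and averaging across forests) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: all function names (`stratified_split`, `fit_stump`, `fit_forest`, `fit_erf`) and default parameters are hypothetical, a one-split decision stump stands in for a full decision tree, and the balanced bootstrap (drawing as many rows from each class as the smaller class has in the training set) is one assumed way to down-sample. It also assumes binary 0/1 labels with both classes present.

```python
import random
from statistics import mean

def stratified_split(y, test_frac, rng):
    """Split row indices into train/test, preserving the class ratio in both."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_frac))
        test += idx[:n_test]
        train += idx[n_test:]
    return train, test

def fit_stump(X, y, idx, rng):
    """Stand-in for a decision tree (assumption): one threshold on one random feature."""
    feature = rng.randrange(len(X[0]))
    best = None
    for t in sorted({X[i][feature] for i in idx}):
        left = [y[i] for i in idx if X[i][feature] <= t]
        right = [y[i] for i in idx if X[i][feature] > t]
        if not left or not right:
            continue
        p_l, p_r = mean(left), mean(right)
        # Squared error when each side predicts its own presence rate.
        err = sum((v - p_l) ** 2 for v in left) + sum((v - p_r) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, t, p_l, p_r)
    if best is None:  # no usable split: predict the sample's presence rate
        p = mean([y[i] for i in idx])
        return lambda row: p
    _, t, p_l, p_r = best
    return lambda row: p_l if row[feature] <= t else p_r

def fit_forest(X, y, train, n_trees, rng):
    """One forest: every tree sees a class-balanced (down-sampled) bootstrap."""
    pos = [i for i in train if y[i] == 1]
    neg = [i for i in train if y[i] == 0]
    n = min(len(pos), len(neg))  # size of the rarer class
    trees = []
    for _ in range(n_trees):
        sample = rng.choices(pos, k=n) + rng.choices(neg, k=n)
        trees.append(fit_stump(X, y, sample, rng))
    return lambda row: mean([tree(row) for tree in trees])

def fit_erf(X, y, n_forests=10, n_trees=25, test_frac=0.2, seed=0):
    """Ensemble: each forest gets its own stratified split; predictions are averaged."""
    rng = random.Random(seed)
    forests = []
    for _ in range(n_forests):
        # The held-out test rows would be used to score each forest; unused in this sketch.
        train, test = stratified_split(y, test_frac, rng)
        forests.append(fit_forest(X, y, train, n_trees, rng))
    return lambda row: mean([forest(row) for forest in forests])
```

Calling `fit_erf(X, y)` with `X` as a list of feature rows and `y` as a list of 0/1 presence labels returns a function that maps a feature row to an averaged presence probability. In practice the stump would be replaced by a full tree learner from an established library.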
The Gulf of Mexico reef fish complex is socioeconomically important and is exploited by a vertical line fishery capable of high-resolution spatial targeting. Indices of abundance derived from fishery-dependent catch-per-unit-effort (CPUE) data are an important input to the assessment of these stocks. Traditionally, these indices have been derived from standardized logbook data, aggregated at a coarse spatial scale, and are limited to generating predictions for observed spatiotemporal strata. Understanding how CPUE is spatially distributed, however, can help identify range contractions and avoid hyperstability or hyperdepletion, both of which can mask the true population dynamics. Vessel monitoring systems (VMS) can provide complete, high-resolution distributions of CPUE that can be used to create abundance indices. Here we compare two methods for generating indices (spatial averaging of VMS-derived catch and effort data, and generalized linear models applied to logbook data) to evaluate the use of VMS-derived abundance indices in assessments of reef fish stocks. This work suggests that in fisheries where targeting occurs at very fine spatial scales, abundance indices derived from high-resolution, spatiotemporally complete data may more accurately reflect the underlying dynamics of the stock.
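To make the idea of a spatially averaged index concrete, the sketch below computes CPUE per grid cell and year, averages across cells with equal weight, and scales the series to its own mean. The abstract does not specify the gridding, weighting, or scaling, so all of these choices are assumptions, as are the function name `spatial_average_index` and the `(year, cell_id, catch, effort)` record layout.

```python
from collections import defaultdict

def spatial_average_index(records):
    """Relative abundance index from (year, cell_id, catch, effort) records.

    Assumed procedure (not taken from the source): sum catch and effort per
    cell and year, compute CPUE per cell, average cells with equal weight,
    then divide each year's value by the mean of the series.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # (year, cell) -> [catch, effort]
    for year, cell, catch, effort in records:
        totals[(year, cell)][0] += catch
        totals[(year, cell)][1] += effort

    cpue_by_year = defaultdict(list)
    for (year, _cell), (catch, effort) in totals.items():
        if effort > 0:  # skip cells with no recorded effort
            cpue_by_year[year].append(catch / effort)

    raw = {year: sum(v) / len(v) for year, v in cpue_by_year.items()}
    if not raw:
        return {}
    series_mean = sum(raw.values()) / len(raw)
    return {year: value / series_mean for year, value in sorted(raw.items())}
```

Averaging per-cell CPUE, instead of dividing total catch by total effort, keeps heavily fished cells from dominating the index, which is one reason a spatially resolved index can differ from a coarsely aggregated one.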