We developed an extensive database of landscape metrics for ~2.65 million stream segments, and their associated catchments, within the conterminous United States (U.S.): The Stream‐Catchment (StreamCat) Dataset. These data are publicly available (http://www2.epa.gov/national-aquatic-resource-surveys/streamcat) and greatly reduce the specialized geospatial expertise needed by researchers and managers to acquire landscape information for both catchments (i.e., the nearby landscape flowing directly into streams) and full upstream watersheds of specific stream reaches. When combined with an existing geospatial framework of the Nation's rivers and streams (National Hydrography Dataset Plus Version 2), the distribution of catchment and watershed characteristics can be visualized for the conterminous U.S. In this article, we document the development and main features of this dataset, including the suite of landscape features that were used to develop the data, scripts and algorithms used to accumulate and produce watershed summaries of landscape features, and the quality assurance procedures used to ensure data consistency. The StreamCat Dataset provides an important tool for stream researchers and managers to understand and characterize the Nation's rivers and streams.
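The distinction between a catchment (local drainage) and a watershed (the catchment plus everything upstream) can be illustrated with a minimal sketch. It is not the StreamCat scripts themselves: the function name, the toy segment IDs, the area-weighted mean, and the assumption of a simple tree-shaped network without flow divergences are all hypothetical choices made for illustration.

```python
from collections import defaultdict
import math

def accumulate_watershed(catchments, flows_to):
    """Return {segment: (watershed_area, watershed_mean)}.

    catchments: {segment: (catchment_area, catchment_mean)}, areas > 0
    flows_to:   {segment: downstream segment, or None at an outlet}
    Assumes a tree-shaped network (each segment has one downstream segment).
    """
    upstream = defaultdict(list)
    for seg, down in flows_to.items():
        if down is not None:
            upstream[down].append(seg)

    totals = {}  # segment -> (accumulated area, accumulated area * mean)

    def visit(seg):
        if seg not in totals:
            area, mean = catchments[seg]
            tot_area, tot_weighted = area, area * mean
            for up in upstream[seg]:
                up_area, up_weighted = visit(up)
                tot_area += up_area
                tot_weighted += up_weighted
            totals[seg] = (tot_area, tot_weighted)
        return totals[seg]

    result = {}
    for seg in catchments:
        area, weighted = visit(seg)
        result[seg] = (area, weighted / area)
    return result

# Toy network: headwater segments A and B both drain into C.
toy_catchments = {"A": (10.0, 0.2), "B": (30.0, 0.6), "C": (10.0, 0.0)}
toy_flows_to = {"A": "C", "B": "C", "C": None}
toy_watersheds = accumulate_watershed(toy_catchments, toy_flows_to)
```

For a headwater segment the watershed summary equals the catchment summary; for segment C the watershed summary blends all three catchments in proportion to their areas.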
Random forest (RF) modeling has emerged as an important statistical learning method in ecology due to its exceptional predictive performance. However, for large and complex ecological data sets, there is limited guidance on variable selection methods for RF modeling. Typically, either a preselected set of predictor variables is used or stepwise procedures are employed that iteratively remove variables according to their importance measures. This paper investigates the application of variable selection methods to RF models for predicting probable biological stream condition. Our motivating data set consists of the good/poor condition of n = 1365 stream survey sites from the 2008/2009 National Rivers and Streams Assessment, and a large set (p = 212) of landscape features from the StreamCat data set as potential predictors. We compare two types of RF models: a full variable set model with all 212 predictors and a reduced variable set model selected using a backward elimination approach. We assess model accuracy using RF's internal out-of-bag estimate, and a cross-validation procedure with validation folds external to the variable selection process. We also assess the stability of the spatial predictions generated by the RF models to changes in the number of predictors and argue that model selection needs to consider both accuracy and stability. The results suggest that RF modeling is robust to the inclusion of many variables of moderate to low importance. We found no substantial improvement in cross-validated accuracy as a result of variable reduction. Moreover, the backward elimination procedure tended to select too few variables and exhibited numerous issues such as upwardly biased out-of-bag accuracy estimates and instabilities in the spatial predictions. We use simulations to further support and generalize results from the analysis of real data. A main purpose of this work is to elucidate issues of model selection bias and instability to ecologists interested in using RF to develop predictive models with large environmental data sets.
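The idea of keeping validation folds external to variable selection can be sketched as follows. This is a hedged illustration, not the authors' procedure: the function names are hypothetical, and a mean-gap filter with a nearest-centroid classifier stands in for backward elimination with a random forest so that the example runs with the standard library alone. The point is only the structure: selection is repeated inside every training fold and never sees the held-out rows.

```python
import random

def external_cv_accuracy(X, y, select, fit_predict, k=5, seed=0):
    """Cross-validated accuracy with selection redone in each training fold.

    select(X_train, y_train) -> list of column indices to keep
    fit_predict(X_train, y_train, X_test) -> list of predicted labels
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Selection uses training rows only, so held-out rows stay external.
        cols = select(X_tr, y_tr)
        X_tr_sel = [[row[c] for c in cols] for row in X_tr]
        X_te_sel = [[X[i][c] for c in cols] for i in held_out]
        preds = fit_predict(X_tr_sel, y_tr, X_te_sel)
        correct += sum(p == y[i] for p, i in zip(preds, held_out))
    return correct / len(y)

def top_by_mean_gap(m):
    """Stand-in selector: keep the m columns with the largest class-mean gap."""
    def select(X, y):
        def gap(c):
            ones = [r[c] for r, t in zip(X, y) if t == 1]
            zeros = [r[c] for r, t in zip(X, y) if t == 0]
            return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        return sorted(range(len(X[0])), key=gap, reverse=True)[:m]
    return select

def nearest_centroid(X_tr, y_tr, X_te):
    """Stand-in classifier (not a random forest): nearest class centroid."""
    centroids = {}
    for label in set(y_tr):
        rows = [r for r, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    def dist(row, centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return [min(centroids, key=lambda lab: dist(r, centroids[lab])) for r in X_te]

# Synthetic data: column 0 is informative, the other nine are pure noise.
rng = random.Random(1)
y = [i % 2 for i in range(60)]
X = [[t + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(9)] for t in y]
accuracy = external_cv_accuracy(X, y, top_by_mean_gap(3), nearest_centroid)
```

Running selection once on all rows and then cross-validating the reduced model would let the held-out rows influence which variables are kept, which is the kind of selection bias the abstract warns about.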
We used water δ²H and δ¹⁸O from ca. 1000 lakes sampled in the 2007 U.S. Environmental Protection Agency's National Lakes Assessment (NLA) to assess two hydrological variables, evaporation as a percentage of inflow (E:I) and water residence time (τ), for summer 2007. Using a population survey design, sampled lakes were distributed across the conterminous U.S., and results were scaled to the inference population (~50,000 U.S. lakes). These hydrologic variables were related to lake nutrients and biological condition to illustrate their usefulness in national water quality monitoring efforts. For 50% of lakes, evaporation was < 25% of inflow, with values ranging up to 113% during the 2007 summer. Residence time was < 0.52 yr for half of the lakes and < 1.12 yr for 75% of lakes. Categorizing lakes by flow regime, 66.1% of lakes were flow-through lakes (60% or more of the water flows through the lake, E:I < 0.4), 33.6% were restricted-basin lakes (40% or more of the lake inflow evaporates, 0.4 < E:I < 1), and < 0.3% were closed basin (all water entering the lake leaves through evaporation, E:I > 1). While climate patterns drove some of the spatial patterns of E:I and τ, variation in lake depth and watershed size (influencing precipitation volume) were also significant drivers. Lake hydrochemistry was strongly correlated to E:I and more weakly related to τ. Lakes in poor biological condition (based on a predictive model of planktonic taxa) were significantly more evaporated than lakes in good biological condition.
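The flow-regime categories can be expressed as a small classifier on the evaporation-to-inflow ratio. This is a sketch under stated assumptions: it reads the abstract's thresholds as E:I below 0.4, between 0.4 and 1, and above 1; the function name is hypothetical; and the assignment of values exactly equal to 0.4 or 1 is an arbitrary choice, since the abstract does not say where those boundary values fall.

```python
def flow_regime(e_to_i: float) -> str:
    """Classify a lake by its evaporation-to-inflow ratio (E:I)."""
    if e_to_i < 0:
        raise ValueError("E:I must be non-negative")
    if e_to_i < 0.4:
        return "flow-through"        # 60% or more of inflow passes through
    if e_to_i < 1.0:
        return "restricted-basin"    # 40% or more of inflow evaporates
    return "closed-basin"            # evaporation matches or exceeds inflow

# Hypothetical E:I values, chosen only to exercise each category.
example_regimes = [flow_regime(v) for v in (0.25, 0.5, 1.13)]
```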
The lack of a clear framework identifying data to link ecosystems to analyses of human well‐being has been highlighted in numerous studies. To address this issue, we applied a recently developed economic theory termed “final” ecosystem goods and services – the biophysical features and qualities that people perceive as being directly related to their well‐being. The six‐step process presented here enabled us to identify metrics associated with streams that can be used in the analysis of human well‐being; we illustrate these steps with data from a regional stream survey. Continued refinement and application of this framework will require ongoing collaboration between natural and social scientists. Framework application could result in more useful and relevant data, leading to more informed decisions in the management of ecosystems.