The Naïve Overfitting Index Selection (NOIS): A new method to optimize model complexity for hyperspectral data

Rocha, Alby Duarte; Groen, T.A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, L.

doi:10.1016/j.isprsjprs.2017.09.012

Cited by 21 publications

(34 citation statements)

References 50 publications


“…However, unlike traditional empirical models, most machine learning models use iterative learning to reduce overall error and maximize model fit [191]. Depending on the parameterization of the model and the amount of training data available, this approach may lead to over-fitting of the data, especially in models with numerous input variables subject to collinearity such as adjacent hyperspectral bands [192]. To avoid overfitting, machine learning methods require the provision of separate training and testing datasets that contain representative samples of the parameters of interest.…”

Section: Machine Learning Models (mentioning)

confidence: 99%

Research Trends in the Use of Remote Sensing for Inland Water Quality Science: Moving Towards Multidisciplinary Applications

Topp, Pavelsky, Jensen, et al. 2020, Water

200


Remote sensing approaches to measuring inland water quality date back nearly 50 years to the beginning of the satellite era. Over this time span, hundreds of peer-reviewed publications have demonstrated promising remote sensing models to estimate biological, chemical, and physical properties of inland waterbodies. Until recently, most of these publications focused largely on algorithm development as opposed to implementation of those algorithms to address specific science questions. This slow evolution contrasts with terrestrial and oceanic remote sensing, where methods development in the 1970s led to publications focused on understanding spatially expansive, complex processes as early as the mid-1980s. This review explores the progression of inland water quality remote sensing from methodological development to scientific applications. We use bibliometric analysis to assess overall patterns in the field and subsequently examine 236 key papers to identify trends in research focus and scale. The results highlight an initial 30 year period where the majority of publications focused on model development and validation followed by a spike in publications, beginning in the early-2000s, applying remote sensing models to analyze spatiotemporal trends, drivers, and impacts of changing water quality on ecosystems and human populations. Recent and emerging resources, including improved data availability and enhanced processing platforms, are enabling researchers to address challenging science questions and model spatiotemporally explicit patterns in water quality. Examination of the literature shows that the past 10–15 years has brought about a focal shift within the field, where researchers are using improved computing resources, datasets, and operational remote sensing algorithms to better understand complex inland water systems. 
Future satellite missions promise to continue these improvements by providing observational continuity with spatial/spectral resolutions ideal for inland waters.


“…Two tuning methods were applied to select model complexity: traditional cross-validation and a novel method called Naïve Overfitting Index Selection (NOIS) (Rocha et al, 2017). When tuning a model with cross-validation (we used 10-fold cross-validation), a model is selected with a complexity that minimises the Root Mean Squared Error (RMSE) of the predictions from the validation subsets (Hastie et al, 2009).…”

Section: Modelling and Performance Assessment (mentioning)

confidence: 99%

“…This procedure was randomly repeated ten times, resulting in a combination of 100 subsets of training and validation sets from the original data (James et al, 2013). The NOIS method selects model complexity considering an a priori level of overfitting tolerated by the user (we used 5%; see Rocha et al, 2017 for details). The complexity selected for models tuned with cross-validation varied according to the landscape.…”

Section: Modelling and Performance Assessment (mentioning)

confidence: 99%
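
The cross-validation tuning described in the two statements above (10-fold cross-validation repeated ten times, choosing the complexity with the lowest validation RMSE) can be sketched roughly as follows. This is an illustrative sketch only: the regression technique (principal component regression, with the number of components as the complexity setting), the synthetic data, and all function names are assumptions and do not come from the cited papers. NOIS itself is not implemented here, because the excerpts do not describe how its overfitting index is computed.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_components):
    """Principal component regression (assumed stand-in for the tuned model)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    # Rows of vt are the principal directions of the centred training data.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    V = vt[:n_components].T
    scores = Xc @ V
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ V @ coef + y_mean

def repeated_kfold_rmse(X, y, n_components, k=10, repeats=10, seed=0):
    """Mean validation RMSE over `repeats` random k-fold partitions."""
    rng = np.random.default_rng(seed)
    rmses = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(len(y)), k)
        for i, val_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = pcr_fit_predict(X[train_idx], y[train_idx], X[val_idx], n_components)
            rmses.append(np.sqrt(np.mean((pred - y[val_idx]) ** 2)))
    return float(np.mean(rmses))

def select_complexity(X, y, candidates, k=10, repeats=10, seed=0):
    """Return the candidate complexity with the lowest mean validation RMSE."""
    scores = {c: repeated_kfold_rmse(X, y, c, k, repeats, seed) for c in candidates}
    return min(scores, key=scores.get), scores

# Synthetic stand-in for hyperspectral data: 60 samples, 50 collinear "bands".
rng = np.random.default_rng(42)
latent = rng.normal(size=(60, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.05 * rng.normal(size=(60, 50))
y = latent[:, 0] + latent[:, 1] + 0.1 * rng.normal(size=60)

candidates = list(range(1, 9))
best, scores = select_complexity(X, y, candidates)
print("selected number of components:", best)
```

With k=10 and repeats=10, each candidate is scored on 100 training/validation splits, matching the count mentioned in the statement above. Reusing the same seed for every candidate means all complexities are compared on identical partitions.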

“…Using these supervised methods with a large set of predictors (i.e., the number of spectral bands) in relation to the number of observations is likely to cause model overfitting (Hastie et al, 2009;Rocha et al, 2017). Overfitting occurs when the model incorporates random noises and data structures unrelated to the underlying relationship (James et al, 2013).…”

Section: Introduction (mentioning)

confidence: 99%

“…For many machine learning algorithms, explicit warnings about such assumptions are missing. However, noisy and autocorrelated data may cause model overfitting and misleading interpretations (Hawkins, 2012;Rocha et al, 2017). Often machine learning algorithms create latent variables to explain residual variance from previously fitted models in a progressive stepwise manner.…”

Section: Introduction (mentioning)

confidence: 99%


Tuning a statistical trade-off between spectral and spatial domains to predict plant traits with hyperspectral remote sensing

Rocha


Other components of variability related to the spatial alignment between spectra and ground references are errors in plot coordinates, upscale or downscale, distortions to departing from the nadir, among others (Manolakis et al., 2003). The discretisation of continuous domains such as spectra, space or time results in the loss of a certain amount of information (Bruce et al., 2002). The spatial resolution of a remote sensing data (pixel) or the sample…

The possible solutions for selecting covariates for modelling using hyperspectral data and avoiding multicollinearity include: (1) extracting spectral indices that explain causally or empirically the relationship with the target plant trait based on a-priori knowledge; (2) searching a coefficient from a combination of two or more bands that is most highly correlated with the plant trait (Darvishzadeh et al., 2008); (3) combining wavelengths to create latent…

Predictive models for plant traits are mostly selected by data rather than based on theory, and often elected among different regression techniques (James et al., 2013). If the model is assessed with the same data as was fitted, more complexity directly means more accuracy, as the prediction error always reduces when the complexity increases (James et al., 2013). Consequently, it is improper to assess and report the accuracy of predictive models with the same data as used for selecting the final model. Predictive models require splitting the data into training and testing (sub)sets to assess accuracy (Esbensen and Geladi, 2010). There are many alternatives, from splitting an independent…

This thesis is comprised of six chapters, of which four research chapters are submitted, and three are currently accepted as scientific articles to peer-reviewed ISI journals. The general outline is indicated below. Chapter 1: the introductory chapter discusses the importance of plant traits and the role of remote sensing in monitoring and understanding the underlying process. The chapter is designed to highlight issues that need further improvement when modelling plant traits with hyperspectral data. Chapter 2: demonstrates that empirical models using hyperspectral data to predict traits are very likely to lead to significant overfitting, even when selected by commonly used robust cross-validation. A new method named Naïve Overfitting Index Selection (NOIS) was developed to quantify overfitting while selecting model complexity (tuning). The method was tested using five hyperspectral datasets and seven machine learning regression techniques. Chapter 3: shows that machine learning regressions using hyperspectral data are likely to lead to inaccurate predictions when significant autocorrelation is…


The effects of armed conflict on forest cover changes across temporal and spatial scales in the Colombian Amazon

et al. 2021

Self Cite


The Amazon rainforest covers roughly 40% of Colombia’s territory and has important global ecological functions. For more than 50 years, an internal war in the country has shaped this region. Peace negotiations between the government and the Revolutionary Armed Forces of Colombia (FARC) initiated in 2012 resulted in a progressive de-escalation of violence and a complete ceasefire in 2016. This study explores the role of different deforestation drivers including armed conflict variables, in explaining deforestation for three periods between 2001 and 2015. Iterative regression analyses were carried out for two spatial extents: the entire Colombian Amazon and a subset area which was most affected by deforestation. The results show that conflict variables have positive relationships with deforestation; yet, they are not among the main variables explaining deforestation. Accessibility and biophysical variables explain more variation. Nevertheless, conflict variables show divergent influence on deforestation depending on the period and scale of analysis. Based on these results, we develop deforestation risk maps to inform the design of forest conservation efforts in the post-conflict period.

