Machine learning methods originated in artificial intelligence and are now used in many areas of the environmental sciences. This is the first single-authored textbook to provide a unified treatment of machine learning methods and their applications in the environmental sciences. Owing to their powerful nonlinear modelling capability, machine learning methods are now used in satellite data processing, general circulation models (GCMs), weather and climate prediction, air quality forecasting, analysis and modelling of environmental data, oceanographic and hydrological forecasting, ecological modelling, and monitoring of snow, ice and forests. The book includes end-of-chapter review questions and an appendix listing websites for downloading computer code and data sources. A resources website provides datasets for exercises and password-protected solutions. The book is suitable for first-year graduate students and advanced undergraduates, and is also valuable for researchers and practitioners in the environmental sciences interested in applying these new methods to their own work.
Empirical or statistical methods have been introduced into meteorology and oceanography in four distinct stages: 1) linear regression (and correlation), 2) principal component analysis (PCA), 3) canonical correlation analysis (CCA), and, most recently, 4) neural network (NN) models. Despite the great popularity of NN models in many fields, three obstacles have hindered the adaptation of the NN method to meteorology and oceanography, especially in large-scale, low-frequency studies: (a) nonlinear instability with short data records, (b) large spatial data fields, and (c) difficulty in interpreting the nonlinear NN results. Recent research shows that all three obstacles can be overcome. For obstacle (a), ensemble averaging was found to be effective in controlling nonlinear instability. For (b), PCA was used as a prefilter to compress the large spatial data fields. For (c), the mysterious hidden layer could be given a phase-space interpretation, and spectral analysis aided in understanding the nonlinear NN relations. With these and future improvements, the nonlinear NN method is evolving into a versatile and powerful technique capable of augmenting traditional linear statistical methods in data analysis and forecasting; for example, the NN method has been used for El Niño prediction and for nonlinear PCA. The NN model is also found to be a type of variational (adjoint) data assimilation, which allows it to be readily linked to dynamical models under adjoint data assimilation, yielding a new class of hybrid neural-dynamical models.
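As a rough illustration of remedies (a) and (b) above, the sketch below (not the authors' code; the data sizes, component counts, network settings and scikit-learn usage are assumptions for illustration) compresses a synthetic spatial field with PCA and then averages an ensemble of small neural networks trained from different random initial weights:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 500))          # e.g. 240 monthly maps on 500 grid points
y = X[:, :3] @ np.array([0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=240)

# (b) compress the large spatial field: keep a few leading principal components
pca = PCA(n_components=8)
Z = pca.fit_transform(X)                 # predictors in PC space, 240 x 8

# (a) control nonlinear instability: average an ensemble of NNs trained
# from different random initial weights
ensemble = [MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000,
                         random_state=seed).fit(Z, y)
            for seed in range(10)]
y_pred = np.mean([m.predict(Z) for m in ensemble], axis=0)
```

Averaging the ensemble members' outputs smooths out the run-to-run variability that a single nonlinear fit to a short record would show.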
Methods in multivariate statistical analysis are essential for working with large volumes of geophysical data, whether from observational arrays, satellites, or numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). Singular spectrum analysis (SSA), a multivariate time series method, has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño‐Southern Oscillation and the stratospheric quasi‐biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real‐world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, a problem largely alleviated in the new nonlinear paradigm.
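The sketch below shows NLPCA by an autoassociative neural network in the spirit of the approach named above; the architecture, the scikit-learn usage and the synthetic data are assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, size=500)
X = np.column_stack([np.cos(theta), np.sin(2 * theta)])   # data on a curve
X += 0.1 * rng.normal(size=X.shape)                       # plus noise

# encoder -> one-node bottleneck -> decoder, trained to reproduce its input;
# here every hidden layer uses tanh (a simplification: the classical
# formulation uses a linear bottleneck)
net = MLPRegressor(hidden_layer_sizes=(4, 1, 4), activation='tanh',
                   max_iter=5000, random_state=0).fit(X, X)

def bottleneck(net, X):
    """Forward-pass X to the bottleneck to read off the nonlinear PC u."""
    h = np.tanh(X @ net.coefs_[0] + net.intercepts_[0])       # encoding layer
    return np.tanh(h @ net.coefs_[1] + net.intercepts_[1]).ravel()

u = bottleneck(net, X)        # nonlinear principal component, one per sample
X_on_curve = net.predict(X)   # projection of each point onto the fitted curve
```

Where linear PCA would fit a straight line through this cloud, the bottleneck network fits a one-dimensional curve, which is why a single oscillatory phenomenon need not be scattered across several linear modes.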
With very noisy data, overfitting is a serious problem in pattern recognition. For nonlinear regression, having plentiful data eliminates overfitting, but for nonlinear principal component analysis (NLPCA), overfitting persists even with plentiful data. Thus simply minimizing the mean square error is not a sufficient criterion for NLPCA to find good solutions in noisy data. A new index is proposed which measures the disparity between the nonlinear principal components u and ũ for a data point x and its nearest neighbour x̃. This index, 1 − C_S (with C_S the Spearman rank correlation between u and ũ), tends to increase with overfitted solutions, thereby providing a diagnostic tool for deciding how much regularization (i.e. weight penalty) should be used in the objective function of the NLPCA to prevent overfitting. Tests are performed using autoassociative neural networks for NLPCA on synthetic and real climate data.

I. INTRODUCTION

In principal component analysis (PCA), a given dataset is approximated by a straight line that minimizes the mean square error (MSE); pictorially, in a scatterplot of the data, the straight line found by PCA passes through the 'middle' of the dataset. In nonlinear PCA (NLPCA), the straight line of PCA is replaced by a curve. NLPCA can be performed by a variety of methods, e.g. the autoassociative neural network (NN) model [6, 5] and the kernel PCA model [11]. When using nonlinear machine learning methods, the presence of noise in the data can lead to overfitting (i.e. fitting to the noise). When plentiful data are available (i.e. far more samples than model parameters), overfitting is not a problem when performing nonlinear regression on noisy data. Unfortunately, even with plentiful data, overfitting is a problem when applying NLPCA to noisy data [4, 2]. As illustrated in Figure 1, overfitting in NLPCA can arise from the geometry of the problem rather than from the scarcity of data. For a Gaussian-distributed data cloud, a nonlinear model with enough flexibility will find the zigzag solution of Figure 1b as having a smaller MSE than the linear solution in Figure 1a. Since the distance between the point A and a, its projection on the NLPCA curve, is smaller in Figure 1b than the corresponding distance in Figure 1a, it is easy to see that the more zigzags there are in the curve, the smaller the MSE. However, the two neighbouring points A and B, on opposite sides of an "ambiguity" line [8], are projected far apart on the NLPCA curve in Figure 1b. Thus simply searching for the solution with the smallest MSE is not a sufficient criterion for NLPCA to find a satisfactory solution in a highly noisy dataset. Regularization (e.g. the addition of weight penalty or decay terms in the objective functions of NN models) has been commonly used to alleviate such overfitting.
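A minimal sketch of the proposed 1 − C_S diagnostic follows; the helper name inconsistency_index and the SciPy-based neighbour search are assumptions for illustration, not the paper's code:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import spearmanr

def inconsistency_index(X, u):
    """X: (n, d) data; u: (n,) nonlinear PC value of each sample."""
    tree = cKDTree(X)
    _, idx = tree.query(X, k=2)       # k=2: a point's first neighbour is itself
    u_tilde = u[idx[:, 1]]            # ũ: PC of each point's nearest neighbour
    c_s, _ = spearmanr(u, u_tilde)    # Spearman rank correlation C_S
    return 1.0 - c_s                  # large value -> zigzag, overfitted solution
```

In use, one would fit NLPCA over a range of weight-penalty values and prefer a solution that combines a low MSE with a small inconsistency index, since a zigzag fit sends nearest neighbours such as A and B to distant positions on the curve and so inflates 1 − C_S.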