Definitions, methods, and applications in interpretable machine learning

Murdoch, William J.; Singh, Chandan Deep; Kumbier, Karl; Abbasi-Asl, Reza; Yu, Bin

doi:10.1073/pnas.1900654116

Cited by 1,271 publications

(828 citation statements)

References 65 publications

Supporting

Mentioning

723

Contrasting

Unclassified


“…Finally, ML and DL approaches are "black box" with limited process-based interpretation. Integrating a process-based model with data-driven approaches could not only attain interpretable ML/DL models but, more importantly, are computational efficiency and readily extrapolate outside the range of training conditions [18,91], which is recommended for future large-scale yield estimation, management optimization, and disaster monitoring.…”

Section: Uncertainties In the Study (mentioning)

confidence: 99%

Combining Optical, Fluorescence, Thermal Satellite, and Environmental Data to Predict County-Level Maize Yield in China Using Machine Learning Approaches

Zhang

Luo

et al. 2019

Remote Sensing


Maize is an extremely important grain crop, and the demand has increased sharply throughout the world. China contributes nearly one-fifth of the total production alone with its decreasing arable land. Timely and accurate prediction of maize yield in China is critical for ensuring global food security. Previous studies primarily used either visible or near-infrared (NIR) based vegetation indices (VIs), or climate data, or both to predict crop yield. However, other satellite data from different spectral bands have been underutilized, which contain unique information on crop growth and yield. In addition, although a joint application of multi-source data significantly improves crop yield prediction, the combinations of input variables that could achieve the best results have not been well investigated. Here we integrated optical, fluorescence, thermal satellite, and environmental data to predict county-level maize yield across four agro-ecological zones (AEZs) in China using a regression-based method (LASSO), two machine learning (ML) methods (RF and XGBoost), and deep learning (DL) network (LSTM). The results showed that combining multi-source data explained more than 75% of yield variation. Satellite data at the silking stage contributed more information than other variables, and solar-induced chlorophyll fluorescence (SIF) had an almost equivalent performance with the enhanced vegetation index (EVI) largely due to the low signal to noise ratio and coarse spatial resolution. The extremely high temperature and vapor pressure deficit during the reproductive period were the most important climate variables affecting maize production in China. Soil properties and management factors contained extra information on crop growth conditions that cannot be fully captured by satellite and climate data. We found that ML and DL approaches definitely outperformed regression-based methods, and ML had more computational efficiency and easier generalizations relative to DL. 
Our study is an important effort to combine multi-source remote sensed and environmental data for large-scale yield prediction. The proposed methodology provides a paradigm for other crop yield predictions and in other regions.


“…$\alpha^*_p = \sum_i \alpha_i$ and $\theta^*_{p,k} = \sum_i \alpha_i \theta^{\mathrm{LOO}}_{i,k}$. Second, traverse the tree bottom-up to calculate the gradients for each internal node level by level (lines 10-16). Last, return the cost.…”

Section: Algorithm Description (mentioning)

confidence: 99%

“…So, the problem of making a single tree perform well in inference arises, and one can ask does a single decision tree beat a random forest with 10 trees. Moreover, trees also serve as one of the few global models considered to be interpretable, an increasingly important requirement in applications [12]. Thus, quality single decision tree built efficiently have many uses.…”

Section: Introduction (mentioning)

confidence: 99%

Hierarchical Gradient Smoothing for Probability Estimation Trees

Zhang

Petitjean

Buntine

2020

Advances in Knowledge Discovery and Data Mining


Decision trees are still seeing use in online, non-stationary and embedded contexts, as well as for interpretability. For applications like ranking and cost-sensitive classification, probability estimation trees (PETs) are used. These are built using smoothing or calibration techniques. Older smoothing techniques used counts local to a leaf node, but a few more recent techniques consider the broader context of a node when doing estimation. We apply a recent advanced smoothing method called Hierarchical Dirichlet Process (HDP) to PETs, and then propose a novel hierarchical smoothing approach called Hierarchical Gradient Smoothing (HGS) as an alternative. HGS smooths leaf nodes up to all the ancestors, instead of recursively smoothing to the parent used by HDP. HGS is made faster by efficiently optimizing the Leave-One-Out Cross-Validation (LOOCV) loss measure using gradient descent, instead of sampling used in HDP. An extensive set of experiments are conducted on 143 datasets showing that our HGS estimates are not only more accurate but also do so within a fraction of HDP time. Besides, HGS makes a single tree almost as good as a Random Forest with 10 trees. For applications that require more interpretability and efficiency, a single decision tree plus HGS is more preferred.


“…Though there are several different available implementations of this overall idea, the principles are similar [1, 21-23]: tractometry begins by delineating the parts of the white matter that belong to different major "tracts" (i.e. anatomical or functional groups of white matter fibers), such as the corticospinal tract or arcuate fasciculus, assigning tractography generated streamlines to "bundles," which approximate the anatomical tracts, and sampling biophysical properties (such as fractional anisotropy or mean diffusivity) along the length of these bundles.…”

mentioning

confidence: 99%

“…Different approaches can be taken to resolving this challenge. For example, Colby and … descriptive power [29,30]. Accordingly, tractometry analysis should simultaneously capitalize on all the data across all tracts to make the best possible prediction, while also retaining and elucidating spatial information about the locations that are most informative for a prediction.…”

mentioning

confidence: 99%

Multidimensional analysis and detection of informative features in diffusion MRI measurements of human white matter

Richie-Halford

Yeatman

Simon

et al. 2019

Preprint


The white matter contains long-range connections between different brain regions and the organization of these connections holds important implications for brain function in health and disease. Tractometry uses diffusion-weighted magnetic resonance imaging (dMRI) data to quantify tissue properties (e.g. fractional anisotropy (FA), mean diffusivity (MD), etc.), along the trajectories of these connections [1]. Statistical inference from tractometry usually either (a) averages these quantities along the length of each bundle in each individual, or (b) performs analysis point-by-point along each bundle, with group comparisons or regression models computed separately for each point along every one of the bundles. These approaches are limited in their sensitivity, in the former case, or in their statistical power, in the latter. In the present work, we developed a method based on the sparse group lasso (SGL) [2] that takes into account tissue properties measured along all of the bundles, and selects informative features by enforcing sparsity, not only at the level of individual bundles, but also across the entire set of bundles and all of the measured tissue properties. The sparsity penalties for each of these constraints is identified using a nested cross-validation scheme that guards against over-fitting and simultaneously identifies the correct level of sparsity. We demonstrate the accuracy of the method in two settings: i) In a classification setting, patients with amyotrophic lateral sclerosis (ALS) are accurately distinguished from matched controls [3]. Furthermore, SGL automatically identifies FA in the corticospinal tract as important for this classification -correctly finding the parts of the white matter known to be affected by the disease. ii) In a regression setting, dMRI is used to accurately predict "brain age" [4,5]. 
In this case, the weights are distributed throughout the white matter indicating that many different regions of the white matter change with development and contribute to the prediction of age. Thus, SGL makes it possible to leverage the multivariate relationship between diffusion properties measured along multiple bundles to make accurate predictions of subject characteristics while simultaneously discovering the most relevant features of the white matter for the characteristic of interest.

Introduction

Diffusion-weighted Magnetic Resonance Imaging (dMRI) provides a unique view into the physical properties of the connections that comprise the brain white matter. While the measurements are usually conducted with voxels at the millimeter scale, water molecules within each voxel diffuse with characteristic lengths at the micrometer scale, providing aggregate information about the physical structure of the white matter, including the density of axons and distribution of fiber orientations within each voxel [6]. Even though metrics derived from diffusion measurements are ambiguous in terms of their underlying biological interpretation [7], analyzing the variance in these properti...
