This book introduces machine learning for readers with some background in basic linear algebra, statistics, probability, and programming. In a coherent statistical framework it covers a selection of supervised machine learning methods, from the most fundamental (k-NN, decision trees, linear and logistic regression) to more advanced methods (deep neural networks, support vector machines, Gaussian processes, random forests and boosting), plus commonly used unsupervised methods (generative modeling, k-means, PCA, autoencoders and generative adversarial networks). Careful explanations and pseudo-code are presented for all methods. The authors maintain a focus on the fundamentals by drawing connections between methods and discussing general concepts such as loss functions, maximum likelihood, the bias-variance decomposition, ensemble averaging, kernels and the Bayesian approach, along with generally useful tools such as regularization, cross-validation, evaluation metrics and optimization methods. The final chapters offer practical advice for solving real-world supervised machine learning problems and discuss ethical aspects of modern machine learning.
The Duffing oscillator remains a key benchmark in nonlinear systems analysis and poses interesting challenges in nonlinear structural identification. The use of particle methods, or sequential Monte Carlo (SMC), is becoming a more common approach for tackling these nonlinear dynamical systems, within structural dynamics and beyond. This paper demonstrates the use of a tailored SMC algorithm within a Markov chain Monte Carlo (MCMC) scheme to allow inference over the latent states and parameters of the Duffing oscillator in a Bayesian manner. This approach to system identification offers a statistically more rigorous treatment of the problem than the common state-augmentation methods, where the parameters of the model are included as additional latent states. It is shown how a recent advance in particle MCMC methods, namely the particle Gibbs with ancestor sampling (PG-AS) algorithm, is capable of performing efficient Bayesian inference, even in cases where little is known about the system parameters a priori. The advantage of this Bayesian approach is the quantification of uncertainty, not only in the system parameters but also in the states of the model (displacement and velocity), even in the presence of measurement noise.
Histopathological diagnosis of pulmonary tumors is essential for treatment decisions. The distinction between primary lung adenocarcinoma and pulmonary metastasis from the gastrointestinal (GI) tract may be difficult. Therefore, we compared the diagnostic value of several immunohistochemical markers in pulmonary tumors. Tissue microarrays from 629 resected primary lung cancers and 422 resected pulmonary epithelial metastases from various sites (of which 275 were from colorectal cancer) were investigated for the immunohistochemical expression of CDH17, GPA33, MUC2, MUC6, SATB2, and SMAD4, for comparison with CDX2, CK20, CK7, and TTF-1. The most sensitive markers for GI origin were GPA33 (positive in 98%, 60%, and 100% of pulmonary metastases from colorectal cancer, pancreatic cancer, and other GI adenocarcinomas, respectively), CDX2 (99/40/100%), and CDH17 (99/0/100%). In comparison, SATB2 and CK20 showed higher specificity, with expression in 5% and 10% of mucinous primary lung adenocarcinomas and both in 0% of TTF-1-negative non-mucinous primary lung adenocarcinomas (25–50% and 5–16%, respectively, for GPA33/CDX2/CDH17). MUC2 was negative in all primary lung cancers, but positive in less than half of pulmonary metastases from mucinous adenocarcinomas from other organs. Combining six GI markers did not perfectly separate primary lung cancers from pulmonary metastases, including subgroups such as mucinous adenocarcinomas or CK7-positive GI tract metastases. This comprehensive comparison suggests that CDH17, GPA33, and SATB2 may be used as equivalent alternatives to CDX2 and CK20. However, no single marker or combination of markers can categorically distinguish primary lung cancers from metastatic GI tract cancer.
In scientific inference problems, the underlying statistical modeling assumptions have a crucial impact on the end results. There exist, however, only a few automatic means for validating these fundamental modeling assumptions. The contribution of this paper is a general criterion to evaluate the consistency of a set of statistical models with respect to observed data. This is achieved by automatically gauging the models' ability to generate data that are similar to the observed data. Importantly, the criterion follows from the model class itself and is therefore directly applicable to a broad range of inference problems with varying data types, ranging from independent univariate data to high-dimensional time series. The proposed data consistency criterion is illustrated, evaluated and compared to several well-established methods using three synthetic and two real data sets.