The continuously growing amount of seismic data collected worldwide is outpacing our capacity for analysis, since to date such datasets have been analyzed in a supervised, human-expert-intensive fashion. Moreover, the analyses that are conducted can be strongly biased by the standard models employed by seismologists. In response to both of these challenges, we develop a new unsupervised machine learning framework for detecting and clustering seismic signals in continuous seismic records. Our approach combines a deep scattering network with a Gaussian mixture model to cluster seismic signal segments and detect novel structures. To illustrate the power of the framework, we analyze seismic data acquired during the June 2017 Nuugaatsiaq, Greenland landslide. We demonstrate the blind detection and recovery of the repeating precursory seismicity that was recorded before the main landslide rupture, which suggests that our approach could lead to more informative forecasting of seismic activity in seismogenic areas.
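The pipeline described above can be sketched in miniature: a first-order "scattering-like" transform (wavelet convolution, modulus, average pooling) turns signal segments into low-dimensional features, which a Gaussian mixture model then clusters. Everything here is illustrative — the wavelet, scales, window length, and synthetic "precursory" events are assumptions for the sketch, not the paper's actual filters or data.

```python
import numpy as np
from scipy.signal import fftconvolve
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def morlet(M, s, w=5.0):
    """Complex Morlet wavelet of length M at scale s (illustrative filter)."""
    t = np.arange(-(M // 2), M - M // 2) / s
    return np.exp(1j * w * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)

def scattering_features(segment, scales=(2.0, 4.0, 8.0), M=64):
    """First-order scattering coefficients: |x * psi_s| followed by
    average (low-pass) pooling, one coefficient per scale."""
    return np.array([np.abs(fftconvolve(segment, morlet(M, s),
                                        mode="same")).mean()
                     for s in scales])

# Synthetic continuous record: background noise plus repeating transients
# standing in for the precursory seismicity.
fs, n, win = 100, 20000, 500
trace = 0.1 * rng.standard_normal(n)
for t0 in range(1000, n - 200, 2000):
    trace[t0:t0 + 200] += np.sin(2 * np.pi * 10.0 * np.arange(200) / fs)

# Segment the trace, featurize, and cluster; small or newly appearing
# mixture components flag novel signal classes.
segments = trace[: (n // win) * win].reshape(-1, win)
X = np.stack([scattering_features(s) for s in segments])
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
```

The scattering transform is used here (rather than a trained feature extractor) because its fixed wavelet filters need no labels, which is what makes the whole detection pipeline unsupervised.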
The notion of interpolation and extrapolation is fundamental in various fields, from deep learning to function approximation. Interpolation occurs for a sample x whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when x falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets; in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against both points and demonstrate that on any high-dimensional (>100) dataset, interpolation almost surely never happens. These results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performance.
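The convex-hull definition of interpolation above can be checked directly: x lies in the convex hull of a point set iff there exist weights lam >= 0 with sum(lam) = 1 and points.T @ lam = x, which is a linear feasibility problem. A minimal sketch, with Gaussian data and dimensions chosen purely for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """True iff x lies inside or on the convex hull of the rows of `points`:
    solve for lam >= 0 with sum(lam) = 1 and points.T @ lam = x."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])   # hull + simplex constraints
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).success

rng = np.random.default_rng(0)
rates = {}
for d in (2, 200):
    train = rng.standard_normal((1000, d))
    test = rng.standard_normal((20, d))
    rates[d] = np.mean([in_convex_hull(x, train) for x in test])
# In 2-D nearly every test point interpolates; in 200-D essentially none do,
# consistent with the claim that high-dimensional interpolation almost never occurs.
```

The zero objective makes this a pure feasibility test: we only ask whether any valid convex combination exists, not for an optimal one.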
We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal to each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of A Spline Theory of Deep Learning from ICML 2018.)
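The central claim — that, conditioned on the input, the DN output is an affine transformation of that input — can be verified numerically for a ReLU network, a standard member of the MASO family. In the sketch below (random weights, illustrative sizes), the ReLU activation pattern plays the role of the VQ code selecting the affine region:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random two-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2.
d_in, d_h, d_out = 5, 16, 3
W1, b1 = rng.standard_normal((d_h, d_in)), rng.standard_normal(d_h)
W2, b2 = rng.standard_normal((d_out, d_h)), rng.standard_normal(d_out)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def input_conditioned_affine(x):
    """The affine map (A, c) selected by x's ReLU activation pattern:
    relu(W1 x + b1) = q * (W1 x + b1) with q the on/off mask, so
    f(x) = [W2 diag(q) W1] x + [W2 (q * b1) + b2]."""
    q = (W1 @ x + b1 > 0).astype(float)   # VQ code: which units are active
    A = W2 @ (q[:, None] * W1)            # signal-dependent slope ("template")
    c = W2 @ (q * b1) + b2                # signal-dependent offset
    return A, c

x = rng.standard_normal(d_in)
A, c = input_conditioned_affine(x)
assert np.allclose(forward(x), A @ x + c)  # DN output is affine, given x
```

Each row of A is one of the signal-dependent templates mentioned above: the class score is literally an inner product of the input with that row, plus an offset.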
The seismic noise recorded by the Interior Exploration using Seismic Investigations, Geodesy, and Heat Transport (InSight) seismometer (Seismic Experiment for Interior Structure [SEIS]) has a strong daily quasi-periodicity and numerous transient microevents, associated mostly with an active Martian environment of wind bursts and pressure drops, in addition to thermally induced lander and instrument cracks. This noise bears little resemblance to Earth's microseismic noise. Quantifying the importance of nonstochasticity and identifying these microevents is mandatory for improving continuous data quality and noise analysis techniques, including autocorrelation. Cataloging these events has so far relied on specific algorithms and operators' visual inspection. We investigate here the continuous data with an unsupervised deep-learning approach built on a deep scattering network. This leads to the successful detection and clustering of these microevents, as well as better determination of daily cycles associated with changes in the intensity and color of the background noise. We first provide a description of our approach, then present the learned clusters, followed by a study of their origin and associated physical phenomena. We show that the clustering is robust over several Martian days, revealing distinct types of glitches that repeat at a rate of several tens per sol with stable time differences. We show that the clustering and detection efficiency for pressure drops and glitches is comparable to or better than that of the manual or targeted detection techniques proposed to date, notably with an unsupervised approach. Finally, we discuss the origin of other clusters found, especially glitch sequences with stable time offsets that might generate artifacts in autocorrelation analyses. We conclude by presenting the potential of unsupervised learning for long-term space mission operations, in particular for geophysical and environmental observatories.
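The "stable time differences" diagnostic above reduces to a simple computation once cluster labels exist: take the detection times of one cluster and check the spread of its inter-event intervals. A minimal sketch with synthetic labels — the window length, cluster indexing, and stability threshold are all assumptions for illustration, not values from the study:

```python
import numpy as np

# Synthetic per-window cluster labels standing in for the output of the
# unsupervised scattering-network pipeline: cluster 1 ("glitch") fires
# periodically against a background cluster 0.
window_s = 10.0                      # assumed analysis-window length, seconds
labels = np.zeros(500, dtype=int)
labels[::50] = 1                     # one glitch detection every 50 windows

def inter_event_times(labels, cluster, window_s):
    """Seconds between consecutive detections of a given cluster."""
    idx = np.flatnonzero(labels == cluster)
    return np.diff(idx) * window_s

dt = inter_event_times(labels, 1, window_s)
stable = dt.std() < 0.05 * dt.mean()  # low relative spread => repeating sequence
```

A cluster flagged as `stable` is exactly the kind of periodic glitch sequence that, as noted above, can leak artifacts into autocorrelation analyses and so should be masked before stacking.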