Cyril Zhang scite author profile

Spectral Filtering for General Linear Dynamical Systems

We give a polynomial-time algorithm for learning latent-state linear dynamical systems without system identification, and without assumptions on the spectral radius of the system's transition matrix. The algorithm extends the recently introduced technique of spectral filtering, previously applied only to systems with a symmetric transition matrix, using a novel convex relaxation to allow for the efficient identification of phases.

Machine Learning for Mechanical Ventilation Control

Suo, Zhang, Gradu et al. 2021

Preprint

We consider the problem of controlling an invasive mechanical ventilator for pressure-controlled ventilation: a controller must let air in and out of a sedated patient’s lungs according to a trajectory of airway pressures specified by a clinician. Hand-tuned PID controllers and similar variants have comprised the industry standard for decades, yet can behave poorly by over- or under-shooting their target or oscillating rapidly. We consider a data-driven machine learning approach: First, we train a simulator based on data we collect from an artificial lung. Then, we train deep neural network controllers on these simulators. We show that our controllers are able to track target pressure waveforms significantly better than PID controllers. We further show that a learned controller generalizes across lungs with varying characteristics much more readily than PID controllers do.

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Barak, Edelman, Goel et al. 2022

Preprint

Disentangling Adaptive Gradient Methods from Learning Rates

Agarwal, Anil, Hazan et al. 2020

Preprint

We investigate several confounding factors in the evaluation of optimization algorithms for deep learning. Primarily, we take a deeper look at how adaptive gradient methods interact with the learning rate schedule, a notoriously difficult-to-tune hyperparameter which has dramatic effects on the convergence and generalization of neural network training. We introduce a "grafting" experiment which decouples an update's magnitude from its direction, finding that many existing beliefs in the literature may have arisen from insufficient isolation of the implicit schedule of step sizes. Alongside this contribution, we present some empirical and theoretical retrospectives on the generalization of adaptive gradient methods, aimed at bringing more clarity to this space.
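The grafting idea described in this abstract, taking one update's magnitude and another's direction, can be illustrated with a minimal sketch. The function name `graft`, the `eps` guard, and the example step vectors are hypothetical and chosen only for illustration; the abstract does not specify the granularity (per layer or global) at which the norms are taken.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def graft(magnitude_step, direction_step, eps=1e-12):
    """Combine two proposed updates: keep the norm of `magnitude_step`
    and the direction of `direction_step`.

    `eps` is an assumed guard against division by zero.
    """
    scale = norm(magnitude_step) / (norm(direction_step) + eps)
    return [scale * x for x in direction_step]


# Hypothetical steps proposed by two optimizers for the same parameters.
sgd_step = [3.0, 4.0]        # norm 5, supplies the step size
adaptive_step = [0.0, 0.1]   # norm 0.1, supplies the direction

grafted = graft(sgd_step, adaptive_step)
# `grafted` points along `adaptive_step` but has (approximately) the norm of `sgd_step`.
```

Because the step size comes entirely from one optimizer, comparing grafted and ungrafted runs isolates the effect of the implicit step-size schedule from the effect of the update direction.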

Not-So-Random Features

Bullins, Zhang, Zhang 2017

Preprint

We propose a principled method for kernel learning, which relies on a Fourier-analytic characterization of translation-invariant or rotation-invariant kernels. Our method produces a sequence of feature maps, iteratively refining the SVM margin. We provide rigorous guarantees for optimality and generalization, interpreting our algorithm as online equilibrium-finding dynamics in a certain two-player min-max game. Evaluations on synthetic and real-world datasets demonstrate scalability and consistent improvements over related random features-based methods.

Machine Learning for Mechanical Ventilation Control

Suo, Agarwal, Xia et al. 2021

Preprint

We consider the problem of controlling an invasive mechanical ventilator for pressure-controlled ventilation: a controller must let air in and out of a sedated patient's lungs according to a trajectory of airway pressures specified by a clinician. Hand-tuned PID controllers and similar variants have comprised the industry standard for decades, yet can behave poorly by over- or under-shooting their target or oscillating rapidly. We consider a data-driven machine learning approach: First, we train a simulator based on data we collect from an artificial lung. Then, we train deep neural network controllers on these simulators. We show that our controllers are able to track target pressure waveforms significantly better than PID controllers. We further show that a learned controller generalizes across lungs with varying characteristics much more readily than PID controllers do.
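For readers unfamiliar with the PID baseline mentioned in this abstract, the following is a minimal sketch of a textbook discrete-time PID loop tracking a constant pressure target. Everything specific here is an assumption for illustration: the gains, the time step, and especially the first-order "lung" model, which is a toy stand-in and not the simulator or artificial lung described in the paper.

```python
class PID:
    """Textbook discrete-time PID controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, target, measured):
        error = target - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(target=20.0, seconds=5.0, dt=0.01, tau=0.5):
    """Track a constant target with a toy plant: d(pressure)/dt = (u - pressure) / tau.

    The plant and all numeric values are hypothetical.
    """
    pid = PID(kp=2.0, ki=5.0, kd=0.0, dt=dt)  # hand-picked gains, derivative term disabled
    pressure = 0.0
    for _ in range(int(seconds / dt)):
        u = pid.step(target, pressure)         # control signal from the tracking error
        pressure += dt * (u - pressure) / tau  # forward-Euler update of the toy plant
    return pressure


final_pressure = simulate()
```

On this toy plant the loop settles at the target; the over-shooting and oscillation the abstract describes arise with real lung dynamics and less forgiving gain choices, which this sketch does not attempt to model.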

Inductive Biases and Variable Creation in Self-Attention Mechanisms

Edelman, Goel, Kakade et al. 2021

Preprint

Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. This work provides a theoretical analysis of the inductive biases of self-attention modules, where our focus is to rigorously establish which functions and long-range dependencies self-attention blocks prefer to represent. Our main result shows that bounded-norm Transformer layers create sparse variables: they can represent sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length. Furthermore, we propose new experimental protocols to support this analysis and to guide the practice of training Transformers, built around the large body of work on provably learning sparse Boolean functions.

Understanding Contrastive Learning Requires Incorporating Inductive Biases

Saunshi, Ash, Goel et al. 2022

Preprint

Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees depending on properties of augmentations and the value of contrastive loss of representations. We demonstrate that such analyses, which ignore inductive biases of the function class and training algorithm, cannot adequately explain the success of contrastive learning, even provably leading to vacuous guarantees in some settings. Extensive experiments on image and text domains highlight the ubiquity of this problem: different function classes and algorithms behave very differently on downstream tasks, despite having the same augmentations and contrastive losses. Theoretical analysis is presented for the class of linear representations, where incorporating inductive biases of the function class allows contrastive learning to work with less stringent conditions compared to prior analyses.
