Gradient-based Monte Carlo sampling algorithms, such as Langevin dynamics and Hamiltonian Monte Carlo, are important methods for Bayesian inference. In large-scale settings, full gradients are too expensive to compute, so stochastic gradients evaluated on mini-batches are used instead. To reduce the high variance of these noisy stochastic gradients, Dubey et al. [2016] applied standard variance reduction techniques to stochastic gradient Langevin dynamics and obtained both theoretical and experimental improvements. In this paper, we apply variance reduction to Hamiltonian Monte Carlo and achieve better theoretical convergence results than variance-reduced Langevin dynamics. Moreover, we apply a symmetric splitting scheme in our variance-reduced Hamiltonian Monte Carlo algorithms to further improve the theoretical results. The experimental results are consistent with the theory: variance-reduced Hamiltonian Monte Carlo outperforms variance-reduced Langevin dynamics on Bayesian regression and classification tasks with real-world datasets.
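As a rough illustration of the kind of variance-reduced stochastic gradient involved, the sketch below combines an SVRG-style control variate with a simple SGHMC-style momentum update on a toy Gaussian model. The model, step size, friction, and epoch length are illustrative assumptions, and the paper's symmetric splitting scheme is not reproduced here.

```python
# Minimal sketch: SVRG-style variance reduction inside a stochastic gradient HMC loop.
import numpy as np

def grad_U_i(theta, x_i):
    # Per-example gradient of the negative log-likelihood for a unit-variance
    # Gaussian observation model with mean theta (a toy target, not the paper's models).
    return theta - x_i

def svrg_hmc(data, n_iter=200, epoch_len=20, batch_size=10, step=1e-2, friction=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = data.shape
    theta = np.zeros(d)
    p = rng.standard_normal(d)                     # momentum variable
    samples = []
    for t in range(n_iter):
        if t % epoch_len == 0:
            snapshot = theta.copy()                # refresh the SVRG anchor
            full_grad = np.mean([grad_U_i(snapshot, x) for x in data], axis=0)
        idx = rng.choice(n, size=batch_size, replace=False)
        # SVRG control variate: mini-batch gradient corrected by the anchor's full gradient.
        g = full_grad + np.mean(
            [grad_U_i(theta, data[i]) - grad_U_i(snapshot, data[i]) for i in idx], axis=0)
        # One SGHMC-style momentum/position update (no symmetric splitting in this sketch).
        p = p - step * friction * p - step * n * g \
            + np.sqrt(2.0 * friction * step) * rng.standard_normal(d)
        theta = theta + step * p
        samples.append(theta.copy())
    return np.array(samples)

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(loc=2.0, size=(500, 1))
    samples = svrg_hmc(data)
    print("posterior mean estimate:", samples[len(samples) // 2:].mean())
```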
We propose a novel accelerated variance-reduced gradient method called ANITA for finite-sum optimization. In this paper, we consider both the general convex and the strongly convex setting. In the general convex setting, ANITA achieves the convergence result $O\big(n\min\{1+\log\frac{1}{\epsilon\sqrt{n}},\ \log\sqrt{n}\}+\sqrt{\frac{nL}{\epsilon}}\big)$, which improves the previous best result $O\big(n\min\{\log\frac{1}{\epsilon},\ \log n\}+\sqrt{\frac{nL}{\epsilon}}\big)$ given by Varag (Lan et al., 2019). In particular, for a very wide range of the error tolerance $\epsilon$ (where $\epsilon$ satisfies $f(x_T)-f^*\le\epsilon$ and $n$ is the number of data samples), ANITA achieves the optimal convergence result $O\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$, matching the lower bound $\Omega\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$ provided by Woodworth and Srebro (2016). To the best of our knowledge, ANITA is the first accelerated algorithm that exactly achieves this optimal result $O\big(n+\sqrt{\frac{nL}{\epsilon}}\big)$ for general convex finite-sum problems. In the strongly convex setting, we also show that ANITA achieves the optimal convergence result $O\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$, matching the lower bound $\Omega\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ provided by Lan and Zhou (2015). Moreover, ANITA enjoys a simpler loopless algorithmic structure, unlike previous accelerated algorithms such as Katyusha (Allen-Zhu, 2017) and Varag (Lan et al., 2019), which use an inconvenient double-loop structure. Finally, the experimental results show that ANITA converges faster than the previous state of the art, Varag (Lan et al., 2019), validating our theoretical results and confirming the practical superiority of ANITA.
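For intuition about a loopless accelerated variance-reduced structure, here is a minimal sketch (not the exact ANITA algorithm): a loopless SVRG-style gradient estimator whose full-gradient anchor is refreshed with a small probability each iteration, combined with a simple heavy-ball extrapolation on a least-squares problem. The step size, refresh probability, and momentum weight are illustrative assumptions, not the schedule analyzed in the paper.

```python
# Minimal sketch of a loopless (single-loop) variance-reduced method with extrapolation.
import numpy as np

def loopless_vr_accelerated(A, b, n_iter=2000, momentum=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = np.linalg.norm(A, ord=2) ** 2 / n          # smoothness constant of the mean squared loss
    lr = 1.0 / (3.0 * L)
    refresh_prob = 1.0 / n                         # anchor refresh probability (illustrative)
    x = np.zeros(d)
    x_prev = x.copy()
    w = x.copy()                                   # anchor ("snapshot") point
    full_grad = A.T @ (A @ w - b) / n
    for _ in range(n_iter):
        y = x + momentum * (x - x_prev)            # heavy-ball extrapolation (illustrative)
        i = rng.integers(n)
        a_i = A[i]
        # Loopless SVRG estimator: unbiased, with variance controlled by the anchor.
        g = a_i * (a_i @ y - b[i]) - a_i * (a_i @ w - b[i]) + full_grad
        x_prev, x = x, y - lr * g
        if rng.random() < refresh_prob:            # refresh the anchor without an outer loop
            w = x.copy()
            full_grad = A.T @ (A @ w - b) / n
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((200, 10))
    x_true = rng.standard_normal(10)
    b = A @ x_true
    x_hat = loopless_vr_accelerated(A, b)
    print("distance to solution:", np.linalg.norm(x_hat - x_true))
```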
Gradient Boosted Decision Trees (GBDT) is a highly successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in popular open-source toolkits, including XGBoost, LightGBM, and CatBoost. In this paper, we show that both the accuracy and the efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees accelerate the convergence of GBDT and improve accuracy. We also propose optimization tricks that substantially reduce the training time of PL Trees with little sacrifice in accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees provides highly competitive test accuracy with comparable or shorter training time.
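The following sketch illustrates the basic idea of boosting with piecewise linear leaves for squared loss: each round fits a shallow tree to the residuals and then fits a small linear model within every leaf. It relies on scikit-learn's DecisionTreeRegressor, fits the leaf models over all features rather than a restricted subset, and omits the histogram-based and SIMD optimizations described in the paper; it is a conceptual sketch, not the authors' implementation.

```python
# Minimal sketch: gradient boosting with piecewise linear (rather than constant) leaves.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_pl_gbdt(X, y, n_rounds=50, max_depth=3, lr=0.1):
    n, _ = X.shape
    base = y.mean()
    pred = np.full(n, base)
    models = []
    for _ in range(n_rounds):
        residual = y - pred                                # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        leaves = tree.apply(X)
        leaf_models = {}
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            Z = np.hstack([X[mask], np.ones((mask.sum(), 1))])   # features + intercept
            coef, *_ = np.linalg.lstsq(Z, residual[mask], rcond=None)
            leaf_models[leaf] = coef
            pred[mask] += lr * (Z @ coef)                  # linear leaf prediction
        models.append((tree, leaf_models))
    return base, models, lr

def predict_pl_gbdt(model, X):
    base, models, lr = model
    pred = np.full(X.shape[0], base)
    Z = np.hstack([X, np.ones((X.shape[0], 1))])
    for tree, leaf_models in models:
        leaves = tree.apply(X)
        for leaf, coef in leaf_models.items():
            mask = leaves == leaf
            pred[mask] += lr * (Z[mask] @ coef)
    return pred

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.uniform(-3, 3, size=(1000, 4))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(1000)
    model = fit_pl_gbdt(X[:800], y[:800])
    print("test MSE:", np.mean((predict_pl_gbdt(model, X[800:]) - y[800:]) ** 2))
```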
We develop and analyze MARINA: a new communication-efficient method for non-convex distributed learning over heterogeneous datasets. MARINA employs a novel communication compression strategy based on the compression of gradient differences, which is reminiscent of, but different from, the strategy employed in the DIANA method. Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance. To the best of our knowledge, the communication complexity bounds we prove for MARINA are strictly superior to those of all previous first-order methods. Further, we develop and analyze two variants of MARINA: VR-MARINA and PP-MARINA. The former is designed for the case when the local loss functions owned by clients are of either a finite-sum or an expectation form, and the latter allows for partial participation of clients, a feature important in federated learning. All our methods are superior to previous state-of-the-art methods in terms of oracle/communication complexity. Finally, we provide a convergence analysis of all methods for problems satisfying the Polyak-Łojasiewicz condition.
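A minimal sketch of the gradient-difference compression idea follows, using an unbiased rand-k sparsifier as the compressor on synthetic quadratic clients. The step size, the probability of communicating exact gradients, and the choice of compressor are illustrative assumptions rather than the paper's recommended settings.

```python
# Minimal sketch of MARINA-style compressed gradient differences on quadratic clients.
import numpy as np

def rand_k(v, k, rng):
    # Unbiased rand-k compressor: keep k random coordinates, rescale by d/k.
    d = v.size
    out = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * d / k
    return out

def marina(grads, x0, n_iter=300, lr=0.05, p=0.1, k=2, seed=0):
    # grads: list of callables, one per client, returning the local gradient at x.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = np.mean([gf(x) for gf in grads], axis=0)          # start from the exact averaged gradient
    for _ in range(n_iter):
        x_new = x - lr * g
        if rng.random() < p:
            # Occasionally the clients communicate their exact local gradients.
            g = np.mean([gf(x_new) for gf in grads], axis=0)
        else:
            # Otherwise each client sends only a compressed gradient difference;
            # the resulting gradient estimator is biased, as in MARINA.
            g = g + np.mean([rand_k(gf(x_new) - gf(x), k, rng) for gf in grads], axis=0)
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    targets = rng.normal(size=(5, 4))                     # heterogeneous client optima
    grads = [lambda x, t=t: x - t for t in targets]       # local f_i(x) = ||x - t_i||^2 / 2
    x_star = marina(grads, x0=np.zeros(4))
    print("distance to optimum:", np.linalg.norm(x_star - targets.mean(axis=0)))
```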