This paper considers the decentralized optimization problem, which has applications in large-scale machine learning, sensor networks, and control theory. We propose a novel algorithm that achieves near-optimal communication complexity, matching the known lower bound up to a logarithmic factor in the condition number of the problem. Our theoretical results give an affirmative answer to the open problem of whether there exists an algorithm whose communication complexity (nearly) matches the lower bound depending on the global condition number rather than the local one. Moreover, the proposed algorithm achieves the optimal computation complexity, matching the lower bound up to universal constants. Furthermore, our algorithm does not require the individual functions to be (strongly) convex in order to achieve a linear convergence rate. Our method relies on a novel combination of known techniques, including Nesterov's accelerated gradient descent, multi-consensus, and gradient tracking. The analysis is new and may be applied to other related problems. Empirical studies demonstrate the effectiveness of our method for machine learning applications.
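To make the gradient-tracking ingredient concrete, here is a minimal sketch of plain gradient tracking (one mixing step per round) on a toy decentralized least-squares problem. This is only the classical baseline, not the paper's accelerated multi-consensus method; the problem data, ring topology, and step size are all illustrative choices.

```python
import numpy as np

# Toy problem: each of 4 agents holds f_i(x) = 0.5*||A_i x - b_i||^2 + 0.5*||x||^2
# (the quadratic regularizer is added only to make the toy problem well conditioned).
rng = np.random.default_rng(0)
n_agents, dim = 4, 3
A = [rng.standard_normal((5, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(5) for _ in range(n_agents)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) + x

# Doubly stochastic mixing matrix for a 4-node ring.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

step = 0.01
X = np.zeros((n_agents, dim))                            # local iterates x_i
Y = np.stack([grad(i, X[i]) for i in range(n_agents)])   # trackers y_i of the average gradient

for _ in range(5000):
    X_new = W @ X - step * Y                             # mix with neighbors, descend along tracker
    Y = W @ Y + np.stack([grad(i, X_new[i]) - grad(i, X[i])
                          for i in range(n_agents)])     # gradient-tracking update
    X = X_new

# Every agent should reach the minimizer of sum_i f_i.
H = sum(a.T @ a for a in A) + n_agents * np.eye(dim)
x_star = np.linalg.solve(H, sum(a.T @ bb for a, bb in zip(A, b)))
consensus_err = max(np.linalg.norm(X[i] - x_star) for i in range(n_agents))
```

The key invariant is that the trackers always average to the exact average gradient, which is what lets the method converge linearly even though each agent only sees its own data.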
Linear contextual bandits form a sequential decision-making problem in which an agent chooses among actions given their corresponding contexts. As large-scale data sets become increasingly common, we study linear contextual bandits in high-dimensional settings. Recent works employ matrix sketching methods to accelerate contextual bandits; however, the matrix approximation error introduces additional terms into the regret bound. In this paper, we first propose a novel matrix sketching method called Spectral Compensation Frequent Directions (SCFD). We then propose an efficient approach to contextual bandits that adopts SCFD to approximate the covariance matrices. By maintaining and manipulating sketched matrices, our method needs only O(md) space and O(md) update time per round, where d is the dimensionality of the data and m is the sketch size. Theoretical analysis reveals that our method enjoys better regret bounds than previous methods in high-dimensional cases. Experimental results demonstrate the effectiveness of our algorithm and verify our theoretical guarantees.
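The space savings come from never materializing the d × d covariance matrix: with a sketch B of m rows, products with (λI + BᵀB)⁻¹ can be applied through the m × m Woodbury core. The snippet below illustrates this mechanism with a generic truncated-SVD sketch on synthetic near-low-rank data; SCFD itself is the paper's contribution and is not reproduced here, and all dimensions and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, d, m, lam = 500, 60, 15, 1.0
# Synthetic contexts that are effectively rank m, plus a tiny noise tail.
X = rng.standard_normal((T, m)) @ rng.standard_normal((m, d))
X += 0.001 * rng.standard_normal((T, d))

# Rank-m sketch (a stand-in for a streaming sketch such as frequent directions).
_, s, Vt = np.linalg.svd(X, full_matrices=False)
B = s[:m, None] * Vt[:m]                                 # m x d sketch, B^T B ~ X^T X

def apply_inv_sketched(v):
    # Woodbury identity: (lam*I_d + B^T B)^{-1} v computed via an m x m solve,
    # so only O(md) memory is ever needed.
    core = np.linalg.solve(lam * np.eye(m) + B @ B.T, B @ v)
    return (v - B.T @ core) / lam

v = rng.standard_normal(d)
exact = np.linalg.solve(lam * np.eye(d) + X.T @ X, v)    # the O(d^2) baseline
approx = apply_inv_sketched(v)
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

The same inverse-covariance products drive both the ridge estimate and the confidence widths in LinUCB-style algorithms, which is why controlling the sketch's approximation error directly controls the regret.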
This paper considers stochastic first-order algorithms for convex-concave minimax problems of the form min_x max_y f(x, y), where f can be represented as the average of n individual components that are L-average smooth. For the µ_x-strongly-convex-µ_y-strongly-concave setting, we propose a new method that finds an ε-saddle point of the problem within Õ(√(n(√n + κ_x)(√n + κ_y)) log(1/ε)) stochastic first-order complexity, where κ_x = L/µ_x and κ_y = L/µ_y. This upper bound is near-optimal with respect to ε, n, κ_x, and κ_y simultaneously. In addition, the algorithm is easy to implement and works well in practice. Our methods can be extended to solve more general unbalanced convex-concave minimax problems, and the corresponding upper complexity bounds are also near-optimal.
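For readers unfamiliar with saddle-point methods, the classical deterministic baseline in this setting is the extragradient method, sketched below on a strongly-convex-strongly-concave toy problem. This is only the textbook baseline, not the paper's near-optimal stochastic algorithm; the problem, step size, and iteration count are illustrative.

```python
import numpy as np

# Toy saddle problem: f(x, y) = (mx/2)||x||^2 + x^T A y - (my/2)||y||^2,
# whose unique saddle point is (0, 0).
rng = np.random.default_rng(2)
d, mx, my = 5, 1.0, 1.0
A = rng.standard_normal((d, d))

def grads(x, y):
    # (grad_x f, grad_y f)
    return mx * x + A @ y, A.T @ x - my * y

eta = 0.1
x, y = np.ones(d), np.ones(d)
for _ in range(1000):
    gx, gy = grads(x, y)
    xh, yh = x - eta * gx, y + eta * gy   # extrapolation (look-ahead) step
    gx, gy = grads(xh, yh)                # gradients at the look-ahead point
    x, y = x - eta * gx, y + eta * gy     # actual update

gap = np.linalg.norm(x) + np.linalg.norm(y)
```

The look-ahead step is what lets extragradient converge linearly on strongly-monotone problems where plain simultaneous gradient descent-ascent can cycle; variance reduction over the n components is the additional ingredient needed to reach the finite-sum complexity stated above.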
The frequent directions (FD) technique is a deterministic approach to online sketching with many applications in machine learning. Conventional FD is a heuristic procedure that often outputs rank-deficient matrices. To overcome this rank-deficiency problem, we propose a new sketching strategy called robust frequent directions (RFD), which introduces a regularization term. RFD can be derived from an optimization problem, and it updates the sketch matrix and the regularization term adaptively and jointly. RFD reduces the approximation error of FD without increasing the computational cost. We also apply RFD to online learning and propose an effective hyperparameter-free online Newton algorithm. We derive a regret bound for our online Newton algorithm based on RFD, which guarantees the robustness of the algorithm. Experimental studies demonstrate that the proposed method outperforms state-of-the-art second-order online learning algorithms.
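As background, here is a minimal implementation of standard frequent directions (Liberty's streaming sketch), which RFD builds on. The SVD shrink step is exactly where the rank deficiency mentioned above arises: singular values are pushed toward zero. RFD's adaptive regularization term, which compensates for that shrinkage, is the paper's contribution and is not reproduced here.

```python
import numpy as np

def frequent_directions(X, m):
    """Stream the rows of X (n x d, with d >= 2m) into a 2m x d buffer whose
    nonzero part is the sketch B, guaranteeing
        0 <= X^T X - B^T B <= (||X||_F^2 / m) * I   (in the PSD order)."""
    n, d = X.shape
    B = np.zeros((2 * m, d))
    nxt = 0                                   # next empty buffer row
    for row in X:
        if nxt == 2 * m:                      # buffer full: shrink via SVD
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s_shrunk = np.sqrt(np.maximum(s**2 - s[m]**2, 0.0))
            B = np.zeros((2 * m, d))
            B[:len(s)] = s_shrunk[:, None] * Vt   # rows m..2m-1 are now zero
            nxt = m
        B[nxt] = row
        nxt += 1
    return B

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 50))
m = 10
B = frequent_directions(X, m)
err = np.linalg.norm(X.T @ X - B.T @ B, 2)
bound = np.linalg.norm(X, 'fro')**2 / m
```

Note that X^T X − B^T B is always positive semidefinite for FD — the sketch only ever underestimates the covariance — which is precisely the bias a regularization term can offset.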
We study the smooth minimax optimization problem min_x max_y f(x, y), where the objective is strongly concave in y but possibly nonconvex in x. This problem covers many applications in machine learning, such as regularized GANs, reinforcement learning, and adversarial training. Most existing theory related to gradient descent ascent focuses on establishing convergence to a first-order stationary point of f(x, y) or of the primal function P(x) := max_y f(x, y). In this paper, we design a new optimization method via cubic Newton iterations, which finds an O(ε, κ^1.5 √(ρε))-second-order stationary point of P(x) with O(κ^1.5 √ρ ε^−1.5) second-order oracle calls and Õ(κ^2 √ρ ε^−1.5) first-order oracle calls, where κ is the condition number and ρ is the Hessian smoothness coefficient of f(x, y). For high-dimensional problems, we propose a variant algorithm that avoids the expensive cost of the second-order oracle by solving the cubic subproblem inexactly via gradient descent and matrix Chebyshev expansion. This strategy still obtains the desired approximate second-order stationary point with high probability, while requiring only Õ(κ^1.5 ℓ ε^−2) Hessian-vector oracle calls and Õ(κ^2 √ρ ε^−1.5) first-order oracle calls. To the best of our knowledge, this is the first work that considers the non-asymptotic convergence behavior of finding second-order stationary points for minimax problems without the convex-concave assumption.
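The core mechanism — cubic-regularized Newton steps whose subproblems are solved inexactly by gradient descent — can be sketched on a plain nonconvex minimization toy. This is a simplified stand-in, not the paper's minimax method: there is no inner maximization, no Chebyshev expansion, and M = 10 is simply an assumed bound on the Hessian Lipschitz constant of this toy objective.

```python
import numpy as np

# Toy nonconvex objective f(x) = sum_i (x_i^4/4 - x_i^2/2):
# saddle point at the origin, minimizers at x_i = +/-1.
def f_grad(x):
    return x**3 - x

def f_hess(x):
    return np.diag(3 * x**2 - 1)

def cubic_step(g, H, M, iters=2000, lr=0.01):
    # Inexactly minimize the cubic model m(s) = g^T s + 0.5 s^T H s + (M/6)||s||^3
    # by plain gradient descent; grad m(s) = g + H s + (M/2)||s|| s.
    s = np.zeros_like(g)
    for _ in range(iters):
        s -= lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s

x = 0.5 * np.ones(4)              # start between the saddle and a minimizer
for _ in range(50):
    x += cubic_step(f_grad(x), f_hess(x), M=10.0)

grad_norm = np.linalg.norm(f_grad(x))                 # ~0 at a stationary point
min_hess_eig = np.linalg.eigvalsh(f_hess(x)).min()    # >0 rules out saddles
```

The cubic term is what allows the step to exploit negative curvature and escape the saddle at the origin, which is exactly why the method can certify second-order (rather than merely first-order) stationarity.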