We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish convergence results for K-AVG with nonconvex objectives, and our analysis applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and why it leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with K = 1. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). A further advantage of K-AVG over ASGD is that it admits larger stepsizes, which facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence when training on the CIFAR-10 dataset. Using an image recognition benchmark, we demonstrate the favorable convergence properties of K-AVG in comparison to two popular ASGD implementations: Downpour Dean et al. [2012] and EAMSGD Zhang et al. [2015]. In EAMSGD, global gradient aggregation among learners simulates an elastic force that links the parameters they compute with a center variable stored by the parameter server. In both Downpour and EAMSGD, updates to the central parameter server can also be delayed by K steps. On our target platform, when K is small, K-AVG significantly reduces communication time compared with Downpour and EAMSGD while achieving similar training and test accuracies; the reduction in training time is up to 50%. When K is large, K-AVG achieves much better training and test accuracies than Downpour and EAMSGD after the same number of data samples is processed. For example, with 128 GPUs, K-AVG is up to about 7 times faster than Downpour and 2-6 times faster than EAMSGD, while achieving significantly better accuracy.
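To make the scheme concrete, the following is a minimal sketch of synchronous K-step averaging on a toy objective f(x) = 0.5x² with additive gradient noise. All names and hyperparameter values here are illustrative, not taken from the paper; each learner runs K local SGD steps from the shared parameter, and the learners' results are then averaged synchronously (K = 1 recovers ordinary parallel SGD).

```python
import random

def k_avg(num_learners=4, K=8, rounds=50, lr=0.1, seed=0):
    """Illustrative K-AVG sketch on f(x) = 0.5 * x**2 with noisy gradients.

    Hyperparameters are hypothetical; the structure (K local steps,
    then a synchronous average) is what the algorithm prescribes.
    """
    rng = random.Random(seed)
    x = 10.0  # shared parameter, broadcast to all learners each round
    for _ in range(rounds):
        local_params = []
        for _ in range(num_learners):
            xi = x  # each learner starts from the shared parameter
            for _ in range(K):
                # stochastic gradient of 0.5 * x**2 is x plus noise
                grad = xi + rng.gauss(0.0, 0.5)
                xi -= lr * grad
            local_params.append(xi)
        # synchronous averaging step: one communication per K local steps
        x = sum(local_params) / num_learners
    return x

print(abs(k_avg()) < 1.0)  # the averaged iterate approaches the minimum at 0
```

Note that communication happens once per K local steps, which is why larger K shrinks communication time relative to schemes that push every update to a parameter server.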
The rest of the paper is organized as follows. In Section 2, we introduce the standard assumptions from optimization theory needed to analyze SGD methods, along with notation used throughout the paper. In Section 3, we formally introduce the K-AVG algorithm and prove its convergence with fixed and diminishing stepsizes; based on these results, we analyze the scalability of K-AVG and investigate the optimal choice of K. In Section 4, we present experimental results that validate our analysis.