Guangming Sheng scite author profile

Guangming Sheng

2 Publications

0 Citation Statements Received

68 Citation Statements Given

Affiliations

Hefei University of Technology, Chinese University of Hong Kong, University of Hong Kong

Publications

Swift

Zhong, Sheng, Liu, et al. 2023

As deep learning models grow larger, training takes more time and resources, making fault tolerance increasingly critical. Existing state-of-the-art methods such as CheckFreq and Elastic Horovod need to back up a copy of the model state (i.e., parameters and optimizer states) in memory, which is costly for large models and incurs non-trivial overhead. This paper presents SWIFT, a novel recovery design for distributed deep neural network training that significantly reduces failure recovery overhead without affecting training throughput or model accuracy. Instead of making an additional copy of the model state, SWIFT resolves the inconsistencies in the model state caused by a failure and exploits the replicas of the model state that data parallelism already provides. We propose a logging-based approach for cases where replicas are unavailable; it records intermediate data and replays the computation to recover the lost state after a failure. The re-computation is distributed across multiple machines to further accelerate recovery. We also log intermediate data selectively, exploring the trade-off between recovery time and the storage overhead of intermediate data. Evaluations show that SWIFT significantly reduces failure recovery time and achieves similar or better training throughput during failure-free execution compared to state-of-the-art methods, without degrading final model accuracy. SWIFT can also achieve up to a 1.16x speedup in total training time compared to state-of-the-art methods.
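The two recovery paths described in the abstract can be illustrated with a toy sketch. This is not SWIFT's implementation: `Worker`, `LoggedStage`, and `recover_from_replica` are hypothetical names, the "model state" is a short list of floats, and the logged "intermediate data" is assumed to be the gradients a stage receives, stored somewhere that survives the failure.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Worker:
    """Toy worker: 'model state' is a weight list plus an optimizer step count."""
    weights: List[float]
    step: int = 0

    def apply_update(self, grads: List[float], lr: float = 0.1) -> None:
        # Plain SGD step; stands in for a real optimizer update.
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]
        self.step += 1


def recover_from_replica(survivor: Worker) -> Worker:
    """Replica-based recovery: copy the state of a surviving data-parallel peer."""
    return copy.deepcopy(survivor)


@dataclass
class LoggedStage:
    """Toy stage without replicas: logs its inputs since the last checkpoint."""
    worker: Worker
    checkpoint: Optional[Worker] = None
    log: List[List[float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.checkpoint is None:
            self.checkpoint = copy.deepcopy(self.worker)

    def train_step(self, grads: List[float]) -> None:
        # Assumption: the log lives off the failing machine, so it survives.
        self.log.append(list(grads))
        self.worker.apply_update(grads)

    def recover(self) -> Worker:
        # Logging-based recovery: reload the checkpoint, then replay the log.
        restored = copy.deepcopy(self.checkpoint)
        for grads in self.log:
            restored.apply_update(grads)
        return restored


if __name__ == "__main__":
    stage = LoggedStage(Worker([1.0, 2.0]))
    for g in ([0.5, -0.5], [0.1, 0.2], [0.3, 0.3]):
        stage.train_step(g)
    print(stage.recover() == stage.worker)
```

Because the replay applies the same updates in the same order, the restored state matches the lost one in this toy setting. The sketch leaves out the parts the abstract highlights as SWIFT's contributions: resolving inconsistent state, distributing the replay across machines, and choosing which intermediate data to log.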

Magnetic Charge Model for Leakage Signals from Surface Defects in Ferromagnetic Material

Sheng, Zhang, et al. 2023

Materials

A novel three-dimensional theoretical model of magnetic flux leakage (MFL) is proposed in this paper based on the magnetic dipole model. The magnetic dipole model assumes that a ferromagnetic specimen with defects is exposed to a uniform external magnetic field that causes a uniform magnetization around the defect surface. Under this assumption, the MFL can be regarded as arising from magnetic charges on the defect surface. Previous theoretical models were mostly used to analyze simple crack defects such as cylindrical and rectangular cracks. In this paper, we developed a magnetic dipole model for more complex defect shapes such as circular truncated holes, conical holes, elliptical holes, and double-curve-shaped crack holes to complement the existing defect shapes. Experimental results and comparisons with previous models demonstrate that the proposed model provides a better approximation of complex defect shapes.
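The magnetic-charge picture in the abstract can be sketched numerically for the simplest case, a rectangular slot. This is an illustrative sketch rather than the paper's model: it assumes uniform magnetization along +x, so the two slot walls normal to x carry uniform charge densities +sigma and -sigma, and it sums the Coulomb-like field of small wall patches with a midpoint rule. The function name, geometry parameters, and grid size are all hypothetical.

```python
import math


def leakage_field(point, half_width, length, depth, sigma=1.0, n=40):
    """Approximate leakage field H at `point` above a rectangular slot.

    Assumed geometry: the slot occupies |x| <= half_width, |y| <= length / 2,
    -depth <= z <= 0, with the specimen surface at z = 0.
    The wall at x = -half_width carries +sigma and the wall at
    x = +half_width carries -sigma (magnetization along +x).
    """
    px, py, pz = point
    hx = hy = hz = 0.0
    dy = length / n
    dz = depth / n
    for wall_x, charge in ((-half_width, sigma), (half_width, -sigma)):
        for i in range(n):
            y = -length / 2 + (i + 0.5) * dy
            for j in range(n):
                z = -depth + (j + 0.5) * dz
                # Vector from the charged wall patch to the field point.
                rx, ry, rz = px - wall_x, py - y, pz - z
                r3 = (rx * rx + ry * ry + rz * rz) ** 1.5
                # Patch contribution: sigma * dS / (4 * pi * r^3) times r.
                w = charge * dy * dz / (4.0 * math.pi * r3)
                hx += w * rx
                hy += w * ry
                hz += w * rz
    return hx, hy, hz


if __name__ == "__main__":
    # Scan along x at a lift-off of 1 unit above the surface.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        hx, _, hz = leakage_field((x, 0.0, 1.0), 0.5, 4.0, 1.0)
        print(f"x={x:+.1f}  Hx={hx:.5f}  Hz={hz:.5f}")
```

In this setup the tangential component Hx is symmetric about the slot centre and the normal component Hz is antisymmetric. The more complex shapes the paper addresses (truncated, conical, elliptical, and double-curve holes) would need the wall surfaces and their charge densities parameterized differently.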
