Nuclear magnetic resonance (NMR) spectroscopy is largely unbiased and highly reproducible, which makes it a powerful tool for analyzing mixtures of small molecules. However, identifying compounds in NMR spectra of mixtures is highly challenging, because the chemical shifts of a given compound vary across mixtures and peaks from different molecules overlap. Here, we present a pseudo-Siamese convolutional neural network (pSCNN) to identify compounds in mixtures by NMR spectroscopy. A data augmentation method was implemented that superposes several NMR spectra sampled from a spectral database and adds random noise. The augmented dataset was split to train, validate and test the pSCNN model. Two experimental NMR datasets (flavor mixtures and an additional flavor mixture) were acquired to benchmark its performance in real applications. The results show that the proposed method achieves good performance on the augmented test set (ACC = 99.80%, TPR = 99.70% and FPR = 0.10%), the flavor mixtures dataset (ACC = 97.62%, TPR = 96.44% and FPR = 2.29%) and the additional flavor mixture dataset (ACC = 91.67%, TPR = 100.00% and FPR = 10.53%). We also demonstrate that the translational invariance of convolutional neural networks can address the chemical shift variation problem in NMR spectra. In summary, pSCNN is an off-the-shelf method for identifying compounds in mixtures by NMR spectroscopy, owing to its accuracy in compound identification and its robustness to chemical shift variation.
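To make the augmentation step concrete, the following is a minimal sketch of the superposition idea in Python; the function name, noise model, scaling and shift range are illustrative assumptions of ours, not the authors' exact implementation.

    import numpy as np

    def augment_mixture(library, n_components=3, noise_sd=0.01, max_shift=5, rng=None):
        # Superpose randomly chosen library spectra with small random
        # chemical-shift offsets and additive noise (illustrative only).
        rng = rng or np.random.default_rng()
        idx = rng.choice(len(library), size=n_components, replace=False)
        mixture = np.zeros_like(library[0], dtype=float)
        for i in idx:
            shift = int(rng.integers(-max_shift, max_shift + 1))  # mimic shift variation
            mixture += np.roll(library[i], shift) * rng.uniform(0.2, 1.0)
        mixture += rng.normal(0.0, noise_sd, size=mixture.shape)  # random noise
        return mixture, set(idx)  # augmented spectrum and its ground-truth components

Each training example can then pair an augmented mixture with one pure library spectrum and a binary present/absent label, which is how a two-input pseudo-Siamese network is typically trained.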
Scalability of distributed deep learning (DL) training with the parameter server (PS) architecture is often communication-constrained in large clusters. Recent efforts use a layer-by-layer strategy to overlap gradient communication with backward computation, thereby reducing the impact of the communication constraint on scalability. However, these approaches cannot be effectively applied to overlapping parameter communication with forward computation. In this paper, we propose and design iBatch, a novel communication approach that batches parameter communication and forward computation so that they overlap with each other. We formulate the batching decision as an optimization problem and solve it with a greedy algorithm to derive the communication and computation batches. We implement iBatch in the open-source DL framework BigDL and evaluate it with various DL workloads. Experimental results show that, on a cluster of 72 nodes, iBatch improves scalability by up to 73% over the default PS and 41% over the layer-by-layer strategy.
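As a rough illustration of the batching decision, the greedy sketch below groups consecutive layers so that each batch's forward computation can hide the parameter communication of the following batch; the cost model and function are our simplification, not the optimization formulation that iBatch actually solves.

    def greedy_batches(comm, comp):
        # comm[i] / comp[i]: estimated parameter-communication / forward-computation
        # time of layer i (illustrative cost model).
        batches, current, covered = [], [], 0.0
        for i in range(len(comm)):
            current.append(i)
            covered += comp[i]
            # close the batch once its accumulated computation can cover
            # the communication cost of the next layer's parameters
            if i + 1 < len(comm) and covered >= comm[i + 1]:
                batches.append(current)
                current, covered = [], 0.0
        if current:
            batches.append(current)
        return batches

    # e.g. greedy_batches([2, 1, 1, 3], [1, 2, 1, 2]) -> [[0], [1], [2, 3]]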
Executing distributed machine learning (ML) jobs on Spark follows the Bulk Synchronous Parallel (BSP) model, in which parallel tasks execute the same iteration at the same time and the generated updates are synchronized on the parameters only after all tasks have finished. In practice, the parallel tasks rarely have the same execution time because of sparse data, so synchronization must wait for tasks that finish late. Running Spark on heterogeneous clusters makes this even worse due to stragglers, where synchronization is significantly delayed by the slowest task. This paper addresses this limitation of the fundamental BSP model that underpins iterative ML jobs. We propose and develop a novel BSP-based Aggressive synchronization (A-BSP) model that exploits the convergence property of iterative ML algorithms by allowing an algorithm to synchronize using updates generated from partial input data. Specifically, when the fastest task completes, A-BSP fetches the current updates generated by the remaining tasks, which have partially processed their input data, and pushes for aggressive synchronization (sketched schematically below). Unprocessed data is then prioritized in subsequent iterations to preserve the algorithm's convergence rate. Theoretically, we prove the convergence of gradient descent under the A-BSP model. We have implemented A-BSP as a lightweight BSP-compatible mechanism in Spark and evaluated it with various ML jobs. Experimental results show that, compared to BSP, A-BSP speeds up execution by up to 2.36x. We have also extended A-BSP to the Petuum platform and compared it to the Stale Synchronous Parallel (SSP) and Asynchronous Parallel (ASP) models. A-BSP performs better than SSP and ASP for gradient-descent-based jobs, and it also outperforms SSP for jobs on physically heterogeneous clusters.

1 INTRODUCTION

The Bulk Synchronous Parallel (BSP) model provides a simple and easy-to-use model for parallel data processing. For example, built on the BSP model, Apache Spark [42] has evolved into a widely used computing platform for distributed processing of large data sets in clusters. It is designed with generality to cover a wide range
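The aggressive-synchronization step described in the abstract above can be pictured with the following schematic; grad_fn, the progress fractions and the averaged update are illustrative assumptions, not the Spark-level mechanism itself.

    import numpy as np

    def a_bsp_round(w, partitions, progress, grad_fn, lr=0.01):
        # One A-BSP round (schematic): when the fastest task finishes,
        # fetch partial updates from the remaining tasks and synchronize.
        # partitions: list of (X, y) per task; progress: fraction of each
        # partition processed when the fastest task completes (1.0 for it).
        updates, leftovers = [], []
        for (X, y), p in zip(partitions, progress):
            n = max(1, int(p * len(y)))             # rows seen so far
            updates.append(grad_fn(w, X[:n], y[:n]))
            leftovers.append((X[n:], y[n:]))        # prioritized next iteration
        w = w - lr * np.mean(updates, axis=0)       # aggressive synchronization
        return w, leftovers

    def lsq_grad(w, X, y):
        # gradient of the mean squared error 0.5 * ||Xw - y||^2 / n
        return X.T @ (X @ w - y) / len(y)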