Siqi Sun scite author profile

Motivation: Protein contacts contain key information for understanding protein structure and function, so contact prediction from sequence is an important problem. Exciting progress has recently been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction.

Method: This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question.

Results: Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 of them, respectively.

Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not fully implemented back then.

Availability: http://raptorx.uchicago.edu/ContactMap/
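The "top L" and "top L/10" long-range accuracies quoted in the abstract can be illustrated with a minimal sketch. It assumes the common CASP convention that a long-range pair has sequence separation of at least 24 residues; the abstract itself does not state the cutoff. The function name, argument names and toy data are hypothetical and are not taken from the RaptorX code.

```python
def top_long_range_accuracy(scores, true_contacts, divisor=1, min_sep=24):
    """Fraction of the top L/divisor long-range predictions that are true contacts.

    scores        -- L x L nested list of predicted contact scores (assumed input format)
    true_contacts -- set of (i, j) pairs with i < j that are contacts in the native structure
    divisor       -- 1 for "top L", 10 for "top L/10"
    min_sep       -- minimum sequence separation for a long-range pair (assumed: 24)
    """
    length = len(scores)
    # Collect every long-range residue pair together with its predicted score.
    candidates = [
        (scores[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Rank pairs from most to least confident.
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[: max(1, length // divisor)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    # Divide by the number of pairs actually selected (short proteins may have fewer than L/divisor).
    return hits / len(top)


# Toy example: a 30-residue protein with three confident long-range predictions.
L = 30
toy_scores = [[0.0] * L for _ in range(L)]
toy_scores[0][24] = 0.9
toy_scores[1][26] = 0.8
toy_scores[2][29] = 0.7
toy_truth = {(0, 24), (2, 29)}
toy_accuracy = top_long_range_accuracy(toy_scores, toy_truth, divisor=10)
```

In the toy example, top L/10 selects three pairs, two of which are true contacts.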


DIALOGPT : Large-Scale Generative Pre-training for Conversational Response Generation

Zhang

et al. 2020


We present a large, tunable neural conversational response generation model, DIALOGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain performance close to human, in terms of both automatic and human evaluation, in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.

* A collaboration between Microsoft Research and Microsoft Dynamics 365 AI Research.


Patient Knowledge Distillation for BERT Model Compression

Sun¹,

Cheng²,

Gan³

et al. 2019

444

459


Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks. However, the high demand for computing resources in training such models hinders their application in practice. To alleviate this resource hunger in large-scale model training, we propose a Patient Knowledge Distillation approach to compress an original large model (teacher) into an equally effective lightweight shallow network (student). Unlike previous knowledge distillation methods, which only use the output from the last layer of the teacher network for distillation, our student model patiently learns from multiple intermediate layers of the teacher model for incremental knowledge extraction, following two strategies: (i) PKD-Last: learning from the last k layers; and (ii) PKD-Skip: learning from every k layers. These two patient distillation schemes enable the exploitation of rich information in the teacher's hidden layers, and encourage the student model to patiently learn from and imitate the teacher through a multilayer distillation process. Empirically, this translates into improved results on multiple NLP tasks with significant gain in training efficiency, without sacrificing model accuracy.
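The two layer-selection strategies named in the abstract can be sketched as follows. This is an illustration only: the function names are hypothetical, layers are numbered from 1, and whether the teacher's final layer belongs to the selected set is a design detail the abstract does not specify (it is included here). The `patient_loss` helper assumes a squared distance between L2-normalized hidden vectors, which is one plausible way to make the student imitate intermediate teacher layers, not a confirmed reproduction of the paper's loss.

```python
import math


def pkd_last(n_teacher_layers, k):
    """PKD-Last (sketch): indices of the last k teacher layers, 1-based."""
    return list(range(n_teacher_layers - k + 1, n_teacher_layers + 1))


def pkd_skip(n_teacher_layers, k):
    """PKD-Skip (sketch): indices of every k-th teacher layer, 1-based."""
    return list(range(k, n_teacher_layers + 1, k))


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def patient_loss(student_states, teacher_states):
    """Assumed loss: summed squared distance between normalized hidden vectors.

    student_states -- one hidden vector per student layer being supervised
    teacher_states -- the hidden vectors of the matching selected teacher layers
    """
    total = 0.0
    for student_vec, teacher_vec in zip(student_states, teacher_states):
        s_unit = l2_normalize(student_vec)
        t_unit = l2_normalize(teacher_vec)
        total += sum((a - b) ** 2 for a, b in zip(s_unit, t_unit))
    return total


# For a 12-layer teacher, the two strategies pick different supervision layers.
last_layers = pkd_last(12, 6)
skip_layers = pkd_skip(12, 2)
```

Here `last_layers` covers the upper half of the teacher, while `skip_layers` spreads supervision evenly across its depth.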


Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

Wang

Sun

et al. 2016

Preprint

151

273



DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Zhang

Sun

Galley

et al. 2019

Preprint

133

194


On the Spectral Efficiency of Massive MIMO Systems With Low-Resolution ADCs

et al. 2016


The low-resolution analog-to-digital convertor (ADC) is a promising solution to significantly reduce the power consumption of radio frequency circuits in massive multiple-input multiple-output (MIMO) systems. In this letter, we investigate the uplink spectral efficiency (SE) of massive MIMO systems with low-resolution ADCs over Rician fading channels, where both perfect and imperfect channel state information are considered. By modeling the quantization noise of low-resolution ADCs as an additive quantization noise, we derive tractable and exact approximation expressions of the uplink SE of massive MIMO with the typical maximal-ratio combining (MRC) receivers. We also analyze the impact of the ADC resolution, the Rician K-factor, and the number of antennas on the uplink SE. Our derived results reveal that the use of low-cost and low-resolution ADCs can still achieve satisfying SE in massive MIMO systems.

Index Terms: Analog-to-digital convertor (ADC), massive MIMO, Rician fading channels, spectral efficiency.
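The additive quantization noise model mentioned in the abstract is commonly written as y_q = α·y + n_q, where α = 1 − ρ is a gain determined by the ADC resolution and n_q is uncorrelated quantization noise. The sketch below assumes the distortion factors ρ that are widely tabulated in the low-resolution ADC literature for 1 to 5 bits, and the usual approximation ρ ≈ (π√3/2)·2^(−2b) for higher resolutions; these numbers are not given in the abstract, and the function names are hypothetical.

```python
import math

# Distortion factor rho per ADC resolution in bits (assumed values from the
# wider low-resolution ADC literature, not from this abstract).
RHO_TABLE = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}


def distortion_factor(bits):
    """Return rho, the inverse signal-to-quantization-noise ratio, for a b-bit ADC."""
    if bits < 1:
        raise ValueError("ADC resolution must be at least 1 bit")
    if bits in RHO_TABLE:
        return RHO_TABLE[bits]
    # Approximation commonly used for resolutions above 5 bits.
    return math.pi * math.sqrt(3) / 2 * 2 ** (-2 * bits)


def quantization_gain(bits):
    """Return alpha = 1 - rho, the linear gain applied to the unquantized signal."""
    return 1.0 - distortion_factor(bits)


def quantization_noise_variance(bits, signal_power):
    """Per-antenna variance of n_q under the model: alpha * (1 - alpha) * E[|y|^2]."""
    alpha = quantization_gain(bits)
    return alpha * (1.0 - alpha) * signal_power


# Gain for each resolution from 1 to 8 bits; it approaches 1 as resolution grows.
gains = [quantization_gain(b) for b in range(1, 9)]
```

Under these assumptions the gain rises quickly with resolution, which is consistent with the abstract's conclusion that a few bits can already give satisfying spectral efficiency.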


Analysis of deep learning methods for blind protein contact prediction in CASP12

2017


Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L=length). A more advanced implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and co-evolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both 1D and 2D deep convolutional neural networks to effectively learn complex sequence-structure relationships, including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strengths and weaknesses of our method.


Hierarchical Graph Network for Multi-hop Question Answering

Fang¹,

Sun²,

Gan³

et al. 2020

100

107


In this paper, we present Hierarchical Graph Network (HGN) for multi-hop question answering. To aggregate clues from scattered texts across multiple paragraphs, a hierarchical graph is created by constructing nodes on different levels of granularity (questions, paragraphs, sentences, entities), the representations of which are initialized with pre-trained contextual encoders. Given this hierarchical graph, the initial node representations are updated through graph propagation, and multi-hop reasoning is performed by traversing the graph edges for each subsequent sub-task (e.g., paragraph selection, supporting facts extraction, answer prediction). By weaving heterogeneous nodes into an integral unified graph, this hierarchical differentiation of node granularity enables HGN to support different question answering sub-tasks simultaneously. Experiments on the HotpotQA benchmark demonstrate that the proposed model achieves a new state of the art, outperforming existing multi-hop QA approaches.
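A rough sketch of the hierarchical graph described above: nodes at four levels of granularity, followed by one round of propagation. The edge set (question to paragraph, paragraph to sentence, sentence to entity), the scalar node features and the mean-aggregation update are simplifying assumptions for illustration; the abstract does not specify the edge types or the propagation rule, and all names here are hypothetical.

```python
from collections import defaultdict


def build_hierarchical_graph(question, paragraphs):
    """Build an undirected graph over question, paragraph, sentence and entity nodes.

    paragraphs -- dict mapping a paragraph title to a list of
                  (sentence_text, [entity, ...]) tuples (assumed input format)
    """
    edges = defaultdict(set)

    def link(node_a, node_b):
        edges[node_a].add(node_b)
        edges[node_b].add(node_a)

    question_node = ("question", question)
    for title, sentences in paragraphs.items():
        paragraph_node = ("paragraph", title)
        link(question_node, paragraph_node)
        for index, (_text, entities) in enumerate(sentences):
            sentence_node = ("sentence", f"{title}#{index}")
            link(paragraph_node, sentence_node)
            for entity in entities:
                # An entity shared by several sentences connects them across paragraphs.
                link(sentence_node, ("entity", entity))
    return edges


def propagate(edges, features):
    """One round of mean aggregation over each node and its neighbours (assumed rule)."""
    updated = {}
    for node, neighbours in edges.items():
        group = [features[node]] + [features[n] for n in neighbours]
        updated[node] = sum(group) / len(group)
    return updated


# Toy example: one paragraph with a single sentence mentioning a single entity.
toy_graph = build_hierarchical_graph("Q", {"P": [("some sentence", ["E"])]})
toy_features = {node: 0.0 for node in toy_graph}
toy_features[("question", "Q")] = 1.0
toy_updated = propagate(toy_graph, toy_features)
```

After one round, information from the question node has reached the paragraph node but not yet the sentence or entity nodes; further rounds would carry it deeper into the hierarchy.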


