Yuzhe Zhou scite author profile

Yuzhe Zhou

2 Publications

4 Citation Statements Received

38 Citation Statements Given


Affiliations

Chinese University of Hong Kong, Shenzhen, Shenzhen Research Institute of Big Data

Publications


Prior knowledge facilitates low homologous protein secondary structure prediction with DSM distillation

Wang, Wei, Zhou et al. 2022


Motivation: Protein secondary structure prediction (PSSP) is one of the fundamental and challenging problems in computational biology. Accurate PSSP relies on sufficient homologous protein sequences to build the multiple sequence alignment (MSA). Unfortunately, many proteins lack homologous sequences, which results in low-quality MSA and poor performance. In this paper, we propose the novel DSM-Distil to tackle this issue, which takes advantage of a pretrained BERT and exploits knowledge distillation on the newly designed dynamic scoring matrix (DSM) features. Specifically, we propose the DSM to replace the widely used profile and PSSM features. DSM can automatically search for a suitable feature for each residue, based on the original profile. That is, DSM-Distil not only adapts to low homologous proteins but is also compatible with high homologous ones. Thanks to this dynamic property, DSM adapts to the input data much better and achieves higher performance. Moreover, to compensate for low-quality MSA, we propose to generate a pseudo-DSM from a pretrained BERT model and aggregate it with the original DSM by adaptive residue-wise fusion, which helps to build richer and more complete input features. In addition, we propose to supervise the learning of low-quality DSM features by using high-quality ones. To achieve this, a novel teacher-student model is designed to distill knowledge from proteins with high homologous sequences to those with low homologous sequences. Combining all the proposed methods, our model achieves new state-of-the-art performance for low homologous proteins.

Results: Compared with the previous state-of-the-art method “Bagging”, DSM-Distil achieves improvements of about 5% and 7.3% for proteins with MSA count ≤ 30 and for extremely low homologous cases, respectively. We also compare DSM-Distil with AlphaFold2, a state-of-the-art framework for protein structure prediction. DSM-Distil outperforms AlphaFold2 by 4.1% on extremely low-quality MSA in 8-state secondary structure prediction. Moreover, we release BC40, a large-scale, up-to-date test dataset for evaluating structure prediction on low-quality MSA.

Availability and implementation:
BC40 dataset: https://drive.google.com/drive/folders/15vwRoOjAkhhwfjDk6-YoKGf4JzZXIMC
HardCase dataset: https://drive.google.com/drive/folders/1BvduOr2b7cObUHy6GuEWk-aUkKJgzTUv
Code: https://github.com/qinwang-ai/DSM-Distil
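The abstract mentions "adaptive residue-wise fusion" of the original DSM with a pseudo-DSM but does not say how the weights are computed. The sketch below is only one plausible reading, not the authors' implementation: it assumes a single scalar gate per residue, passed through a sigmoid, that blends the two feature rows. The names `fuse_residue_wise` and `gate_logits` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued score to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_residue_wise(dsm, pseudo_dsm, gate_logits):
    """Blend two per-residue feature matrices with one gate per residue.

    dsm, pseudo_dsm: lists of L rows (one row of feature values per residue).
    gate_logits: list of L floats. These are hypothetical scores; in a real
    model they would be predicted by a learned layer, which is not shown here.
    """
    if not (len(dsm) == len(pseudo_dsm) == len(gate_logits)):
        raise ValueError("all inputs must cover the same number of residues")

    fused = []
    for row, pseudo_row, logit in zip(dsm, pseudo_dsm, gate_logits):
        if len(row) != len(pseudo_row):
            raise ValueError("feature rows must have the same width")
        g = sigmoid(logit)  # weight given to the MSA-derived row
        fused.append([g * a + (1.0 - g) * b for a, b in zip(row, pseudo_row)])
    return fused
```

With this gating, a residue whose gate score is 0 receives an equal mix of both sources, while a strongly negative score makes the fused row rely almost entirely on the pseudo-DSM, which is the behaviour one would want where the MSA is poor.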


Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation

Wang, Chen, Zhou et al. 2022. AAAI


Accurate protein contact map prediction (PCMP) is essential for precise protein structure estimation and further biological studies. Recent works achieve significant performance on this task with high-quality multiple sequence alignment (MSA). However, PCMP accuracy drops dramatically when only poor MSA (e.g., absolute MSA count less than 10) is available. Therefore, in this paper, we propose Contact-Distil to improve low homologous PCMP accuracy through knowledge distillation on a self-supervised model. In particular, two pre-trained transformers are exploited to learn the high-quality and low-quality MSA representations in parallel, for the teacher and student models respectively. In addition, co-evolution information is extracted from the pure sequence through a pretrained ESM-1b model, which provides auxiliary knowledge to improve student performance. Extensive experiments show that Contact-Distil outperforms previous state-of-the-art methods by large margins on the CAMEO-L dataset for low homologous PCMP, i.e., improvements of around 13.3% and 9.5% over AlphaFold2 and MSA Transformer respectively when the MSA count is less than 10.
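The teacher-student setup described above can be illustrated with a generic soft-label distillation loss. This is a minimal sketch of the standard temperature-scaled KL formulation, assumed here for illustration; the abstract does not state which loss Contact-Distil actually uses. Each row of logits stands for one prediction position (for example, one residue pair with two classes, contact or no contact), and the function name and default temperature are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one row of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean KL(teacher || student) over positions, on softened distributions.

    teacher_logits: rows produced from high-quality MSA input (assumed).
    student_logits: rows produced from low-quality MSA input (assumed).
    """
    if not teacher_logits or len(teacher_logits) != len(student_logits):
        raise ValueError("need the same non-zero number of positions")

    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher, softened
        log_q = log_softmax(s_row, temperature)  # student, softened
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(teacher_logits)
```

The loss is zero when the student reproduces the teacher's distribution at every position and grows as the two diverge, so minimising it pushes the student trained on poor MSA toward the teacher's predictions. In practice it would usually be combined with a supervised loss on the ground-truth labels.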


