Dongyu Xue scite author profile

A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity. Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. In addition, DeepCRISPR fully automates the identification of sequence and epigenetic features that may affect sgRNA knockout efficacy in a data-driven manner. DeepCRISPR is available at http://www.deepcrispr.net/. Electronic supplementary material: The online version of this article (10.1186/s13059-018-1459-4) contains supplementary material, which is available to authorized users.

A ‘new lease of life’: FnCpf1 possesses DNA cleavage activity for genome editing in human cells

Lin

Cheng

et al. 2017

102

125

Cpf1 nucleases were recently reported to be highly specific and programmable nucleases with efficiencies comparable to those of SpCas9. AsCpf1 and LbCpf1 require a single crRNA and recognize a 5′-TTTN-3′ protospacer adjacent motif (PAM) at the 5′ end of the protospacer for genome editing. For widespread application in precision site-specific human genome editing, the range of sequences that AsCpf1 and LbCpf1 can recognize is limited due to the size of this PAM. To address this limitation, we sought to identify a novel Cpf1 nuclease with simpler PAM requirements. Specifically, we sought to test and engineer FnCpf1, a previously reported Cpf1 nuclease that requires only 5′-TTN-3′ as a PAM but was reported not to exhibit detectable levels of nuclease-induced indels at a certain locus in human cells. Surprisingly, we found that FnCpf1 possesses DNA cleavage activity in human cells at multiple loci. We also comprehensively and quantitatively examined various FnCpf1 parameters in human cells, including the spacer sequence, the direct repeat sequence and the PAM sequence. Our study identifies FnCpf1 as a new member of the Cpf1 family for human genome editing with distinctive characteristics, which shows promise as a genome editing tool with the potential for both research and therapeutic applications.
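
The point about PAM length can be illustrated with a small sketch: a shorter motif such as 5′-TTN-3′ matches at least as many positions as 5′-TTTN-3′, so it widens the set of targetable sites. The function name `count_pam_sites` and the example sequence are hypothetical, only the forward strand is scanned, and this is not the analysis performed in the paper.

```python
import re


def count_pam_sites(seq: str, pam: str) -> int:
    """Count forward-strand positions where `pam` matches (N = any base)."""
    # Translate the motif into a regex: N becomes a wildcard over A/C/G/T.
    pattern = "".join("[ACGT]" if base == "N" else base for base in pam.upper())
    # A lookahead matches without consuming characters, so overlapping
    # occurrences are all counted.
    return len(re.findall(f"(?={pattern})", seq.upper()))


# Made-up sequence, for illustration only.
seq = "ATTTGACCTTAGGTTTTCA"
ttn_sites = count_pam_sites(seq, "TTN")
tttn_sites = count_pam_sites(seq, "TTTN")
print(f"TTN sites: {ttn_sites}, TTTN sites: {tttn_sites}")
```

Every TTTN match is also a TTN match at the same position, so the TTN count can never be smaller than the TTTN count.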

Advances and challenges in deep generative models for de novo molecule generation

Xue

Gong

Yang

et al. 2018

WIREs Comput Mol Sci

X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis

Xue

Zhang

Xiao

et al. 2020

Preprint

In silico modelling and analysis of small molecules substantially accelerates the process of drug development. Representing and understanding molecules is the fundamental step for various in silico molecular analysis tasks. Traditionally, these molecular analysis tasks have been investigated individually and separately. In this study, we presented X-MOL, which applies large-scale pre-training technology on 1.1 billion molecules for molecular understanding and representation, and then, carefully designed fine-tuning was performed to accommodate diverse downstream molecular analysis tasks, including molecular property prediction, chemical reaction analysis, drug-drug interaction prediction, de novo generation of molecules and molecule optimization. As a result, X-MOL was proven to achieve state-of-the-art results on all these molecular analysis tasks with good model interpretation ability. Collectively, taking advantage of super large-scale pre-training data and super-computing power, our study practically demonstrated the utility of the idea of “mass makes miracles” in molecular representation learning and downstream in silico molecular analysis, indicating the great potential of using large-scale unlabelled data with carefully designed pre-training and fine-tuning strategies to unify existing molecular analysis tasks and substantially enhance the performance of each task.

FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery

Chen

Xue

Chuai

et al. 2020

Motivation: Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to better performance in QSAR prediction; however, intellectual property and related financial interests still substantially hinder inter-institutional collaborations in QSAR modeling for drug discovery. Results: For the first time, we verified the feasibility of applying horizontal federated learning (HFL), a recently developed collaborative and privacy-preserving learning framework, to perform QSAR analysis. A prototype platform of federated-learning-based QSAR modeling for collaborative drug discovery, i.e. FL-QSAR, is presented accordingly. We first compared the HFL framework with a classic privacy-preserving computation framework, i.e. secure multiparty computation, to indicate its differences from various perspectives. Then we compared FL-QSAR with the public collaboration in terms of QSAR modeling. Our extensive experiments demonstrated that (i) collaboration by FL-QSAR outperforms a single client using only its private data, and (ii) collaboration by FL-QSAR achieves almost the same performance as that of collaboration via cleartext learning algorithms using all shared information. Taken together, our results indicate that FL-QSAR under the HFL framework provides an efficient solution to break the barriers between pharmaceutical institutions in QSAR modeling, and thereby promotes the development of collaborative and privacy-preserving drug discovery, with the ability to extend to other privacy-related biomedical areas. Availability and implementation: The source code of FL-QSAR is available on GitHub: https://github.com/bm2-lab/FL-QSAR. Supplementary information: Supplementary data are available at Bioinformatics online.
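
As a rough illustration of the horizontal setting described above, the sketch below uses plain federated averaging (FedAvg) on a linear model: each client holds different samples over the same features, trains locally, and only model weights are sent to the server. This is an assumption-laden toy and not the FL-QSAR implementation; the function names, the synthetic data and the hyperparameters are all hypothetical, and it includes no secure aggregation or other privacy mechanism.

```python
import numpy as np


def local_update(w, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w


def fed_avg(client_data, dim, rounds=50):
    """Server loop: broadcast weights, collect local updates, average them."""
    w = np.zeros(dim)
    total = sum(len(y) for _, y in client_data)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in client_data]
        # Weight each client's model by its share of the total sample count.
        w = sum(len(y) / total * u for (_, y), u in zip(client_data, updates))
    return w


# Synthetic stand-in for three institutions with the same feature space.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
clients = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    clients.append((X, y))

w_fed = fed_avg(clients, dim=3)
print("federated weights:", np.round(w_fed, 2))
```

Raw samples never leave `local_update`; the server only sees weight vectors, which is the property that makes this style of collaboration attractive when data cannot be pooled.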

Structure-informed Language Models Are Protein Designers

Zheng

Deng

Xue

et al. 2023

Preprint

This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), which have learned massive sequential evolutionary knowledge from the universe of natural protein sequences, to acquire an immediate capability to design preferable protein sequences for given folds. We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness. During inference, iterative refinement is performed to effectively optimize the generated protein sequences. Experiments show that our approach outperforms the state-of-the-art methods by a large margin, leading to 4% to 12% accuracy gains in sequence recovery (e.g., 55.65% and 56.63% on CATH 4.2 and 4.3 single-chain benchmarks, and >60% when designing protein complexes). We provide extensive and in-depth analyses, which verify that LM-Design can (1) indeed leverage both structural and sequential knowledge to accurately handle structurally non-deterministic regions, (2) benefit from scaling data and model size, and (3) generalize to other proteins (e.g., antibodies and de novo proteins).
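
Sequence recovery, the metric quoted above, is commonly taken to be the fraction of positions at which a designed sequence reproduces the native residue. A minimal sketch under that assumption follows; the function name and the two short sequences are made up for illustration, and the paper's exact evaluation protocol may differ.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical 8-residue example: 6 of 8 positions agree.
print(sequence_recovery("MKTAYIAK", "MKTVYIAR"))  # 0.75
```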

X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis

Xue

Zhang

Chen

et al. 2022

Science Bulletin

DeepReac+: deep active learning for quantitative modeling of organic chemical reactions

et al. 2021

Dongyu Xue

DeepCRISPR: optimized CRISPR guide RNA design by deep learning
