VariPred: Enhancing Pathogenicity Prediction of Missense Variants Using Protein Language Models

Lin, Weining; Wells, Jude; Wang, Zeyuan; Orengo, Christine A.; Martin, Andrew C.R.

doi:10.21203/rs.3.rs-3188248/v1

Cited by 3 publications

(4 citation statements)

References 35 publications

(34 reference statements)


“…Furthermore, AI can be used to develop in silico methods to predict and simulate biological and chemical spaces. Examples of such approaches are cellular and genetic perturbation modelling (Bunne et al 2023; Prasad et al 2022), gene expression prediction (Avsec et al 2021; Kelley et al 2018; Linder et al 2023), variant effect prediction (Brandes et al 2022; Cheng et al 2023; Frazer et al 2021; Lin et al 2023a), protein structure prediction (Baek et al 2021; Jumper et al 2021; Lin et al 2023b), drug-target interaction prediction (Huang et al 2021; Wen et al 2017), and molecular docking simulations for drug design (Corso et al 2023; Gentile et al 2020). When it comes to determining the applicability of AI, we can refer to some guiding principles (Figure 1) that can help us to establish whether introducing AI to solve our problem is sensible.…”

Section: Accepted Manuscript (mentioning)

confidence: 99%

AI Approaches for the Discovery and Validation of Drug Targets

Wenteler, Cabrera, Wei et al. 2024

Camb. prisms Precis. med.


Artificial intelligence (AI) holds immense promise for accelerating and improving all aspects of drug discovery, not least target discovery and validation. By integrating a diverse range of biological data modalities, AI enables the accurate prediction of drug target properties, ultimately illuminating biological mechanisms of disease and guiding drug discovery strategies. Despite the indisputable potential of AI in drug target discovery, there are many challenges and obstacles yet to be overcome, including dealing with data biases, model interpretability and generalisability, and the validation of predicted drug targets to name a few. By exploring recent advancements in AI, this review showcases current applications of AI for drug target discovery and offers perspectives on the future of AI for the discovery and validation of drug targets, paving the way for the generation of novel and safer pharmaceuticals.


“…OpenFold (Ahdritz et al., 2022) and RoseTTAFold (Baek et al., 2021) have similar architecture and performance to AlphaFold and rely on deep MSAs. ESMFold (Z. Lin et al., 2023) and OmegaFold (Wu et al., 2022) are large language model (LLM)–based algorithms that do not use MSAs. Consequently, they have a faster execution than AlphaFold (ESMFold has precalculated structures for 600 million sequences!)…”

Section: Commentary (mentioning)

confidence: 99%

“…Among the most widely used in silico prediction tools are SIFT (Ng & Henikoff, 2001), PolyPhen‐2 (Adzhubei et al., 2010, 2013), and CADD (Rentzsch et al., 2018). More recent methods utilize advanced deep‐learning techniques (Frazer et al., 2021; Qi et al., 2021), including large language models (Brandes et al., 2022; Lin et al., 2023), to predict the pathogenicity of missense variants with greater accuracy. However, although predicted pathogenicity scores may aid in identifying a driver mutation, they do not elucidate how a variant impacts protein function.…”

Section: Introduction (mentioning)

confidence: 99%

Leveraging AI Advances and Online Tools for Structure‐Based Variant Analysis

et al. 2023


Understanding how a gene variant affects protein function is important in life science, as it helps explain traits or dysfunctions in organisms. In a clinical setting, this understanding makes it possible to improve and personalize patient care. Bioinformatic tools often only assign a pathogenicity score, rather than providing information about the molecular basis for phenotypes. Experimental testing can furnish this information, but this is slow and costly and requires expertise and equipment not available in a clinical setting. Conversely, mapping a gene variant onto the three‐dimensional (3D) protein structure provides a fast molecular assessment free of charge. Before 2021, this type of analysis was severely limited by the availability of experimentally determined 3D protein structures. Advances in artificial intelligence algorithms now allow confident prediction of protein structural features from sequence alone. The aim of the protocols presented here is to enable non‐experts to use databases and online tools to investigate the molecular effect of a genetic variant. The Basic Protocol relies only on the online resources AlphaFold, Protein Structure Database, and UniProt. Alternate Protocols document the usage of the Protein Data Bank, SWISS‐MODEL, ColabFold, and PyMOL for structure‐based variant analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.

Basic Protocol: 3D Mapping based on UniProt and AlphaFold
Alternate Protocol 1: Using experimental models from the PDB
Alternate Protocol 2: Using information from homology modeling with SWISS‐MODEL
Alternate Protocol 3: Predicting 3D structures with ColabFold
Alternate Protocol 4: Structure visualization and analysis with PyMOL


“…Consequently, the model is able to produce a concise representation of the full protein sequence, without relying on three-dimensional information. This rich and meaningful representations of ESM-2 has aided numerous studies, including protein functional prediction [57], [58], protein structure prediction [55], protein-protein interaction prediction [59], protein multimodal representation [60], and protein design [61], [62].…”

Section: Introduction (mentioning)

confidence: 99%

Protein Design by Directed Evolution Guided by Large Language Models

Tran, 2023

Preprint


Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) by a rigorous and resource-intensive process of screening or selecting among a vast range of mutations. By conducting an in silico screening of sequence properties, machine learning-guided directed evolution (MLDE) can expedite the optimization process and alleviate the experimental workload. In this work, we propose a general MLDE framework in which we apply recent advancements of Deep Learning in protein representation learning and protein property prediction to accelerate the searching and optimization processes. In particular, we introduce an optimization pipeline that utilizes Large Language Models (LLMs) to pinpoint the mutation hotspots in the sequence and then suggest replacements to improve the overall fitness. Our experiments have shown the superior efficiency and efficacy of our proposed framework in the conditional protein generation, in comparison with traditional searching algorithms, diffusion models, and other generative models. We expect this work will shed a new light on not only protein engineering but also on solving combinatorial problems using data-driven methods. Our implementation is publicly available at https://github.com/HySonLab/Directed_Evolution.
