Machine learning-based drug discovery success depends
on molecular
representation. Yet traditional molecular fingerprints omit both the
protein and pointers back to structural information that would enable
better model interpretability. Therefore, we propose LUNA, a Python
3 toolkit that calculates and encodes protein–ligand interactions
into new hashed fingerprints inspired by Extended Connectivity FingerPrint
(ECFP): EIFP (Extended Interaction FingerPrint), FIFP (Functional
Interaction FingerPrint), and Hybrid Interaction FingerPrint (HIFP).
LUNA also provides visual strategies to make the fingerprints interpretable.
We performed three major experiments exploring the fingerprints’
use. First, we trained machine learning models to reproduce DOCK3.7
scores using 1 million docked Dopamine D4 complexes. We found that EIFP-4,096 performed (R
2 = 0.61)
superior to related molecular and interaction fingerprints. Second,
we used LUNA to support interpretable machine learning models. Finally,
we demonstrate that interaction fingerprints can accurately identify
similarities across molecular complexes that other fingerprints overlook.
Hence, we envision LUNA and its interface fingerprints as promising
methods for machine learning-based virtual screening campaigns. LUNA
is freely available at .
Machine learning-based drug discovery success depends on molecular representation. Yet traditional molecular fingerprints omit both the protein and pointers back to structural information that would enable better model interpretability. Therefore, we propose LUNA, a Python 3 toolkit that calculates and encodes protein-ligand interactions into new hashed fingerprints inspired by Extended Connectivity Finger-Print (ECFP): EIFP (Extended Interaction FingerPrint), FIFP (Functional Interaction FingerPrint), and Hybrid Interaction FingerPrint (HIFP). LUNA also provides visual strategies to make the fingerprints interpretable. We performed three major experiments exploring the fingerprints’ use. First, we trained machine learning models to reproduce DOCK3.7 scores using 1 million docked Dopamine D4 complexes. We found that EIFP-4,096 performed (R2 = 0.61) superior to related molecular and interaction fingerprints. Secondly, we used LUNA to support interpretable machine learning models. Finally, we demonstrate that interaction fingerprints can accurately identify similarities across molecular complexes that other fingerprints over-look. Hence, we envision LUNA and its interface fingerprints as promising methods for machine learning-based virtual screening campaigns. LUNA is freely available at https://github.com/keiserlab/LUNA.
In this paper, we propose an interactive visualization called VERMONT which tackles the problem of visualizing mutations and infers their possible effects on the conservation of physicochemical and topological properties in protein families. More specifically, we visualize a set of structure-based sequence alignments and integrate several structural parameters that should aid biologists in gaining insight into possible consequences of mutations. VERMONT allowed us to identify patterns of position-specific properties as well as exceptions that may help predict whether specific mutations could damage protein function.
BackgroundA huge amount of data about genomes and sequence variation is available and continues to grow on a large scale, which makes experimentally characterizing these mutations infeasible regarding disease association and effects on protein structure and function. Therefore, reliable computational approaches are needed to support the understanding of mutations and their impacts. Here, we present VERMONT 2.0, a visual interactive platform that combines sequence and structural parameters with interactive visualizations to make the impact of protein point mutations more understandable.ResultsWe aimed to contribute a novel visual analytics oriented method to analyze and gain insight on the impact of protein point mutations. To assess the ability of VERMONT to do this, we visually examined a set of mutations that were experimentally characterized to determine if VERMONT could identify damaging mutations and why they can be considered so.ConclusionsVERMONT allowed us to understand mutations by interpreting position-specific structural and physicochemical properties. Additionally, we note some specific positions we believe have an impact on protein function/structure in the case of mutation.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-017-1789-3) contains supplementary material, which is available to authorized users.
Background: Interactions between proteins and non-proteic small molecule ligands play important roles in the biological processes of living systems. Thus, the development of computational methods to support our understanding of the ligand-receptor recognition process is of fundamental importance since these methods are a major step towards ligand prediction, target identification, lead discovery, and more. This article presents visGReMLIN, a web server that couples a graph mining-based strategy to detect motifs at the protein-ligand interface with an interactive platform to visually explore and interpret these motifs in the context of protein-ligand interfaces. Results: To illustrate the potential of visGReMLIN, we conducted two cases in which our strategy was compared with previous experimentally and computationally determined results. visGReMLIN allowed us to detect patterns previously documented in the literature in a totally visual manner. In addition, we found some motifs that we believe are relevant to protein-ligand interactions in the analyzed datasets. Conclusions: We aimed to build a visual analytics-oriented web server to detect and visualize common motifs at the protein-ligand interface. visGReMLIN motifs can support users in gaining insights on the key atoms/residues responsible for protein-ligand interactions in a dataset of complexes.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.