Classification of MLH1 Missense VUS Using Protein Structure-Based Deep Learning-Ramachandran Plot-Molecular Dynamics Simulations Method
Benjamin Tam, Zixin Qin, Bojin Zhao, et al.
Abstract: Pathogenic variation in the DNA mismatch repair (MMR) gene MLH1 is associated with Lynch syndrome (LS), an autosomal dominant hereditary cancer syndrome. Of the 3798 MLH1 germline variants collected in the ClinVar database, 38.7% (1469) were missense variants, of which 81.6% (1199) were classified as Variants of Uncertain Significance (VUS) due to the lack of functional evidence. Further determination of the impact of VUS on MLH1 function is important for the VUS carriers to take preventive action. We recently developed a …
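The proportions quoted in the abstract can be checked directly from the counts it gives. The short sketch below only reproduces that arithmetic; the variable names are illustrative and the counts are taken from the abstract, not re-queried from ClinVar.

```python
# Counts as stated in the abstract (not re-queried from ClinVar).
total_variants = 3798   # MLH1 germline variants in ClinVar
missense = 1469         # missense variants among them
missense_vus = 1199     # missense variants classified as VUS

# Share of missense variants among all variants, in percent.
missense_share = 100 * missense / total_variants
# Share of VUS among the missense variants, in percent.
vus_share = 100 * missense_vus / missense

print(f"missense: {missense_share:.1f}% of all variants")
print(f"VUS: {vus_share:.1f}% of missense variants")
```

Note that the second percentage is relative to the missense subset, not to all 3798 variants.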
“…This visualization method readily highlights unusual conformations by identifying points that fall outside the expected ranges of φ–ψ values, making such diagrams indispensable in protein structure validation. Recently, there has been a resurgence of interest in visualizing dynamic data, particularly in exploring Ramachandran plots within a dynamic context. On the other hand, to the best of our knowledge, Ramachandran plots have never been used as a tool to directly follow protein conformation changes over time, despite their close connection with protein backbone conformation. This is most likely due to the intrinsic difficulty of constructing plots with a large number of points that are also changing position in time.…”
Extracting meaningful information from atomistic molecular dynamics (MD) simulations of proteins remains a challenging task due to the high dimensionality and complexity of the data. MD simulations yield trajectories that contain the positions of thousands of atoms over millions of time steps. Gaining a comprehensive understanding of local dynamical events across the entire trajectory is often difficult. Here, we present a novel approach to visualize MD trajectories in the form of time-dependent Ramachandran plots. Specialized data aggregation techniques are employed to address the challenge of plotting millions of data points on a single image, thereby ensuring that the analysis is independent of the molecule size and/or length of the MD simulation. This approach facilitates quick identification of flexible and dynamic regions, and its strength is the ability to simultaneously observe the movements of all amino acids over time. The Python program MDavocado is freely available at GitHub (https://github.com/zoranstefanic/MDavocado).
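The abstract above does not say which aggregation techniques are used, so the following is only a minimal sketch of one way such aggregation could work, not MDavocado's implementation. It assumes the backbone angles have already been extracted into arrays of shape (frames, residues) in degrees; the function name `aggregate_dihedrals` and the `window` parameter are hypothetical. Each residue's φ and ψ are reduced to one circular mean and one circular spread per time window, so the number of plotted points depends on the number of windows rather than on the number of frames.

```python
import numpy as np

def aggregate_dihedrals(phi, psi, window):
    """Reduce per-frame dihedrals to per-window circular statistics.

    phi, psi : arrays of shape (n_frames, n_residues), in degrees.
    window   : number of consecutive frames merged into one time bin
               (hypothetical parameter; trailing frames that do not
               fill a whole window are dropped).
    Returns (mean_phi, spread_phi, mean_psi, spread_psi), each of
    shape (n_windows, n_residues). Means are in degrees; spreads are
    circular variances in [0, 1] (0 = rigid, 1 = fully spread out).
    """
    def circular_stats(angles_deg):
        n_frames, n_res = angles_deg.shape
        n_win = n_frames // window
        rad = np.radians(angles_deg[: n_win * window])
        rad = rad.reshape(n_win, window, n_res)
        # Average unit vectors instead of raw angles so that values
        # near +180 and -180 degrees are treated as neighbours.
        s = np.sin(rad).mean(axis=1)
        c = np.cos(rad).mean(axis=1)
        mean = np.degrees(np.arctan2(s, c))
        spread = 1.0 - np.hypot(s, c)
        return mean, spread

    mean_phi, spread_phi = circular_stats(np.asarray(phi, dtype=float))
    mean_psi, spread_psi = circular_stats(np.asarray(psi, dtype=float))
    return mean_phi, spread_phi, mean_psi, spread_psi

if __name__ == "__main__":
    # Synthetic stand-in for a trajectory: 1000 frames, 5 residues.
    rng = np.random.default_rng(0)
    phi = rng.normal(-60.0, 10.0, size=(1000, 5))
    psi = rng.normal(-45.0, 10.0, size=(1000, 5))
    mean_phi, spread_phi, mean_psi, spread_psi = aggregate_dihedrals(
        phi, psi, window=100
    )
    print(mean_phi.shape)  # one row per time window, one column per residue
```

Circular rather than arithmetic averaging is used because dihedral angles wrap around at ±180°; windows with a large spread would flag residues that are flexible during that part of the simulation.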
Colorectal cancer (CRC) ranks third in terms of cancer incidence worldwide and is responsible for 8% of all deaths globally. Approximately 10% of CRC cases are caused by inherited pathogenic mutations in driver genes involved in pathways that are crucial for CRC tumorigenesis and progression. These hereditary mutations significantly increase the risk of initial benign polyps or adenomas developing into cancer. In recent years, the rapid and accurate sequencing of CRC-specific multigene panels by next-generation sequencing (NGS) technologies has enabled the identification of several recurrent pathogenic variants with established functional consequences. In parallel, rare genetic variants that are not characterized and are, therefore, called variants of uncertain significance (VUSs) have also been detected. The classification of VUSs is a challenging task because each amino acid has specific biochemical properties and uniquely contributes to the structural stability and functional activity of proteins. In this scenario, the ability to computationally predict the effect of a VUS is crucial. In particular, in silico prediction methods can provide useful insights to assess the potential impact of a VUS and support additional clinical evaluation. This approach can further benefit from recent advances in artificial intelligence-based technologies. In this review, we describe the main in silico prediction tools that can be used to evaluate the structural and functional impact of VUSs and provide examples of their application in the analysis of gene variants involved in hereditary CRC syndromes.