We present Interactome INSIDER, a tool to link genomic variant information with structural protein-protein interactomes. Underlying this tool is the application of machine learning to predict protein interaction interfaces for 185,957 protein interactions with previously unresolved interfaces, in human and 7 model organisms, including the entire experimentally determined human binary interactome. Predicted interfaces exhibit similar functional properties as known interfaces, including enrichment for disease mutations and recurrent cancer mutations. Through 2,164 de novo mutagenesis experiments, we show that mutations of predicted and known interface residues disrupt interactions at a similar rate, and much more frequently than mutations outside of predicted interfaces. To spur functional genomic studies, Interactome INSIDER (http://interactomeinsider.yulab.org) enables users to identify whether variants or disease mutations are enriched in known and predicted interaction interfaces at various resolutions. Users may explore known population variants, disease mutations, and somatic cancer mutations, or upload their own set of mutations for this purpose.
A new algorithm and web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from non-functional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D web interface is freely available with all major browsers supported.
Each human genome carries tens of thousands of coding variants. The extent to which this variation is functional and the mechanisms by which they exert their influence remains largely unexplored. To address this gap, we leverage the ExAC database of 60,706 human exomes to investigate experimentally the impact of 2009 missense single nucleotide variants (SNVs) across 2185 protein-protein interactions, generating interaction profiles for 4797 SNV-interaction pairs, of which 421 SNVs segregate at > 1% allele frequency in human populations. We find that interaction-disruptive SNVs are prevalent at both rare and common allele frequencies. Furthermore, these results suggest that 10.5% of missense variants carried per individual are disruptive, a higher proportion than previously reported; this indicates that each individual’s genetic makeup may be significantly more complex than expected. Finally, we demonstrate that candidate disease-associated mutations can be identified through shared interaction perturbations between variants of interest and known disease mutations.
Transfer machinery, niche-specific, and metabolic genes are predictive of gene transfer events across diverse bacterial species.
Studies of the proteome would benefit greatly from methods to directly sequence and digitally quantify proteins and detect posttranslational modifications with single-molecule sensitivity. Here, we demonstrate single-molecule protein sequencing using a dynamic approach in which single peptides are probed in real time by a mixture of dye-labeled N-terminal amino acid recognizers and simultaneously cleaved by aminopeptidases. We annotate amino acids and identify the peptide sequence by measuring fluorescence intensity, lifetime, and binding kinetics on an integrated semiconductor chip. Our results demonstrate the kinetic principles that allow recognizers to identify multiple amino acids in an information-rich manner that enables discrimination of single amino acid substitutions and posttranslational modifications. With further development, we anticipate that this approach will offer a sensitive, scalable, and accessible platform for single-molecule proteomic studies and applications.
Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is as least as expressive as integer linear programming, and therefore, evaluation of package queries is in general NP-hard. Second, we present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. Third, we introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, a scalable algorithm for package evaluation, with strong approximation guarantees ((1 ± ε) 6 -factor approximation). Finally, we present extensive experiments over real-world and benchmark data. The results demonstrate that SKETCHREFINE is effective at deriving high-quality package results, and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
The Alzheimer's Disease Neuroimaging (ADNI) database is an expansive undertaking by government, academia, and industry to pool resources and data on subjects at various stage of symptomatic severity due to Alzheimer's disease. As expected, magnetic resonance imaging is a major component of the project. Full brain images are obtained at every 6-month visit. A range of cognitive tests studying executive function and memory are employed less frequently. Two blood draws (baseline, 6 months) provide samples to measure concentrations of approximately 145 plasma biomarkers. In addition, other diagnostic measurements are performed including PET imaging, cerebral spinal fluid measurements of amyloid-beta and tau peptides, as well as genetic tests, demographics, and vital signs. ADNI data is available upon review of an application. There have been numerous reports of how various processes evolve during AD progression, including alterations in metabolic and neuroendocrine activity, cell survival, and cognitive behavior. Lacking an analytic model at the onset, we leveraged recent advances in machine learning, which allow us to deal with large, non-linear systems with many variables. Of particular note was examining how well binary predictions of future disease states could be learned from simple, non-invasive measurements like those dependent on blood samples. Such measurements make relatively little demands on the time and effort of medical staff or patient. We report findings with recall/ precision/area under the receiver operator curve after application of CART, Random Forest, Gradient Boosting, and Support Vector Machines, Our results show (i) Random Forests and Gradient Boosting work very well with such data, (ii) Prediction quality when applied to relatively easily obtained measurements (Cognitive scores, Genetic Risk and plasma biomarkers) achieve results that are competitive with magnetic resonance techniques. This is by no
Background Host-microbe interactions are crucial for normal physiological and immune system development and are implicated in a variety of diseases, including inflammatory bowel disease (IBD), colorectal cancer (CRC), obesity, and type 2 diabetes (T2D). Despite large-scale case-control studies aimed at identifying microbial taxa or genes involved in pathogeneses, the mechanisms linking them to disease have thus far remained elusive. Results To identify potential pathways through which human-associated bacteria impact host health, we leverage publicly-available interspecies protein-protein interaction (PPI) data to find clusters of microbiome-derived proteins with high sequence identity to known human-protein interactors. We observe differential targeting of putative human-interacting bacterial genes in nine independent metagenomic studies, finding evidence that the microbiome broadly targets human proteins involved in immune, oncogenic, apoptotic, and endocrine signaling pathways in relation to IBD, CRC, obesity, and T2D diagnoses. Conclusions This host-centric analysis provides a mechanistic hypothesis-generating platform and extensively adds human functional annotation to commensal bacterial proteins.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.