Genetic factors modifying the blood metabolome have been investigated through genome-wide association studies (GWAS) of common genetic variants and through exome sequencing. We conducted a whole-genome sequencing study of common, low-frequency and rare variants to associate genetic variations with blood metabolite levels using comprehensive metabolite profiling in 1,960 adults. We focused the analysis on 644 metabolites with consistent levels across three longitudinal data collections. Genetic sequence variations at 101 loci were associated with the levels of 246 (38%) metabolites (P ≤ 1.9 × 10). We identified 113 (10.7%) among 1,054 unrelated individuals in the cohort who carried heterozygous rare variants likely influencing the function of 17 genes. Thirteen of the 17 genes are associated with inborn errors of metabolism or other pediatric genetic conditions. This study extends the map of loci influencing the metabolome and highlights the importance of heterozygous rare variants in determining abnormal blood metabolic phenotypes in adults.
The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Understanding the significance of genetic variants in the noncoding genome is emerging as the next challenge in human genomics. We used the power of 11,257 whole-genome sequences and 16,384 heptamers (7-nt motifs) to build a map of sequence constraint for the human species. This build differed substantially from traditional maps of interspecies conservation and identified regulatory elements among the most constrained regions of the genome. Using new Hi-C experimental data, we describe a strong pattern of coordination over 2 Mb where the most constrained regulatory elements associate with the most essential genes. Constrained regions of the noncoding genome are up to 52-fold enriched for known pathogenic variants as compared to unconstrained regions (21-fold when compared to the genome average). This map of sequence constraint across thousands of individuals is an asset to help interpret noncoding elements in the human genome, prioritize variants and reconsider gene units at a larger scale.
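The figure of 16,384 heptamers is simply the number of distinct 7-nt motifs over the four-letter DNA alphabet (4^7). The following is a minimal sketch of enumerating those motifs and counting them in a sequence; the function names and the choice to skip windows containing ambiguous bases are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # unambiguous DNA bases
K = 7              # heptamer length, as in the abstract


def all_kmers(k=K):
    """Return every possible k-mer over ACGT (4**k strings)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def count_kmers(seq, k=K):
    """Count overlapping k-mers in seq.

    Hypothetical helper: windows containing any character outside
    ACGT (for example N) are skipped, which is an assumption made
    here for illustration.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    print(len(all_kmers()))          # 4**7 distinct heptamers
    print(count_kmers("ACGTACGTA"))  # three overlapping windows
```

Counting overlapping windows keeps every genomic position associated with a motif context, which is the usual starting point for any per-motif variation statistic.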
Understanding how enzyme specificity evolves will provide guiding principles for protein engineering and function prediction. The o-succinylbenzoate synthase (OSBS) family is an excellent model system for elucidating these principles because it has many highly divergent amino acid sequences that are <20% identical, and some members have evolved a second function. The OSBS family belongs to the enolase superfamily, members of which use a set of conserved residues to catalyze a wide variety of reactions. These residues are the only conserved residues in the OSBS family, so they are not sufficient to determine reaction specificity. Some enzymes in the OSBS family catalyze another reaction, N-succinylamino acid racemization (NSAR). NSARs cannot be segregated into a separate family because their sequences are highly similar to those of known OSBSs, and many of them have both OSBS and NSAR activities. To determine how such divergent enzymes can catalyze the same reaction and how NSAR activity evolved, we divided the OSBS family into subfamilies and compared the divergence of their active site residues. Correlating sequence conservation with the effects of mutations in Escherichia coli OSBS identified two nonconserved residues (R159 and G288) at which mutations decrease efficiency ≥200-fold. These residues are not conserved in the subfamily that includes NSAR enzymes. The OSBS/NSAR subfamily binds the substrate in a different orientation, eliminating selective pressure to retain arginine and glycine at these positions. This supports the hypothesis that specificity-determining residues have diverged in the OSBS family and provides insight into the sequence changes required for the evolution of NSAR activity.
D-Glucaric acid can be produced as a value-added chemical from biomass through a de novo pathway in Escherichia coli. However, previous studies have identified pH-mediated toxicity at product concentrations of 5 g/L and have also found the eukaryotic myo-inositol oxygenase (MIOX) enzyme to be rate-limiting. We ported this pathway to Saccharomyces cerevisiae, which is naturally acid-tolerant, and evaluated a codon-optimized MIOX homologue. We constructed two engineered yeast strains that were distinguished solely by their MIOX gene - either the previous version from Mus musculus or a homologue from Arabidopsis thaliana codon-optimized for expression in S. cerevisiae - in order to identify the rate-limiting steps for D-glucaric acid production from both a fermentative and a non-fermentative carbon source. myo-Inositol availability was found to be rate-limiting from glucose in both strains and was demonstrated to be dependent on growth rate, whereas the previously used M. musculus MIOX activity was found to be rate-limiting from glycerol. Maximum titers were 0.56 g/L from glucose in batch mode, 0.98 g/L from glucose in fed-batch mode, and 1.6 g/L from glucose supplemented with myo-inositol. Future work focusing on the MIOX enzyme, the interplay between growth and production modes, and promoting aerobic respiration should further improve this pathway.
Genome sequencing has established clinical utility for rare disease diagnosis. While increasing numbers of individuals have undergone elective genome sequencing, a comprehensive study surveying genome-wide disease-associated genes in adults with deep phenotyping has not been reported. Here we report the results of a 3-y precision medicine study with a goal to integrate whole-genome sequencing with deep phenotyping. A cohort of 1,190 adult participants (402 female [33.8%]; mean age, 54 y [range 20 to 89+]; 70.6% European) had whole-genome sequencing and were deeply phenotyped using metabolomics, advanced imaging, and clinical laboratory tests in addition to family/medical history. Of 1,190 adults, 206 (17.3%) had at least 1 genetic variant with pathogenic (P) or likely pathogenic (LP) assessment that suggests a predisposition of genetic risk. A multidisciplinary clinical team reviewed all reportable findings for the assessment of genotype and phenotype associations, and 137 (11.5%) had genotype and phenotype associations. A high percentage of genotype and phenotype associations (>75%) was observed for dyslipidemia (n = 24), cardiomyopathy, arrhythmia, and other cardiac diseases (n = 42), and diabetes and endocrine diseases (n = 17). A lack of genotype and phenotype associations, a potential burden for patient care, was observed in 69 (5.8%) individuals with P/LP variants. Genomics and metabolomics associations identified 61 (5.1%) heterozygotes with phenotype manifestations affecting serum metabolite levels in amino acid, lipid and cofactor, and vitamin pathways. Our descriptive analysis provides results on the integration of whole-genome sequencing and deep phenotyping for clinical assessments in adults. Keywords: genomics | advanced imaging | precision medicine | deep phenotyping | metabolomics
The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. 
Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-017-0509-y) contains supplementary material, which is available to authorized users.
Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.
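The notion of an "intolerant" site as one with significantly fewer observed than expected missense variants can be illustrated with a small sketch. The abstract does not state which statistical model was used, so the Poisson lower-tail test, the function names, and the significance threshold below are assumptions for illustration only.

```python
import math


def poisson_lower_tail(observed, expected):
    """Return P(X <= observed) for X ~ Poisson(expected).

    The Poisson model is an assumption of this sketch, not the
    method reported in the abstract.
    """
    term = math.exp(-expected)  # P(X = 0)
    total = term
    for k in range(1, observed + 1):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        total += term
    return min(total, 1.0)


def is_depleted(observed, expected, alpha=0.05):
    """Flag a site whose observed count is significantly below expectation.

    alpha is a hypothetical threshold; a proteome-wide analysis would
    also need a multiple-testing correction, omitted here.
    """
    return observed < expected and poisson_lower_tail(observed, expected) < alpha


if __name__ == "__main__":
    # Hypothetical counts for two 3D sites (not data from the study).
    print(is_depleted(observed=0, expected=10.0))
    print(is_depleted(observed=9, expected=10.0))
```

In this toy setting, a site with no observed missense variants where ten are expected is flagged, while a site with nine observed is not.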