The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa. Oceans cover approximately 71% of the Earth's surface and harbour most of the phylum diversity of the animal kingdom. Understanding marine biodiversity and its evolution remains a major challenge. The Pacific oyster C. gigas (Thunberg, 1793) is a marine bivalve belonging to the phylum Mollusca, which contains the largest number of described marine animal species [1]. Molluscs have vital roles in the functioning of marine, freshwater and terrestrial ecosystems, and have had major effects on humans, primarily as food sources but also as sources of dyes, decorative pearls and shells, vectors of parasites, and biofouling or destructive agents. Many molluscs are important fishery and aquaculture species, as well as models for studying neurobiology, biomineralization, ocean acidification and adaptation to coastal environments under climate change [2,3].
As the most speciose member of the Lophotrochozoa, phylum Mollusca is central to our understanding of the biology and evolution of this superphylum of protostomes. As sessile marine animals living in estuarine and intertidal regions, oysters must cope with harsh and dynamically changing environments. Abiotic factors such as temperature and salinity fluctuate wildly, and toxic metals and desiccation also pose serious challenges. Filter-feeding oysters face tremendous exposure to microbial pathogens. Oysters do have a notable physical line of defence against predation and desiccation in the formation of thick calcified shells, a key evolutionary innovation making molluscs a successful group. However, acidification of the world's oceans by uptake of anthropogenic carbon dioxide poses a potentially serious threat to this ancient adaptation [4]. Understanding biomineralization and molluscan shell formation is, thus, a major area of interest [5]. Crassostrea gigas is also an interesting model for developmental biology owing to its mosaic development with typical molluscan stages, including trochophore and veliger larvae and metamorphosis. A complete genome sequence of C. gigas would enable a more th...
By impairing both function and survival, the severe reduction in oxygen availability associated with high-altitude environments is likely to act as an agent of natural selection. We used genomic and candidate gene approaches to search for evidence of such genetic selection. First, a genome-wide allelic differentiation scan (GWADS) comparing indigenous highlanders of the Tibetan Plateau (3,200-3,500 m) with closely related lowland Han revealed a genome-wide significant divergence across eight SNPs located near EPAS1. This gene encodes the transcription factor HIF2α, which stimulates production of red blood cells and thus increases the concentration of hemoglobin in blood. Second, in a separate cohort of Tibetans residing at 4,200 m, we identified 31 EPAS1 SNPs in high linkage disequilibrium that correlated significantly with hemoglobin concentration. The sex-adjusted hemoglobin concentration was, on average, 0.8 g/dL lower in the major allele homozygotes compared with the heterozygotes. These findings were replicated in a third cohort of Tibetans residing at 4,300 m. The alleles associating with lower hemoglobin concentrations were correlated with the signal from the GWADS study and were observed at greatly elevated frequencies in the Tibetan cohorts compared with the Han. High hemoglobin concentrations are a cardinal feature of chronic mountain sickness, offering one plausible mechanism for selection. Alternatively, as EPAS1 is pleiotropic in its effects, selection may have operated on some other aspect of the phenotype. Whichever of these explanations is correct, the evidence for genetic selection at the EPAS1 locus from the GWADS study is supported by the replicated studies associating function with the allelic variants. Keywords: chronic mountain sickness | high altitude | human genome variation | hypoxia | hypoxia-inducible factor
Bats are the only mammals capable of sustained flight and are notorious reservoir hosts for some of the world's most highly pathogenic viruses, including Nipah, Hendra, Ebola, and severe acute respiratory syndrome (SARS). To identify genetic changes associated with the development of bat-specific traits, we performed whole-genome sequencing and comparative analyses of two distantly related species, fruit bat Pteropus alecto and insectivorous bat Myotis davidii. We discovered an unexpected concentration of positively selected genes in the DNA damage checkpoint and nuclear factor κB pathways that may be related to the origin of flight, as well as expansion and contraction of important gene families. Comparison of bat genomes with other mammalian species has provided new insights into bat biology and evolution.
The naked mole rat (NMR, Heterocephalus glaber) is a strictly subterranean, extraordinarily long-lived eusocial mammal [1]. Although the size of a mouse, its maximum lifespan exceeds 30 years, making this animal the longest-living rodent. NMRs show negligible senescence, no age-related increase in mortality, and high fecundity until death [2]. In addition to delayed aging, NMRs are resistant to both spontaneous cancer and experimentally induced tumorigenesis [3,4]. NMRs pose a challenge to the theories that link aging, cancer and redox homeostasis. Although characterized by significant oxidative stress [5], the NMR proteome shows neither age-related susceptibility to oxidative damage nor increased ubiquitination [6]. NMRs naturally reside in large colonies with a single breeding female, the “queen,” who suppresses the sexual maturity of her subordinates [11]. NMRs also live in full darkness, at low oxygen and high carbon dioxide concentrations [7], and are unable to sustain thermogenesis [8] or to feel certain types of pain [9,10]. Here we report sequencing and analysis of the NMR genome, which revealed unique genome features and molecular adaptations consistent with cancer resistance, poikilothermy, hairlessness, altered visual function, circadian rhythms and taste sensing, and insensitivity to low oxygen. This information provides insights into the NMR’s exceptional longevity and its capacity to live in hostile conditions, in the dark and at low oxygen. The extreme traits of the NMR, together with the reported genome and transcriptome information, offer unprecedented opportunities for understanding aging and advancing many other areas of biological and biomedical research.
Locusts are one of the world’s most destructive agricultural pests and represent a useful model system in entomology. Here we present a draft 6.5 Gb genome sequence of Locusta migratoria, which is the largest animal genome sequenced so far. Our findings indicate that the large genome size of L. migratoria is likely due to transposable element proliferation combined with slow rates of loss for these elements. Methylome and transcriptome analyses reveal complex regulatory mechanisms involved in microtubule dynamic-mediated synapse plasticity during phase change. We find significant expansion of gene families associated with energy consumption and detoxification, consistent with long-distance flight capacity and phytophagy. We report hundreds of potential insecticide target genes, including cys-loop ligand-gated ion channels, G-protein-coupled receptors and lethal genes. The L. migratoria genome sequence offers new insights into the biology and sustainable management of this pest species, and will promote its wide use as a model system.
Orally administered drugs must overcome several barriers before reaching their target site. Such barriers depend largely upon specific membrane transport systems and intracellular drug-metabolizing enzymes. For the first time, P-glycoprotein (P-gp) and the cytochrome P450s, which form the main line of defense limiting the oral bioavailability (OB) of drugs, were incorporated into QSAR models of human OB based on 805 structurally diverse drug and drug-like molecules. Linear methods (multiple linear regression, MLR, and partial least squares regression, PLS) and a nonlinear method (support-vector machine regression, SVR) were used to construct the models, and their predictivity was verified with five-fold cross-validation and independent external tests. SVR performs slightly better than MLR and PLS, as indicated by its determination coefficient (R²) of 0.80 and standard error of estimate (SEE) of 0.31 for the test sets. MLR and PLS are relatively weak, showing prediction abilities of 0.60 and 0.64 for the training set with SEE of 0.40 and 0.31, respectively. Our study indicates that the MLR, PLS and SVR-based in silico models have good potential in facilitating the prediction of oral bioavailability and can be applied in future drug design.
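The two fit statistics quoted in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how SEE was computed, so the residual degrees of freedom used here (n - k - 1 for k descriptors) are an assumption, and the function names and toy numbers are hypothetical rather than taken from the study.

```python
import math
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def standard_error_of_estimate(
    observed: Sequence[float],
    predicted: Sequence[float],
    n_descriptors: int,
) -> float:
    """SEE = sqrt(SS_res / (n - k - 1)); this convention is an assumption."""
    dof = len(observed) - n_descriptors - 1
    if dof <= 0:
        raise ValueError("need more observations than descriptors + 1")
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ss_res / dof)


# Toy data (hypothetical, not from the study).
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred))
print(standard_error_of_estimate(obs, pred, n_descriptors=1))
```

For the toy data the residual sum of squares is 0.10 against a total sum of squares of 5.0, so R² comes out at about 0.98.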
To date, most genome-wide association studies (GWAS) and studies of fine-scale population structure have been conducted primarily on Europeans. Han Chinese, the largest ethnic group in the world, composing 20% of the entire global human population, is largely underrepresented in such studies. A well-recognized challenge is the fact that population structure can cause spurious associations in GWAS. In this study, we examined population substructures in a diverse set of over 1700 Han Chinese samples collected from 26 regions across China, each genotyped at approximately 160K single-nucleotide polymorphisms (SNPs). Our results showed that the Han Chinese population is intricately substructured, with the main observed clusters corresponding roughly to northern Han, central Han, and southern Han. However, simulated case-control studies showed that genetic differentiation among these clusters, although very small (F(ST) = 0.0002 to 0.0009), is sufficient to lead to an inflated rate of false-positive results even when the sample size is moderate. The top two SNPs with the greatest frequency differences between the northern Han and southern Han clusters (F(ST) > 0.06) were found in the FADS2 gene, which associates with the fatty acid composition in phospholipids, and in the HLA complex P5 gene (HCP5), which associates with HIV infection, psoriasis, and psoriatic arthritis. Ingenuity Pathway Analysis (IPA) showed that most differentiated genes among clusters are involved in cardiac arteriopathy (p < 10^-101). These signals indicating significant differences among Han Chinese subpopulations should be carefully explained in case they are also detected in association studies, especially when sample sources are diverse.
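To give a sense of what the F(ST) values in the abstract above measure, here is a minimal Python sketch of the textbook heterozygosity-based definition for a single biallelic SNP in two equally weighted populations. The abstract does not state which estimator the study used, so this is an illustrative assumption, and the function name and allele frequencies are hypothetical.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two
    populations that are given equal weight (an assumption).
    """
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity of the pooled population.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        # The allele is fixed or absent everywhere: no differentiation.
        return 0.0
    # Mean expected heterozygosity within each population.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, not taken from the study.
print(fst_two_populations(0.40, 0.60))
print(fst_two_populations(0.49, 0.51))
```

Under this definition, even a 20-point frequency difference (0.40 versus 0.60) gives an F(ST) of only about 0.04, which helps put the very small cluster-level values reported in the abstract in context.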