Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequencing technologies to obtain 72.2-fold genome coverage. The absence of recent whole-genome duplication, along with the presence of few tandem duplications, explains the small number of genes in the cucumber. Our study establishes that five of the cucumber's seven chromosomes arose from fusions of ten ancestral chromosomes after divergence from Cucumis melo. The sequenced cucumber genome affords insight into traits such as its sex expression, disease resistance, biosynthesis of cucurbitacin and 'fresh green' odor. We also identify 686 gene clusters related to phloem function. The cucumber genome provides a valuable resource for developing elite cultivars and for studying the evolution and function of the plant vascular system.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
Particulate matter (PM) air pollution poses a formidable public health threat to the city of Beijing. Among the various hazards of PM pollutants, microorganisms in PM2.5 and PM10 are thought to be responsible for various allergies and for the spread of respiratory diseases. While the physical and chemical properties of PM pollutants have been extensively studied, much less is known about the inhalable microorganisms. Most existing data on airborne microbial communities using 16S or 18S rRNA gene sequencing to categorize bacteria or fungi into the family or genus levels do not provide information on their allergenic and pathogenic potentials. Here we employed metagenomic methods to analyze the microbial composition of Beijing’s PM pollutants during a severe January smog event. We show that with sufficient sequencing depth, airborne microbes including bacteria, archaea, fungi, and dsDNA viruses can be identified at the species level. Our results suggested that the majority of the inhalable microorganisms were soil-associated and nonpathogenic to human. Nevertheless, the sequences of several respiratory microbial allergens and pathogens were identified and their relative abundance appeared to have increased with increased concentrations of PM pollution. Our findings may serve as an important reference for environmental scientists, health workers, and city planners.
Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-syonomyous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
BackgroundEstimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates.ResultsWe evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.ConclusionsOverall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.