The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
Recent excitement over the development of an initiative to generate DNA sequences for all named species on the planet has, in our opinion, generated two major areas of contention as to how this 'DNA barcoding' initiative should proceed. It is critical that these two issues are clarified and resolved before the use of DNA as a tool for taxonomy and species delimitation can be universalized. The first issue concerns how DNA data are to be used in the context of this initiative; this is the DNA barcode reader problem (or barcoder problem). Currently, many of the published studies under this initiative have used tree-building methods and, more precisely, distance approaches to the construction of the trees that are used to place certain DNA sequences into a taxonomic context. The second problem involves the reaction of the taxonomic community to the directives of the 'DNA barcoding' initiative. This issue is extremely important in that the classical taxonomic approach and the DNA approach will need to be reconciled in order for the 'DNA barcoding' initiative to proceed with any kind of community acceptance. In fact, we feel that DNA barcoding is a misnomer. Our preference is for the title of the London meetings: Barcoding Life. In this paper we discuss these two concerns generated around the DNA barcoding initiative and attempt to present a phylogenetic systematic framework for an improved barcoder as well as a taxonomic framework for interweaving classical taxonomy with the goals of 'DNA barcoding'.
Oil palm is the most productive oil-bearing crop. Planted on only 5% of the total vegetable oil acreage, palm oil accounts for 33% of vegetable oil, and 45% of edible oil worldwide, but increased cultivation competes with dwindling rainforest reserves. We report the 1.8 gigabase (Gb) genome sequence of the African oil palm Elaeis guineensis, the predominant source of worldwide oil production. 1.535 Gb of assembled sequence and transcriptome data from 30 tissue types were used to predict at least 34,802 genes, including oil biosynthesis genes and homologues of WRINKLED1 (WRI1), and other transcriptional regulators [1], which are highly expressed in the kernel. We also report the draft sequence of the South American oil palm Elaeis oleifera, which has the same number of chromosomes (2n=32) and produces fertile interspecific hybrids with E. guineensis [2], but appears to have diverged in the New World. Segmental duplications of chromosome arms define the palaeotetraploid origin of palm trees. The oil palm sequence enables the discovery of genes for important traits as well as somaclonal epigenetic alterations which restrict the use of clones in commercial plantings [3], and thus helps achieve sustainability for biofuels and edible oils, reducing the rainforest footprint of this tropical plantation crop.
Persistent infections with carcinogenic human papillomaviruses (HPV) cause virtually all cervical cancers. Cervical HPV types (n > 40) also represent the most common sexually transmitted agents, and most infections clear in 1-2 years. The risks of persistence and neoplastic progression to cancer and its histologic precursor, cervical intraepithelial neoplasia grade 3 (CIN3), differ markedly by HPV type. To study type-specific HPV natural history, we conducted a 10,000-woman, population-based prospective study of HPV infections and CIN3/cancer in Guanacaste, Costa Rica. By studying large numbers of women, we wished to separate viral persistence from neoplastic progression. We observed a strong concordance of newly-revised HPV evolutionary groupings with the separate risks of persistence and progression to CIN3/cancer. HPV16 was uniquely likely both to persist and to cause neoplastic progression when it persisted, making it a remarkably powerful human carcinogen that merits separate clinical consideration. Specifically, 19.9% of HPV16-infected women were diagnosed with CIN3/cancer at enrollment or during the five-year follow-up. Other carcinogenic types, many related to HPV16, were not particularly persistent but could cause neoplastic progression, at lower rates than HPV16, if they did persist. Some low-risk types were persistent but, nevertheless, virtually never caused CIN3. Therefore, carcinogenicity is not strictly a function of persistence. Separately, we noted that the carcinogenic HPV types code for an E5 protein, whereas most low-risk types either lack a definable homologous E5 ORF and/or a translation start codon for E5. These results present several clear clues and research directions in our ongoing efforts to understand HPV carcinogenesis.
We report the sequences of 1,244 human Y chromosomes randomly ascertained from 26 worldwide populations by the 1000 Genomes Project. We discovered more than 65,000 variants, including SNVs, MNVs, indels, STRs, and CNVs. Of these, CNVs contribute the greatest predicted functional impact. We constructed a calibrated phylogenetic tree based on binary SNVs and projected the more complex variants onto it, estimating the numbers of mutations for each class. Our phylogeny reveals bursts of extreme expansions in male numbers that have occurred independently among each of the five continental superpopulations examined, at times of known migrations and technological innovations.
Engaging a century-long debate about the role of race in science
Sequence comparisons were made for up to 667 bp of DNA cloned from 14 kinds of Hawaiian Drosophila and five other dipteran species. These sequences include parts of the genes for NADH dehydrogenase (subunits 1, 2, and 5) and rRNA (from the large ribosomal subunit). Because the times of divergence among these species are known approximately, the sequence comparisons give insight into the evolutionary dynamics of this molecule. Transitions account for nearly all of the differences between sequences that have diverged by less than 2%: for these sequences the mean rate of divergence appears to be about 2%/Myr. In comparisons involving greater divergence times and greater sequence divergence, relatively more of the sequence differences are due to transversions. Specifically, the fraction of these differences that are counted as transversions rises from an initial value of less than 0.1 to a plateau value of nearly 0.6. The time required to reach half of the plateau value, about 10 Myr, is similar to that for mammalian mtDNA. The mtDNAs of flies and mammals are also alike in the shape of the curve relating the percentage of positions at which there are differences in protein-coding regions to the time of divergence. For both groups of animals, the curve has a steep initial slope ascribable to fast accumulation of synonymous substitutions and a shallow final slope resulting from the slow accumulation of substitutions causing amino acid replacements. However, the percentage of all sites that can experience a high rate of substitution appears to be only about 8% for fly mtDNA compared to about 20% for mammalian mtDNA. (Abstract truncated at 250 words.)
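The transversion fraction discussed in this abstract can be illustrated with a short sketch. It is not the authors' procedure: it assumes two pre-aligned sequences of equal length, skips gaps and ambiguous bases, and uses hypothetical function names and a made-up pair of sequences.

```python
# Minimal sketch (assumption: sequences are already aligned, same length).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites, gaps, and ambiguous characters.
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        # A transition stays within purines (A<->G) or pyrimidines (C<->T).
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


def transversion_fraction(seq_a, seq_b):
    """Fraction of observed differences that are transversions, or None."""
    ts, tv = count_substitutions(seq_a, seq_b)
    total = ts + tv
    return tv / total if total else None


# Hypothetical toy alignment: one transition (A/G), two transversions.
toy_a = "ACGTAC"
toy_b = "GCGTCA"
print(count_substitutions(toy_a, toy_b))
print(transversion_fraction(toy_a, toy_b))
```

Computed over pairs of species with increasing divergence times, a fraction of this kind is what the abstract describes as rising from below 0.1 toward a plateau near 0.6.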
Relationships among representatives of the five major Hawaiian Drosophila species groups were examined using data from eight different gene regions. A simultaneous analysis of these data resulted in a single most-parsimonious tree that (1) places the adiastola picture-winged subgroup as sister taxon to the other picture-winged subgroups, (2) unites the modified-tarsus species group with flies from the Antopocerus species group, and (3) places the white-tip scutellum species group as the most basal taxon. Because of the different gene sources used in this study, numerous process partitions can be erected within this data set. We examined the incongruence among these various partitions and the ramifications of these data for the taxonomic consensus, prior agreement, and simultaneous analysis approaches to phylogenetic reconstruction. Separate analyses and taxonomic consensus appear to be inadequate methods for dealing with the partitions in this study. Although detection of incongruence is possible and helps elucidate particular areas of disagreement among data sets, separation of partitions on the basis of incongruence is problematic for many reasons. First, analyzing all genes separately and then either presenting them all as possible hypotheses or taking their consensus provides virtually no information concerning the relationships among these flies. Second, despite some evidence of incongruence, there are no clear delineations among the various gene partitions that separate only heterogeneous data. Third, to the extent that problematic genes can be identified, these genes have nearly the same information content, within a combined analysis framework, as the remaining nonproblematic genes. Our data suggest that significant incongruence among data partitions may be isolated to specific relationships and the "false" signal creating this incongruence is most likely to be overcome by a simultaneous analysis. 
We present a new method, partitioned Bremer support, for examining the contribution of a particular data partition to the topological support of the simultaneous analysis tree.
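Partitioned Bremer support, as the method is usually described, gives each data partition a score for a given node: the length of that partition on the best tree lacking the node, minus its length on the simultaneous analysis tree. The abstract does not spell out the calculation, so the following is only a sketch under that assumption; the function name, partition names, and tree lengths are all hypothetical.

```python
# Minimal sketch of the arithmetic (assumption: per-partition tree lengths
# have already been obtained from a parsimony program).

def partitioned_bremer_support(len_on_optimal, len_without_node):
    """Per-partition support for one node.

    len_on_optimal:   partition -> steps on the simultaneous analysis tree
    len_without_node: partition -> steps on the shortest tree lacking the node
    """
    if set(len_on_optimal) != set(len_without_node):
        raise ValueError("both inputs must cover the same partitions")
    return {
        name: len_without_node[name] - len_on_optimal[name]
        for name in len_on_optimal
    }


# Hypothetical tree lengths (in steps) for two partitions at one node.
optimal = {"mtDNA": 120, "nuclear": 200}
constrained = {"mtDNA": 125, "nuclear": 198}

pbs = partitioned_bremer_support(optimal, constrained)
total_bremer = sum(pbs.values())
print(pbs)           # positive values support the node, negative ones conflict
print(total_bremer)  # sums to the node's overall Bremer support
```

In this toy case one partition supports the node and the other conflicts with it, which is the kind of localized disagreement the abstract says a simultaneous analysis can overcome.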