DNA barcoding and DNA taxonomy have recently been proposed as solutions to the crisis of taxonomy and have received significant attention from scientific journals, grant agencies, natural history museums, and mainstream media. Here, we test two key claims of molecular taxonomy using 1333 mitochondrial COI sequences for 449 species of Diptera. We investigate whether sequences can be used for species identification ("DNA barcoding") and find a relatively low success rate (< 70%) based on tree-based and newly proposed species identification criteria. Misidentifications are due to wide overlap between intra- and interspecific genetic variability, which causes 6.5% of all query sequences to have allospecific or a mixture of allo- and conspecific (3.6%) best-matching barcodes. Even when two COI sequences are identical, there is a 6% chance that they belong to different species. We also find that 21% of all species lack unique barcodes when consensus sequences of all conspecific sequences are used. Lastly, we test whether DNA sequences yield an unambiguous species-level taxonomy when sequence profiles are assembled based on pairwise distance thresholds. We find many sequence triplets for which two of the three pairwise distances remain below the threshold, whereas the third exceeds it; i.e., it is impossible to consistently delimit species based on pairwise distances. Furthermore, for species profiles based on a 3% threshold, only 47% of all profiles are consistent with currently accepted species limits, 20% contain more than one species, and 33% contain only some sequences from one species; i.e., adopting such a DNA taxonomy would require the redescription of a large proportion of the known species, thus worsening the taxonomic impediment. We conclude with an outlook on the prospects of obtaining complete barcode databases and the future use of DNA sequences in a modern integrative taxonomy.
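The triplet problem described in the abstract above can be illustrated with a minimal sketch. It assumes uncorrected pairwise distances on pre-aligned sequences and a 3% threshold; the function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inconsistent_triplets(seqs, threshold):
    """Return triplets in which exactly two of the three pairwise
    distances are within the threshold and the third exceeds it."""
    hits = []
    for i, j, k in combinations(sorted(seqs), 3):
        dists = (
            p_distance(seqs[i], seqs[j]),
            p_distance(seqs[i], seqs[k]),
            p_distance(seqs[j], seqs[k]),
        )
        if sum(d <= threshold for d in dists) == 2:
            hits.append((i, j, k))
    return hits


# Hypothetical 100 bp toy alignment: a-b and b-c differ by 2%, a-c by 4%.
toy = {
    "a": "A" * 100,
    "b": "C" * 2 + "A" * 98,
    "c": "C" * 4 + "A" * 96,
}

print(inconsistent_triplets(toy, threshold=0.03))
```

With a 3% threshold, "a" groups with "b" and "b" groups with "c", yet "a" and "c" fall on opposite sides of the threshold, so no grouping of the three sequences satisfies the criterion consistently.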
We present SequenceMatrix, software that is designed to facilitate the assembly and analysis of multi‐gene datasets. Genes are concatenated by dragging and dropping FASTA, NEXUS, or TNT files with aligned sequences into the program window. A multi‐gene dataset is concatenated and displayed in a spreadsheet; each sequence is represented by a cell that provides information on sequence length, number of indels, the number of ambiguous bases (“Ns”), and the availability of codon information. Alternatively, GenBank numbers for the sequences can be displayed and exported. Matrices with hundreds of genes and taxa can be concatenated within minutes and exported in TNT, NEXUS, or PHYLIP formats, preserving both character set and codon information for TNT and NEXUS files. SequenceMatrix also creates taxon sets listing taxa with a minimum number of characters or gene fragments, which helps assess preliminary datasets. Entire taxa, whole gene fragments, or individual sequences for a particular gene and species can be excluded from export. Data matrices can be re‐split into their component genes and the gene fragments can be exported as individual gene files. SequenceMatrix also includes two tools that help to identify sequences that may have been compromised through laboratory contamination or data management error. One tool lists identical or near‐identical sequences within genes, while the other compares the pairwise distance pattern of one gene against the pattern for all remaining genes combined. SequenceMatrix is Java‐based and compatible with the Microsoft Windows, Apple MacOS X and Linux operating systems. The software is freely available from http://code.google.com/p/sequencematrix/. © The Willi Hennig Society 2010.
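As a rough illustration of what concatenating a multi-gene dataset involves, the sketch below joins per-gene alignments end to end, pads taxa that lack a gene with a missing-data symbol, and records each gene's column range as a character set. This is not SequenceMatrix's code or API; the `concatenate` function, the `?` padding symbol, and the toy data are assumptions made for illustration.

```python
def concatenate(genes, missing="?"):
    """Concatenate aligned genes into one matrix.

    genes: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Returns (matrix, charsets), where charsets maps each gene name to its
    1-based inclusive (start, end) column range in the concatenated matrix.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    charsets = {}
    start = 1
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {name!r} is not aligned to a single length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa without this gene are filled with missing-data symbols.
            parts[taxon].append(aln.get(taxon, missing * length))
        charsets[name] = (start, start + length - 1)
        start += length
    matrix = {taxon: "".join(chunks) for taxon, chunks in parts.items()}
    return matrix, charsets


# Hypothetical toy input: sp2 has no 28S sequence.
toy_genes = {
    "COI": {"sp1": "ACGT", "sp2": "ACGA"},
    "28S": {"sp1": "GG"},
}
matrix, charsets = concatenate(toy_genes)
print(matrix)
print(charsets)
```

Keeping the column ranges alongside the matrix is what allows a concatenated dataset to be split back into its component genes later.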
Hymenoptera (sawflies, wasps, ants, and bees) are one of four mega-diverse insect orders, comprising more than 153,000 described and possibly up to one million undescribed extant species [1, 2]. As parasitoids, predators, and pollinators, Hymenoptera play a fundamental role in virtually all terrestrial ecosystems and are of substantial economic importance [1, 3]. To understand the diversification and key evolutionary transitions of Hymenoptera, most notably from phytophagy to parasitoidism and predation (and vice versa) and from solitary to eusocial life, we inferred the phylogeny and divergence times of all major lineages of Hymenoptera by analyzing 3,256 protein-coding genes in 173 insect species. Our analyses suggest that extant Hymenoptera started to diversify around 281 million years ago (mya). The primarily ectophytophagous sawflies are found to be monophyletic. The species-rich lineages of parasitoid wasps constitute a monophyletic group as well. The little-known, species-poor Trigonaloidea are identified as the sister group of the stinging wasps (Aculeata). Finally, we located the evolutionary root of bees within the apoid wasp family "Crabronidae." Our results reveal that the extant sawfly diversity is largely the result of a previously unrecognized major radiation of phytophagous Hymenoptera that did not lead to wood-dwelling and parasitoidism. They also confirm that all primarily parasitoid wasps are descendants of a single endophytic parasitoid ancestor that lived around 247 mya. Our findings provide the basis for a natural classification of Hymenoptera and allow for future comparative analyses of Hymenoptera, including their genomes, morphology, venoms, and parasitoid and eusocial life styles.
The dipteran clade Calyptratae comprises approximately 18 000 described species (12% of the known dipteran diversity) and includes well-known taxa such as houseflies, tsetse flies, blowflies and botflies, which have a close association with humans. However, the phylogenetic relationships within this insect radiation are very poorly understood and controversial. Here we propose a higher-level phylogenetic hypothesis for the Calyptratae based on an extensive DNA sequence dataset for 11 noncalyptrate outgroups and 247 calyptrate species representing all commonly accepted families in the Oestroidea and Hippoboscoidea, as well as those of the muscoid grade. DNA sequences for genes in the mitochondrial (12S, 16S, cytochrome c oxidase subunit I and cytochrome b) and nuclear genome [18S, 28S, the carbamoyl phosphate synthetase region of CAD (rudimentary), Elongation factor one alpha] were used to reconstruct the relationships. We discuss problems relating to the alignment and analysis of large datasets and emphasize the advantages of utilizing a guide tree-based approach for the alignment of the DNA sequences and using the leaf stability index to identify 'wildcard' taxa whose excessive instability obscures the phylogenetic signal. Our analyses support the monophyly of the Calyptratae and demonstrate that the superfamily Oestroidea is nested within the muscoid grade. We confirm that the monotypic family Mystacinobiidae is an oestroid and further revise the composition of the Oestroidea by demonstrating that the previously unplaced and still undescribed 'McAlpine's fly' is nested within this superfamily as a probable sister group to Mystacinobiidae. Within the Oestroidea we confirm with molecular data that the Calliphoridae are a paraphyletic grade of lineages. The families Sarcophagidae and Rhiniidae are monophyletic, but support for the monophyly of Tachinidae and Rhinophoridae depends on analytical technique (e.g. parsimony or maximum likelihood). The superfamilies Hippoboscoidea and Oestroidea are consistently found to be monophyletic, and the paraphyly of the muscoid grade is confirmed. In the overall relationships for the calyptrates, the Hippoboscoidea are sister group to the remaining Calyptratae, and the Fanniidae are sister group to the nonhippoboscoid calyptrates, whose relationships can be summarized as (Muscidae (Oestroidea (Scathophagidae, Anthomyiidae))).
The gene expression pattern specified by an animal regulatory sequence is generally viewed as arising from the particular arrangement of transcription factor binding sites it contains. However, we demonstrate here that regulatory sequences whose binding sites have been almost completely rearranged can still produce identical outputs. We sequenced the even-skipped locus from six species of scavenger flies (Sepsidae) that are highly diverged from the model species Drosophila melanogaster, but share its basic patterns of developmental gene expression. Although there is little sequence similarity between the sepsid eve enhancers and their well-characterized D. melanogaster counterparts, the sepsid and Drosophila enhancers drive nearly identical expression patterns in transgenic D. melanogaster embryos. We conclude that the molecular machinery that connects regulatory sequences to the transcription apparatus is more flexible than previously appreciated. In exploring this diverse collection of sequences to identify the shared features that account for their similar functions, we found a small number of short (20–30 bp) sequences nearly perfectly conserved among the species. These highly conserved sequences are strongly enriched for pairs of overlapping or adjacent binding sites. Together, these observations suggest that the local arrangement of binding sites relative to each other is more important than their overall arrangement into larger units of cis-regulatory function.
Here we present evidence, based on 10 datasets comprising 5283 sequences for 200 genera, that the use of the Kimura‐2‐parameter (K2P) model in DNA‐barcoding studies is poorly justified. We demonstrate that K2P is neither expected nor confirmed to be an appropriate model for closely related COI sequences. In addition, we show that the use of uncorrected distances yields higher or similar identification success rates for neighbour‐joining trees and distance‐based identification techniques. K2P also does not widen the barcoding gap for closely related sequences. We conclude that the spread of K2P through the barcoding literature is difficult to explain, and urge the use of evidence‐based approaches to DNA barcoding. © The Willi Hennig Society 2011.
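To make the comparison in the abstract above concrete, the sketch below computes an uncorrected distance and a K2P distance for the same pair of aligned sequences. The K2P formula used is the standard one, d = -½ ln[(1 - 2P - Q) √(1 - 2Q)], where P and Q are the proportions of transitions and transversions; the function names and toy sequences are hypothetical, and sites containing anything other than A, C, G or T are simply skipped, which is an assumption of this sketch.

```python
import math

PURINES = {"A", "G"}


def site_proportions(a, b):
    """Return (P, Q): proportions of transitions and transversions
    over aligned sites where both sequences have an unambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = [(x, y) for x, y in sites if x != y]
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T).
    transitions = sum((x in PURINES) == (y in PURINES) for x, y in diffs)
    transversions = len(diffs) - transitions
    return transitions / len(sites), transversions / len(sites)


def uncorrected_distance(a, b):
    """Uncorrected distance: proportion of differing sites."""
    p, q = site_proportions(a, b)
    return p + q


def k2p_distance(a, b):
    """Kimura 2-parameter distance."""
    p, q = site_proportions(a, b)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 100 bp pair with 5 transitions and 5 transversions.
seq1 = "A" * 100
seq2 = "G" * 5 + "T" * 5 + "A" * 90

print(uncorrected_distance(seq1, seq2))
print(k2p_distance(seq1, seq2))
```

For this toy pair the K2P value is slightly larger than the uncorrected one, because the model adds an estimate of substitutions hidden by multiple hits at the same site.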