Taxonomic identification of biological materials can be achieved through DNA barcoding, where an unknown “barcode” sequence is compared to a reference database. In many disciplines, obtaining accurate taxonomic identifications can be imperative (e.g., evolutionary biology, food regulatory compliance, forensics). The Barcode of Life DataSystems (BOLD) and GenBank are the main public repositories of DNA barcode sequences. In this study, an assessment of the accuracy and reliability of sequences in these databases was performed. To achieve this, 1) curated reference materials for plants, macro-fungi and insects were obtained from national collections, 2) relevant barcode sequences (rbcL, matK, trnH-psbA, ITS and COI) from these reference samples were generated and used for searching against both databases, and 3) optimal search parameters were determined that ensure the best match to the known species in either database. While GenBank outperformed BOLD for species-level identification of insect taxa (53% and 35%, respectively), both databases performed comparably for plants and macro-fungi (~81% and ~57%, respectively). Results illustrated that using a multi-locus barcode approach increased identification success. This study outlines the utility of the BLAST search tool in GenBank and the BOLD identification engine for taxonomic identifications and identifies some precautions needed when using public sequence repositories in applied scientific disciplines.
Rapid evolutionary radiations are expected to require large amounts of sequence data to resolve. To resolve these types of relationships many systematists believe that it will be necessary to collect data by next-generation sequencing (NGS) and use multispecies coalescent ("species tree") methods. Ultraconserved element (UCE) sequence capture is becoming a popular method to leverage the high throughput of NGS to address problems in vertebrate phylogenetics. Here we examine the performance of UCE data for gallopheasants (true pheasants and allies), a clade that underwent a rapid radiation 10-15 Ma. Relationships among gallopheasant genera have been difficult to establish. We used this rapid radiation to assess the performance of species tree methods, using ∼600 kilobases of DNA sequence data from ∼1500 UCEs. We also integrated information from traditional markers (nuclear intron data from 15 loci and three mitochondrial gene regions). Species tree methods exhibited troubling behavior. Two methods [Maximum Pseudolikelihood for Estimating Species Trees (MP-EST) and Accurate Species TRee ALgorithm (ASTRAL)] appeared to perform optimally when the set of input gene trees was limited to the most variable UCEs, though ASTRAL appeared to be more robust than MP-EST to input trees generated using less variable UCEs. In contrast, the rooted triplet consensus method implemented in Triplec performed better when the largest set of input gene trees was used. We also found that all three species tree methods exhibited a surprising degree of dependence on the program used to estimate input gene trees, suggesting that the details of likelihood calculations (e.g., numerical optimization) are important for loci with limited phylogenetic information. 
As an alternative to summary species tree methods we explored the performance of SuperMatrix Rooted Triple - Maximum Likelihood (SMRT-ML), a concatenation method that is consistent even when gene trees exhibit topological differences due to the multispecies coalescent. We found that SMRT-ML performed well for UCE data. Our results suggest that UCE data have excellent prospects for the resolution of difficult evolutionary radiations, though specific attention may need to be given to the details of the methods used to estimate species trees.
The utility of the forensically important Sarcophagidae (Diptera) for time since death estimates has been severely limited, as morphological identification is difficult and thermobiological histories are inadequately documented. A molecular identification method involving the sequencing of a 658-bp 'barcode' fragment of the mitochondrial cytochrome oxidase subunit I (COI) gene from 85 specimens, representing 16 Australian species from varying populations, was evaluated. Nucleotide sequence divergences were calculated using the Kimura-two-parameter distance model and a neighbour-joining phylogenetic tree generated. All species were resolved as reciprocally monophyletic, except Sarcophaga dux. Intraspecific and interspecific variation ranged from 0.000% to 1.499% (SE = 0.044%) and 6.658% to 8.983% (SE = 0.653%), respectively. The COI 'barcode' sequence was found to be suitable for the molecular identification of the studied Australian Sarcophagidae: 96.5% of the examined specimens were assigned to the correct species. Given that the sarcophagid fauna is poorly described, it is feasible that the few incorrectly assigned specimens represent cryptic species. The results of this research will be instrumental for implementation of the Australian Sarcophagidae in forensic entomology.
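The Kimura two-parameter distance mentioned above can be illustrated with a minimal sketch. It assumes two pre-aligned sequences of equal length and skips any site that is not an unambiguous A, C, G or T in both; the function name `k2p_distance` is hypothetical and this is not the software used in the study. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration. Sites with gaps or ambiguity
    codes in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = 0
    transversions = 0
    compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log(inner)
```

Applying the function to every pair of sequences gives the distance matrix that a neighbour-joining tree is built from; for closely related sequences the value stays close to the raw proportion of differing sites.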
The most striking feature of peafowl (Pavo) is the males' elaborate train, which exhibits ocelli (ornamental eyespots) that are under sexual selection. Two additional genera within the Phasianidae (Polyplectron and Argusianus) exhibit ocelli, but the appearance and location of these ornamental eyespots exhibit substantial variation among these genera, raising the question of whether ocelli are homologous. Within Polyplectron, ocelli are ancestral, suggesting ocelli may have evolved even earlier, prior to the divergence among genera. However, it remains unclear whether Pavo, Polyplectron and Argusianus form a monophyletic clade in which ocelli evolved once. We estimated the phylogeny of the ocellated species using sequences from 1966 ultraconserved elements (UCEs) and three mitochondrial regions. The three ocellated genera did form a strongly supported clade, but each ocellated genus was sister to at least one genus without ocelli. Indeed, Polyplectron and Galloperdix, a genus not previously suggested to be related to any ocellated taxon, were sister genera. The close relationship between taxa with and without ocelli suggests multiple gains or losses. Independent gains, possibly reflecting a pre-existing bias for eye-like structures among females and/or the existence of a simple mutational pathway for the origin of ocelli, appear to be the most likely explanation.
Ancestry inference for a person using a panel of SNPs depends on the variation of frequencies of those SNPs around the world and the amount of reference data available for calculation/comparison. The Kidd Lab panel of 55 AISNPs has been incorporated in commercial kits by both Life Technologies and Illumina for massively parallel sequencing. Therefore, a larger set of reference populations will be useful for researchers using those kits. We have added reference population allele frequencies for 52 population samples to the 73 previously entered so that there are now allele frequencies publicly available in ALFRED and FROG-kb for a total of 125 population samples.
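To show why reference allele frequencies matter for this kind of inference, the following is a minimal sketch of one common approach: scoring a genotype profile against each reference population by its log-likelihood under Hardy-Weinberg proportions, treating SNPs as independent. The function names, the genotype coding (0, 1 or 2 copies of a chosen allele) and the frequency floor `eps` are all assumptions made for illustration; this is not claimed to be the calculation performed by ALFRED, FROG-kb or the commercial kits.

```python
import math


def genotype_log_likelihood(genotypes, freqs, eps=1e-3):
    """Log-likelihood of a genotype profile given one population's allele frequencies.

    genotypes: per-SNP count (0, 1 or 2) of the tracked allele.
    freqs: per-SNP frequency of that allele in the population.
    eps: assumed floor/ceiling on frequencies, so that an allele unobserved
         in a finite reference sample does not give a likelihood of zero.
    """
    if len(genotypes) != len(freqs):
        raise ValueError("genotypes and freqs must have the same length")

    total = 0.0
    for g, p in zip(genotypes, freqs):
        p = min(max(p, eps), 1 - eps)
        # Hardy-Weinberg: P(g) = C(2, g) * p^g * (1 - p)^(2 - g)
        total += math.log(math.comb(2, g))
        total += g * math.log(p) + (2 - g) * math.log(1 - p)
    return total


def rank_populations(genotypes, pop_freqs):
    """Return (population, log-likelihood) pairs, most likely first."""
    scores = {
        name: genotype_log_likelihood(genotypes, freqs)
        for name, freqs in pop_freqs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A call such as `rank_populations([2, 0], {"A": [0.9, 0.1], "B": [0.1, 0.9]})` places the hypothetical population "A" first. Adding more reference populations simply adds entries to `pop_freqs`, so a profile can be compared against a wider range of candidate origins.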