Summary. 1. The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. 2. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2, as well as full-length ITS sequences, from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. 3. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. 4. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.
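The extraction step described in this abstract, cutting out ITS1, 5.8S and ITS2 once the positions of the flanking ribosomal genes have been predicted, can be illustrated with a minimal Python sketch. This is not ITSx's own code (ITSx is written in Perl and locates the genes with hidden Markov models); the `GeneHits` type, its field names and the `extract_its_regions` function are hypothetical, and the gene boundaries are assumed to be already known as 0-based, end-exclusive coordinates.

```python
from typing import Dict, NamedTuple


class GeneHits(NamedTuple):
    """Hypothetical container for predicted gene boundaries (0-based, end-exclusive)."""
    ssu_end: int    # index just past the last base of the SSU (18S) gene
    s58_start: int  # index of the first base of the 5.8S gene
    s58_end: int    # index just past the last base of the 5.8S gene
    lsu_start: int  # index of the first base of the LSU (28S) gene


def extract_its_regions(seq: str, hits: GeneHits) -> Dict[str, str]:
    """Slice ITS1, 5.8S, ITS2 and the full ITS region out of one sequence."""
    in_order = (
        0 <= hits.ssu_end <= hits.s58_start <= hits.s58_end
        <= hits.lsu_start <= len(seq)
    )
    if not in_order:
        raise ValueError("gene boundaries are out of order or outside the sequence")
    return {
        "ITS1": seq[hits.ssu_end:hits.s58_start],      # between SSU and 5.8S
        "5.8S": seq[hits.s58_start:hits.s58_end],      # the conserved gene itself
        "ITS2": seq[hits.s58_end:hits.lsu_start],      # between 5.8S and LSU
        "full_ITS": seq[hits.ssu_end:hits.lsu_start],  # ITS1 + 5.8S + ITS2
    }


if __name__ == "__main__":
    # Toy sequence: SSU tail, ITS1, 5.8S, ITS2, LSU head (not real rDNA).
    toy = "AAAA" + "CCC" + "GGGG" + "TT" + "AAAAA"
    for name, region in extract_its_regions(toy, GeneHits(4, 7, 11, 13)).items():
        print(name, region)
```

Rejecting out-of-order boundaries, rather than returning empty slices, reflects the abstract's point that incorrectly delimited ITS sequences are a common problem.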
High-throughput sequencing technologies are currently revolutionizing the fields of biology and medicine, yet bioinformatic challenges in analysing very large data sets have slowed the adoption of these technologies by the community of population biologists. We introduce the 'Simple Fool's Guide to Population Genomics via RNA-seq' (SFG), a document intended to serve as an easy-to-follow protocol, walking a user through one example of high-throughput sequencing data analysis of nonmodel organisms. It is by no means an exhaustive protocol, but rather serves as an introduction to the bioinformatic methods used in population genomics, enabling a user to gain familiarity with basic analysis steps. The SFG consists of two parts. The first, this document, summarizes the steps needed and lays out the basic themes for each, along with a simple approach to follow. The second is the full SFG, publicly available at http://sfg.stanford.edu, which includes detailed protocols for data processing and analysis, along with a repository of custom-made scripts and sample files. Steps included in the SFG range from tissue collection to de novo assembly, BLAST annotation, alignment, gene expression, functional enrichment, SNP detection, principal components and FST outlier analyses. Although the technical aspects of population genomics are changing very quickly, our hope is that this document will help population biologists with little to no background in high-throughput sequencing and bioinformatics to more quickly adopt these new techniques.
Global climate change is projected to accelerate during the next century, altering oceanic patterns in temperature, pH and oxygen concentrations. Documenting patterns of genetic adaptation to these variables in locations that currently experience geographic variation in them is an important tool in understanding the potential for natural selection to allow populations to adapt as climate change proceeds. We sequenced the mantle transcriptome of 39 red abalone (Haliotis rufescens) individuals from three regions (Monterey Bay, Sonoma, north of Cape Mendocino) distinct in temperature, aragonite saturation, exposure to hypoxia and disease pressure along the California coast. Among 1.17 × 10^6 single nucleotide polymorphisms (SNPs) identified in this study (1.37% of the transcriptome), 21 579 could be genotyped for all individuals. A principal components analysis indicated that the vast majority of SNPs show no population structure from Monterey, California to the Oregon border, in corroboration with several previous studies. In contrast, an FST outlier analysis indicated 691 SNPs as exhibiting significantly higher than expected differentiation (experiment-wide P < 0.05). From these, it was possible to identify 163 genes through BLAST annotation, 34 of which contained more than one outlier SNP. A large number of these genes are involved in biomineralization, energy metabolism, or heat, disease or hypoxia tolerance. These genes are candidate loci for spatial adaptation to geographic variation that is likely to increase in the future.
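For readers unfamiliar with the statistic behind an FST outlier analysis, the following Python sketch computes a basic Wright-style FST for one biallelic SNP from per-population allele frequencies. It is an illustration only: the abstract does not say which FST estimator or outlier test the study used, the function name `fst_biallelic` is hypothetical, and the sketch assumes equal weighting of populations with no sample-size correction.

```python
from typing import Sequence


def fst_biallelic(freqs: Sequence[float]) -> float:
    """Wright-style FST = (HT - HS) / HT for one biallelic SNP.

    `freqs` holds the frequency of one allele in each population.
    Populations are weighted equally and no sample-size correction
    is applied (a simplifying assumption of this sketch).
    """
    if not freqs:
        raise ValueError("need at least one population")
    if any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    # Mean expected heterozygosity within populations.
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5, 0.5]))  # identical frequencies in all regions
    print(fst_biallelic([0.2, 0.8]))       # frequencies differ between regions
```

In an outlier analysis, SNPs whose FST lies far above the genome-wide distribution are the ones flagged as candidates for spatially varying selection.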
With the rapid increase in production of genetic data from new sequencing technologies, a myriad of new ways to study genomic patterns in nonmodel organisms are currently possible. Because genome assembly still remains a complicated procedure, and because the functional role of much of the genome is unclear, focusing on SNP genotyping from expressed sequences provides a cost-effective way to reduce complexity while still retaining functionally relevant information. This review summarizes current methods, identifies ways that using expressed sequence data benefits population genomic inference and explores how current practitioners evaluate and overcome challenges that are commonly encountered. We focus particularly on the additional power of functional analysis provided by expressed sequence data and how these analyses push beyond allele pattern data available from nonfunction genomic approaches. The massive data sets generated by these approaches create opportunities as well as problems, especially false positives. We discuss methods available to validate results from expressed SNP genotyping assays and new approaches that sidestep the use of mRNA, and review follow-up experiments that can focus on evolutionary mechanisms acting across the genome.
The level of integration between associated partners can range from ectosymbioses to extracellular and intracellular endosymbioses, and this range has been assumed to reflect a continuum from less intimate to evolutionarily highly stable associations. In this study, we examined the specificity and evolutionary history of marine symbioses in a group of closely related sulphur-oxidizing bacteria, called Candidatus Thiosymbion, that have established ecto- and endosymbioses with two distantly related animal phyla, Nematoda and Annelida. Intriguingly, in the ectosymbiotic associations of stilbonematine nematodes, we observed a high degree of congruence between symbiont and host phylogenies, based on their ribosomal RNA (rRNA) genes. In contrast, for the endosymbioses of gutless phallodriline annelids (oligochaetes), we found only a weak congruence between symbiont and host phylogenies, based on analyses of symbiont 16S rRNA genes and six host genetic markers. The much higher degree of congruence between nematodes and their ectosymbionts compared to those of annelids and their endosymbionts was confirmed by cophylogenetic analyses. These revealed 15 significant codivergence events between stilbonematine nematodes and their ectosymbionts, but only one event between gutless phallodrilines and their endosymbionts. Phylogenetic analyses of 16S rRNA gene sequences from 50 Cand. Thiosymbion species revealed seven well-supported clades that contained both stilbonematine ectosymbionts and phallodriline endosymbionts. This closely coupled evolutionary history of marine ecto- and endosymbionts suggests that switches between symbiotic lifestyles and between the two host phyla occurred multiple times during the evolution of the Cand. Thiosymbion clade, and highlights the remarkable flexibility of these symbiotic bacteria.
The internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit holds a central position in the pursuit of the taxonomic affiliation of fungi recovered through environmental sampling. Newly generated fungal ITS sequences are typically compared against the International Nucleotide Sequence Databases for a species or genus name using the sequence similarity software suite BLAST. Such searches are not without complications, however, and one of them is the presence of chimeric entries among the query or reference sequences. Chimeras are artificial sequences, generated unintentionally during the polymerase chain reaction step, that feature sequence data from two (or possibly more) distinct species. Available software solutions for chimera control do not readily target the fungal ITS region, but the present study introduces a BLAST-based open source software package (available at http://www.emerencia.org/chimerachecker.html) to examine newly generated fungal ITS sequences for the presence of potentially chimeric elements in batch mode. We used the software package on a random set of 12 300 environmental fungal ITS sequences in the public sequence databases and found 1.5% of the entries to be chimeric at the ordinal level after manual verification of the results. The proportion of chimeras in the sequence databases can be hypothesized to increase as emerging sequencing technologies drawing from pooled DNA samples are becoming important tools in molecular ecology research.
Background: Despite recent work to characterize gene expression changes associated with larval development in oysters, the mechanism by which the larval shell is first formed is still largely unknown. In Crassostrea gigas, this shell forms within the first 24 h post fertilization, and it has been demonstrated that changes in water chemistry can cause delays in shell formation, shell deformations and higher mortality rates. In this study, we use the delay in shell formation associated with exposure to CO2-acidified seawater to identify genes correlated with initial shell deposition. Results: By fitting linear models to gene expression data in ambient and low aragonite saturation treatments, we are able to isolate 37 annotated genes correlated with initial larval shell formation, which can be categorized into 1) ion transporters, 2) shell matrix proteins and 3) protease inhibitors. Clustering of the gene expression data into co-expression networks further supports the result of the linear models, and also implies an important role of dynein motor proteins as transporters of cellular components during the initial shell formation process. Conclusions: Using an RNA-Seq approach with high temporal resolution allows us to identify a conceptual model for how oyster larval calcification is initiated. This work provides a foundation for further studies on how genetic variation in these identified genes could affect fitness of oyster populations subjected to future environmental changes, such as ocean acidification. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4519-y) contains supplementary material, which is available to authorized users.