Microsatellites have been popular molecular markers ever since their advent in the late eighties. Despite growing competition from new genotyping and sequencing techniques, the use of these versatile and cost-effective markers continues to increase, boosted by successive technical advances. First, methods for multiplexing PCR have considerably improved over the last years, thereby decreasing genotyping costs and increasing throughput. Second, next-generation sequencing technologies allow the identification of large numbers of microsatellite loci at reduced cost in non-model species. As a consequence, more stringent selection of loci is possible, thereby further enhancing multiplex quality and efficiency. However, current practices are lagging behind. By surveying recently published population genetic studies relying on simple sequence repeats, we show that more than half of the studies lack appropriate quality controls and do not make use of multiplex PCR. To make the most of the latest technical developments, we outline the need for a well-established strategy including standardized high-throughput bench protocols and specific bioinformatic tools, from primer design to allele calling.
The recent emergence of barcoding approaches coupled with next-generation sequencing (NGS) has raised new perspectives for studying environmental communities. In this framework, we tested whether accurate inventories of diatom communities can be derived from pyrosequencing outputs with an available DNA reference library. We used three molecular markers targeting the nuclear, chloroplast and mitochondrial genomes (SSU rDNA, rbcL and cox1) and three samples of a mock community composed of 30 known diatom strains belonging to 21 species. To detect methodological biases, one sample was constituted directly from pooled cultures, whereas the others consisted of pooled PCR products. The NGS reads obtained by pyrosequencing (Roche 454) were compared first to a DNA reference library including the sequences of all the species used to constitute the mock community, and second to a complete DNA reference library with a larger taxonomic coverage. A stringent taxonomic assignment gave inventories that were compared to the real one. We detected biases due to DNA extraction and PCR amplification that resulted in false-negative detection. Conversely, pyrosequencing errors appeared to generate false positives, especially in the case of closely related species. The taxonomic coverage of DNA reference libraries appears to be the most crucial factor, together with marker polymorphism, which is essential to identify taxa at the species level. RbcL offers a high resolving power together with a large DNA reference library. Although needing further optimization, pyrosequencing is suitable for identifying diatom assemblages and may find applications in the field of freshwater biomonitoring.
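The comparison between a molecular inventory and the known composition of a mock community can be illustrated with a minimal sketch. The function name and the placeholder species labels are hypothetical and are not taken from the study; the sketch only shows how false negatives and false positives are counted once taxon names have been assigned.

```python
def compare_inventories(expected, detected):
    """Compare a detected species inventory against the known mock community.

    Both arguments are iterables of species names (placeholders here).
    """
    expected, detected = set(expected), set(detected)
    return {
        # Species present in the mock community and recovered from the reads
        "true_positives": sorted(expected & detected),
        # Species present in the mock community but missed (e.g. extraction or PCR bias)
        "false_negatives": sorted(expected - detected),
        # Species reported but absent from the mock community (e.g. sequencing errors)
        "false_positives": sorted(detected - expected),
    }


# Hypothetical example with placeholder labels
mock = ["species_A", "species_B", "species_C"]
found = ["species_A", "species_C", "species_D"]
result = compare_inventories(mock, found)
```

In this hypothetical example, `species_B` would be a false negative and `species_D` a false positive.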
Two unigene datasets of Pinus taeda and Pinus pinaster were screened to detect di-, tri- and tetranucleotide repeated motifs using the SSRIT script. A total of 419 simple sequence repeats (SSRs) were identified, of which only 12.8% overlapped between the two sets. The position of the SSRs within their coding sequences was predicted using FrameD. Trinucleotides appeared to be the most abundant repeated motif (63 and 51% in P. taeda and P. pinaster, respectively) and tended to be found within translated regions (76% in both species), whereas dinucleotide repeats were preferentially found within the 5'- and 3'-untranslated regions (75 and 65%, respectively). Fifty-three primer pairs amplifying a single PCR fragment in the source species (mainly P. taeda) were tested for amplification in six other pine species. The amplification rate with other pine species was high and corresponded with the phylogenetic distance between species, varying from 64.6% in P. canariensis to 94.2% in P. radiata. Genomic SSRs were found to be less transferable; 58 of the 107 primer pairs (i.e. 54%) derived from P. radiata amplified a single fragment in P. pinaster. Nine cDNA-SSRs were mapped to their chromosomes in two P. pinaster linkage maps. The level of polymorphism of these cDNA-SSRs was compared to that of previously and newly developed genomic SSRs. Overall, genomic SSRs tend to perform better in terms of heterozygosity and number of alleles. This study suggests that useful SSR markers can be developed from pine ESTs.
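Screening sequences for di-, tri- and tetranucleotide repeats can be sketched with a regular expression. This is not the SSRIT script itself; the function name and the minimum repeat counts are illustrative assumptions, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Minimum number of repeat units per motif length.
# These thresholds are assumptions for illustration, not those of the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_compound_of_shorter_unit(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect di-, tri- and tetranucleotide SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_n in sorted(min_repeats.items()):
        # One motif followed by at least (min_n - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_of_shorter_unit(motif):
                continue  # avoid reporting (AT)n again as (ATAT)n, or homopolymers
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits


# Hypothetical test sequence: an (AG)7 repeat and a (CAG)5 repeat
example = "GGTC" + "AG" * 7 + "TTGA" + "CAG" * 5 + "CC"
ssrs = find_ssrs(example)
```

The start positions are 0-based; a production screen would also need to handle imperfect repeats and ambiguous bases, which this sketch ignores.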
Wood is one of our most important natural resources. Surprisingly, we know hardly anything about the details of the process of wood formation. The aim of this work was to describe the main proteins expressed in wood-forming tissue of a conifer species (Pinus pinaster Ait.). Using high-resolution 2-DE with a linear pH gradient ranging from 4 to 7, a total of 1039 spots were detected. Out of the 240 spots analyzed by MS/MS, 67.9% were identified, 16.7% presented no homology in the databases, and 15.4% corresponded to protein mixtures. Out of the 57 spots analyzed by MALDI-MS, only 15.8% were identified. Most of the 175 identified proteins play a role in defense (19.4%), carbohydrate (16.6%) and amino acid (14.9%) metabolism, gene and protein expression (13.1%), cytoskeleton (8%), cell wall biosynthesis (5.7%), or secondary (5.1%) and primary (4%) metabolism. A summary of the identified proteins, their putative functions, and their behavior in different types of wood is presented. This information was introduced into the PROTICdb database and is accessible at http://cbib1.cbib.u-bordeaux2.fr/Protic/Protic/home/index.php. Finally, average protein amounts were compared with their respective transcript abundances as quantified through EST counting in a cDNA library constructed with mRNA extracted from wood-forming tissue.
Background: Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. Results: We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequence Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map made it possible to align the 12 linkage groups of both species. Conclusions: Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. 
This SNP array will be extended thanks to recent sequencing efforts using next-generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers.
Diatoms are major bioindicators used to assess the ecological quality of rivers, but their identification is difficult and time-consuming. Next Generation Sequencing (NGS) can be used to study communities of microorganisms, so we tested the reliability of 454 pyrosequencing for estimating diatom inventories in environmental samples. We used small subunit ribosomal deoxyribonucleic acid (SSU rDNA), ribulose-1,5-bisphosphate carboxylase (rbcL), and cytochrome oxidase I (COI) markers and examined reference libraries to define thresholds between the intra- and interspecific and intra- and intergeneric genetic distances. Based on tests of 1 mock community, we used a threshold of 99% identity for SSU rDNA and rbcL sequences to study freshwater diatoms at the species level. We applied 454 pyrosequencing to 4 contrasting environmental samples (with one in duplicate), assigned taxon names to environmental sequences, and compared the qualitative and quantitative molecular inventories to those obtained by microscopy. Species richness detected by microscopy was always higher than that detected by pyrosequencing. Some morphologically detected taxa may have been persistent frustules from dead cells. Some taxa detected by molecular analysis were not detected by morphology and vice versa. The main source of divergence appears to be inadequate taxonomic coverage in DNA reference libraries. Only a small percentage of species (but almost all genera) in morphological inventories were included in DNA reference libraries. DNA reference libraries contained a smaller percentage of species from tropical (27.1-38.1%) than from temperate samples (53.7-77.8%). Agreement between morphological and molecular inventories was better for species with relative abundance >1% than for rare species. 
The rbcL marker appeared to provide more reproducible results (94.9% species similarity between the 2 duplicates) and was very useful for molecular identification, but procedural standardization is needed. The water-quality ranking assigned to a site via the Pollution Sensitivity diatom index was the same whether calculated with molecular or morphological data. Pyrosequencing is a promising approach for detecting all species, even rare ones, once reference libraries have been developed.
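Species-level assignment at a 99% identity threshold can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the read and each reference are already aligned to the same length (gaps written as `-`), and the function names and taxon labels are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column is a gap in both sequences, ignore it
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def assign_taxon(read, references, threshold=99.0):
    """Return (taxon, identity) for the best reference, or (None, identity) below threshold.

    `references` maps a taxon label to its aligned reference sequence.
    """
    best_name, best_id = None, 0.0
    for name, ref in references.items():
        pid = percent_identity(read, ref)
        if pid > best_id:
            best_name, best_id = name, pid
    if best_id >= threshold:
        return best_name, best_id
    return None, best_id


# Hypothetical 200-bp references and a read with a single mismatch
refs = {"taxon_A": "ACGT" * 50, "taxon_B": "TGCA" * 50}
read = "T" + refs["taxon_A"][1:]
taxon, identity = assign_taxon(read, refs)
```

A read left unassigned at the species level could still be assigned to a genus with a lower threshold; the appropriate value would have to come from the intra- and intergeneric distances of the reference library.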
Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which are time-consuming and prone to identification uncertainties. The use of DNA barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are assigned to diatom taxa by algorithmically comparing the sequences to a reference barcoding library. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the systematics network supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA barcodes to their taxonomic identifications and is dedicated to identifying barcodes from natural samples. The data come from two sources: a culture collection of freshwater algae maintained at INRA, in which new strains are regularly deposited and barcoded, and the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (catalogues and peer-reviewed papers). R-Syst::diatom is updated every 6 months. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). 
We present here the content of the library regarding the number of barcodes and diatom taxa. In addition to this information, morphological features (e.g. biovolumes, chloroplasts…), life-forms (mobility, colony-type) or ecological features (taxon preferences regarding pollution) are indicated in R-Syst::diatom. Database URL: http://www.rsyst.inra.fr/
We developed an automated pipeline for the detection of single nucleotide polymorphisms (SNPs) in expressed sequence tag (EST) data sets, by combining three DNA sequence analysis programs: Phred, Phrap and PolyBayes. This application requires access to the individual electropherogram traces. First, a reference set of 65 SNPs was obtained from the sequencing of 30 gametes in 13 maritime pine (Pinus pinaster Ait.) gene fragments (6671 bp), resulting in a frequency of 1 SNP every 102.6 bp. Second, parameters of the three programs were optimized in order to retrieve as many true SNPs as possible, while keeping the rate of false positives as low as possible. Overall, the efficiency of detection of true SNPs was 83.1%. However, this rate varied largely as a function of the rare SNP allele frequency: down to 41% for rare SNP alleles (frequency < 10%), up to 98% for allele frequencies above 10%. Third, the detection method was applied to the 18498 assembled maritime pine (Pinus pinaster Ait.) ESTs, allowing the identification of a total of 1400 candidate SNPs in contigs containing between 4 and 20 sequence reads. These genetic resources, described for the first time in a forest tree species, were made available at http://www.pierroton.inra/genetics/Pinesnps. We also derived an analytical expression for the SNP detection probability as a function of the SNP allele frequency, the number of haploid genomes used to generate the EST sequence database, and the sample size of the contigs considered for SNP detection. The frequency of the SNP allele was shown to be the main factor influencing the probability of SNP detection.
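The dependence of detection on allele frequency and contig depth can be illustrated with a simple binomial sketch. This is not the analytical expression derived in the paper: it assumes that reads in a contig are independent draws from a large population, ignores the finite number of haploid genomes and sequencing errors, and treats a SNP as detectable when the rare allele appears in at least `min_copies` reads. The function name and the `min_copies` default are assumptions.

```python
from math import comb

# Reference-set SNP density reported in the abstract: 6671 bp / 65 SNPs
BP_PER_SNP = 6671 / 65  # about 102.6 bp per SNP


def detection_probability(p, n_reads, min_copies=2):
    """Probability that the rare allele (frequency p) is seen in at least
    `min_copies` of `n_reads` independent reads (simplified binomial model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # Probability of observing fewer than min_copies copies of the rare allele
    miss = sum(
        comb(n_reads, i) * p**i * (1.0 - p) ** (n_reads - i)
        for i in range(min_copies)
    )
    return 1.0 - miss


# Compare a rare and a common allele across the contig depths used in the study (4 to 20 reads)
for depth in (4, 10, 20):
    rare = detection_probability(0.05, depth)
    common = detection_probability(0.30, depth)
    print(f"depth={depth:2d}  p=0.05: {rare:.3f}  p=0.30: {common:.3f}")
```

Under these assumptions the probability rises with both allele frequency and contig depth, which is consistent in direction with the lower detection efficiency reported for rare alleles.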