Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs and economically very important ornamental plants worldwide. However, hardly any genetic studies have been performed and genomic resources are lacking. To build genomic resources and develop tools to speed up breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyrosequencing technology. Results We developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of the estimated lily transcriptome and 39% of the estimated tulip transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice as abundant in UTRs (UnTranslated Regions) as in coding regions, while tri-nucleotide repeats were equally spread over coding regions and UTRs. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2- to 3-fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs, lily and tulip (12,017 groups), and among the three monocot species lily, tulip and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSRs. The lily and tulip contigs were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of the leaf transcriptome is very effective; however, deeper sequencing and the use of more tissues and developmental stages are advisable for extended comparative studies.
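The two SNP marker sets described above differ only in how many neighbouring SNPs are tolerated within 50 bp of the target SNP. The rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `select_snps`, the 0-based coordinates, the requirement of a complete flank inside the contig and the example positions are all assumptions made for the sketch.

```python
def select_snps(positions, contig_length, flank=50, max_flanking=0):
    """Return SNP positions with at most `max_flanking` other SNPs in either flank.

    positions     -- 0-based SNP positions on one contig (hypothetical input)
    contig_length -- length of the contig in bp
    flank         -- flank size on either side of the target SNP
    """
    selected = []
    for pos in positions:
        # Assumption: a marker needs a complete flank on both sides.
        if pos - flank < 0 or pos + flank >= contig_length:
            continue
        # Count the other SNPs that fall inside the two flanks.
        neighbours = sum(
            1 for other in positions
            if other != pos and abs(other - pos) <= flank
        )
        if neighbours <= max_flanking:
            selected.append(pos)
    return selected


# Made-up SNP positions on a 700 bp contig.
snps = [60, 100, 300, 500, 530, 545]
strict = select_snps(snps, 700, max_flanking=0)   # first set: no flanking SNPs
relaxed = select_snps(snps, 700, max_flanking=1)  # second set: one flanking SNP
```

Relaxing `max_flanking` from 0 to 1 can only add markers, which is in line with the larger second set reported in the abstract.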
Background Standard strategies to identify genomic regions involved in variation for a specific trait are often limited by time- and resource-consuming genotyping methods. Other limiting prerequisites are the phenotyping of large segregating populations or diversity panels and the availability and quality of a closely related reference genome. To overcome these limitations, we designed efficient Comparative Subsequence Sets Analysis (CoSSA) workflows to identify haplotype specific SNPs linked to a trait of interest from Whole Genome Sequencing (WGS) data. Results As a model, we used the resistance to Synchytrium endobioticum pathotypes 2, 6 and 18 that co-segregated in a tetraploid full sib population. Genomic DNA from both parents, pedigree genotypes, unrelated potato varieties lacking the wart resistance traits and pools of resistant and susceptible siblings was sequenced. Set algebra and depth filtering of subsequences (k-mers) were used to remove unlinked and common SNPs and to enrich for SNPs from the haplotype(s) harboring the resistance gene(s). Using CoSSA, we identified a major and a minor effect locus. Upon comparison to the reference genome, it was inferred that the major resistance locus, referred to as Sen3, was located on the north arm of chromosome 11 between 1,259,552 and 1,519,485 bp. Furthermore, we could anchor the unanchored superscaffold DMB734 from the potato reference genome to a syntenic interval. CoSSA was also successful in identifying Sen3 in a reference genome independent way thanks to the de novo assembly of paired end reads matching haplotype specific k-mers. The de novo assembly provided more R haplotype specific polymorphisms than the corresponding region of the reference genome. CoSSA also offers possibilities for pedigree analysis. The origin of Sen3 was traced back to Ora. Finally, the diagnostic power of the haplotype specific markers was shown using a panel of 56 tetraploid varieties.
Conclusions CoSSA is an efficient, robust and versatile set of workflows for the genetic analysis of a trait of interest using WGS data. Because the WGS data are used without intermediate read mapping, CoSSA does not require the use of a reference genome. This approach allowed the identification of Sen3 and the design of haplotype specific, diagnostic markers. Electronic supplementary material The online version of this article (10.1186/s13007-019-0445-5) contains supplementary material, which is available to authorized users.
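The combination of set algebra and depth filtering on k-mers can be illustrated with a toy sketch. It is not the CoSSA implementation: the function names, the in-memory `Counter` tables, the particular order of set operations and the depth thresholds are assumptions chosen for illustration, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer occurring in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def haplotype_specific_kmers(resistant_pool, susceptible_pool,
                             resistant_parent, susceptible_parent,
                             k, min_depth, max_depth):
    """Keep k-mers that plausibly derive from the resistance haplotype."""
    pool_counts = kmer_counts(resistant_pool, k)
    # Depth filter: drop likely sequencing errors (too rare) and
    # k-mers shared by several haplotypes or repeats (too frequent).
    candidates = {km for km, n in pool_counts.items()
                  if min_depth <= n <= max_depth}
    # Set algebra: remove k-mers seen in susceptible material,
    # keep only those that the resistant parent also carries.
    candidates -= set(kmer_counts(susceptible_pool, k))
    candidates &= set(kmer_counts(resistant_parent, k))
    candidates -= set(kmer_counts(susceptible_parent, k))
    return candidates


# Made-up reads; real data would be WGS reads from pools and parents.
specific = haplotype_specific_kmers(
    resistant_pool=["ACGTAC", "ACGTAC"],
    susceptible_pool=["CGTACC"],
    resistant_parent=["ACGTAC"],
    susceptible_parent=["TTTTTT"],
    k=4, min_depth=2, max_depth=10,
)
```

Because every step operates on sets of subsequences rather than on mapped reads, no reference genome enters the computation, which matches the reference-independent use described above.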
Background Wheat gluten is important for the industrial quality of bread wheat (Triticum aestivum L.) and durum wheat (T. turgidum L.). Gluten proteins are also the source of immunogenic peptides that can trigger a T cell reaction in celiac disease (CD) patients, leading to inflammatory responses in the small intestine. Various peptides with three major T cell epitopes involved in CD are derived from the alpha-gliadin fraction of gluten. Alpha-gliadins are encoded by a large multigene family, and amino acid variation in the CD epitopes is known to influence the immunogenicity of individual gene family members. Current commercial methods of gluten detection are unable to distinguish between immunogenic and non-immunogenic CD epitope variants and thus cannot accurately quantify the overall CD epitope load of a given wheat variety. Such quantification is indispensable for correct selection of wheat varieties with low potential to cause CD. Results A 454 RNA-amplicon sequencing method was developed for alpha-gliadin transcripts encompassing the three major CD epitopes and their variants. The method was used to screen developing grains on plants of 61 different durum wheat cultivars and accessions. A dedicated sequence analysis pipeline returned a total of 304 unique alpha-gliadin transcripts, corresponding to a total of 171 'unique deduced protein fragments' of alpha-gliadins. The numbers of these fragments obtained in each plant were used to calculate qualitative and quantitative differences between the CD epitopes expressed in the endosperm of these wheat plants. A few plants showed a lower fraction of CD epitope-encoding alpha-gliadin transcripts, but none were free of CD epitopes. Conclusions The dedicated 454 RNA-amplicon sequencing method enables 1) the grouping of wheat plants according to the genetic variation in alpha-gliadin transcripts, and 2) the screening for plants which are potentially less CD-immunogenic. The resulting alpha-gliadin sequence database will be useful as a reference in proteomics analysis regarding the immunogenic potential of mature wheat grains.
Background Cultivated chrysanthemum is an outcrossing hexaploid (2n = 6× = 54) with a disputed mode of inheritance. In this paper, we present a single nucleotide polymorphism (SNP) selection pipeline that was used to design an Affymetrix Axiom array with 183 k SNPs from RNA sequencing data (1). With this array, we genotyped four bi-parental populations (with sizes of 405, 53, 76 and 37 offspring plants, respectively) and a cultivar panel of 63 genotypes. Further, we present a method for dosage scoring in hexaploids from signal intensities of the array based on mixture models (2) and a validation of selection steps in the SNP selection pipeline (3). The resulting genotypic data are used to draw conclusions on the mode of inheritance in chrysanthemum (4), and to make an inference on allelic expression bias (5). Results Using the mixture model approach, we successfully called the dosage of 73,936 out of 183,130 SNPs (40.4%) that segregated in any of the bi-parental populations. To investigate the mode of inheritance, we analysed markers that segregated in the large bi-parental population (n = 405). Analysis of the segregation of duplex x nulliplex SNPs resulted in evidence for genome-wide hexasomic inheritance. This evidence was substantiated by the absence of strong linkage between markers in repulsion, which indicated absence of full disomic inheritance. We present the success rate of SNP discovery from RNA sequencing data as affected by different selection steps, among which SNP coverage over genotypes and the use of different types of sequence read mapping software. Genomic dosage correlated highly with relative allele coverage from the RNA sequencing data, indicating that most alleles are expressed according to their genomic dosage. Conclusions The large population, genotyped with a very large number of markers, is a unique framework for extensive genetic analyses in hexaploid chrysanthemum. As a starting point, we show conclusive evidence for genome-wide hexasomic inheritance. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4003-0) contains supplementary material, which is available to authorized users.
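Why duplex x nulliplex markers are informative about the mode of inheritance can be seen from the expected offspring dosage classes. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes random chromosome pairing without double reduction for the hexasomic case, takes 1:2:1 as the disomic expectation when the two alleles sit on different homologous pairs, and uses made-up offspring counts.

```python
from math import comb


def gamete_dosage_probs(parent_dosage, ploidy=6):
    """Hexasomic (random chromosome) gamete dosage distribution.

    A gamete receives ploidy/2 of the parent's chromosomes at random, so the
    number of marker alleles it carries follows a hypergeometric distribution.
    """
    n = ploidy // 2
    total = comb(ploidy, n)
    return {
        k: comb(parent_dosage, k) * comb(ploidy - parent_dosage, n - k) / total
        for k in range(min(parent_dosage, n) + 1)
    }


def chi_square(observed, expected_probs):
    """Pearson chi-square statistic of observed counts against expected ratios."""
    n = sum(observed.values())
    return sum(
        (observed.get(k, 0) - n * p) ** 2 / (n * p)
        for k, p in expected_probs.items() if p > 0
    )


# The nulliplex parent contributes dosage 0, so offspring dosage = gamete dosage.
hexasomic = gamete_dosage_probs(2)        # 1:3:1 for dosage 0, 1, 2
disomic = {0: 0.25, 1: 0.5, 2: 0.25}      # assumed disomic alternative

# Hypothetical counts for a population of 405 offspring.
observed = {0: 81, 1: 243, 2: 81}
stat_hexasomic = chi_square(observed, hexasomic)
stat_disomic = chi_square(observed, disomic)
```

A small statistic against the 1:3:1 expectation together with a large one against 1:2:1 would favour hexasomic inheritance for that marker. If both alleles were located on the same homologous pair under disomic inheritance, all offspring would instead fall into dosage class 1.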
The European Union Common Catalogue (EUCC) for potato contains over 1000 varieties. Each year member states add varieties to the list after they have undergone Distinctness, Uniformity and Stability (DUS) testing according to international guidelines. A rapid and robust method for variety identification, to aid the management and maintenance of existing variety collections and the screening of new candidate varieties, would therefore be a highly useful tool for DUS testing stations. A database containing key morphological characteristics and microsatellite data was constructed for varieties on the 2006 list of the EUCC for potato. Rules for scoring SSR markers in different laboratories were established to allow harmonized scoring of markers. Almost all varieties (99.5%) were shown to have unique molecular profiles, and in pairwise comparisons 99.99% of all variety pairs could be distinguished. This clearly shows the versatility of the markers and database for identifying potato samples.
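The two summary figures reported above, the share of varieties with a unique profile and the share of distinguishable variety pairs, are different quantities. A minimal sketch of how each could be computed is given below; the variety names, marker names and allele sizes are invented, and representing a profile as a set of (marker, allele) pairs is an assumption rather than the database's actual format.

```python
from collections import Counter
from itertools import combinations


def distinguishable_fraction(profiles):
    """Fraction of variety pairs whose SSR profiles differ."""
    pairs = list(combinations(profiles, 2))
    distinct = sum(1 for a, b in pairs if profiles[a] != profiles[b])
    return distinct / len(pairs)


def unique_fraction(profiles):
    """Fraction of varieties whose profile is shared with no other variety."""
    occurrences = Counter(profiles.values())
    unique = sum(1 for p in profiles.values() if occurrences[p] == 1)
    return unique / len(profiles)


# Hypothetical profiles: each is a set of (marker, allele size) observations.
profiles = {
    "VarietyA": frozenset({("SSR1", 180), ("SSR1", 184), ("SSR2", 210)}),
    "VarietyB": frozenset({("SSR1", 180), ("SSR2", 210), ("SSR2", 214)}),
    "VarietyC": frozenset({("SSR1", 176), ("SSR2", 206)}),
    "VarietyD": frozenset({("SSR1", 176), ("SSR2", 206)}),  # same as VarietyC
}

pair_fraction = distinguishable_fraction(profiles)
uniq_fraction = unique_fraction(profiles)
```

In this toy set one identical pair leaves five of six pairs distinguishable but only two of four varieties unique, which shows why the pairwise percentage is always at least as high as the percentage of unique profiles.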
Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads that are derived from the chloroplast genome. Up to now these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between the genomes especially in non-model species for which no close relatives have been sequenced before. The alternative approach is to de novo assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assemble these using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps which are left due to coverage variation in the WGS dataset. We have successfully de novo assembled three complete chloroplast genomes from plant species with a range of nuclear genome sizes to demonstrate the universality of our approach: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb) and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
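The idea of using a k-mer frequency table to pull chloroplast reads out of WGS data rests on the chloroplast genome being present in many more copies than the nuclear genome. The sketch below illustrates this under simplifying assumptions and is not the authors' pipeline: the function names, the median-based read score and the threshold are hypothetical, and nuclear repeats, which would also score high, are not handled.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """List all k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def extract_high_copy_reads(reads, k, min_median_count):
    """Return reads whose k-mers are, on the whole, highly abundant."""
    # Build the k-mer frequency table over all reads.
    table = Counter(km for read in reads for km in kmers(read, k))
    selected = []
    for read in reads:
        counts = [table[km] for km in kmers(read, k)]
        # The median is robust to a few error-containing k-mers in a read.
        if counts and median(counts) >= min_median_count:
            selected.append(read)
    return selected


# Toy data: one sequence present five times, two sequences present once.
reads = ["ACGTACGG"] * 5 + ["TTGCAATC", "GGATCCAA"]
candidates = extract_high_copy_reads(reads, k=4, min_median_count=3)
```

Both `k` and the count threshold interact with sequencing depth, which is one reason the abstract stresses optimizing the choice of k and the amount of data used.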
QualitySNPng is a new software tool for the detection and interactive visualization of single-nucleotide polymorphisms (SNPs). It uses a haplotype-based strategy to identify reliable SNPs. It is optimized for the analysis of current RNA-seq data, but it can also be used on genomic DNA sequences derived from next-generation sequencing experiments. QualitySNPng does not require a sequenced reference genome and delivers reliable SNPs for diploid as well as polyploid species. The tool features a user-friendly interface, multiple filtering options to handle typical sequencing errors, support for SAM and ACE files and interactive visualization. QualitySNPng produces high-quality SNP information that can be used directly in genotyping-by-sequencing approaches for application in QTL and genome-wide association mapping, as well as to populate SNP arrays. The software can be used as a stand-alone application with a graphical user interface or as part of a pipeline system like Galaxy. Versions for Windows, Mac OS X and Linux, as well as the source code, are available from http://www.bioinformatics.nl/QualitySNPng.