The transcriptome is the set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In either case, it is difficult to fully comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Visualization of the positions of transcriptomes in a system of diversity and specialization coordinates makes it possible to understand at a glance their interrelations, summarizing in a powerful way which transcriptomes are richer in diversity of expressed genes, or which are relatively more specialized. The framework presented illuminates the relations among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
Keywords: biological complexity | gene expression | microarrays | serial analysis of gene expression (SAGE) | Shannon entropy
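The three measures defined in the abstract above can be illustrated numerically. What follows is a minimal sketch, not the authors' implementation: it assumes equally weighted tissues and base-2 logarithms, the function names are hypothetical, and the frequency table is invented for illustration.

```python
import math

def diversity(freqs):
    """Shannon entropy (in bits) of one transcriptome's relative frequencies."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

def gene_specificity(tissues):
    """Per-gene specificity as mutual information between tissues and a transcript.

    `tissues` is a list of frequency lists, one per tissue, in the same gene order.
    Assumption: every tissue carries the same weight 1/t.
    """
    t = len(tissues)
    n_genes = len(tissues[0])
    specificities = []
    for i in range(n_genes):
        mean_p = sum(tissue[i] for tissue in tissues) / t
        total = 0.0
        for tissue in tissues:
            if mean_p > 0 and tissue[i] > 0:
                ratio = tissue[i] / mean_p
                total += ratio * math.log2(ratio)
        specificities.append(total / t)
    return specificities

def specialization(freqs, specificities):
    """Tissue specialization: frequency-weighted average of gene specificities."""
    return sum(p * s for p, s in zip(freqs, specificities))

# Toy data (invented): two tissues, three genes.
tissues = [
    [0.5, 0.5, 0.0],  # tissue A
    [0.5, 0.0, 0.5],  # tissue B
]
spec = gene_specificity(tissues)
for name, freqs in zip("AB", tissues):
    print(name, diversity(freqs), specialization(freqs, spec))
```

In this toy table the first gene is expressed equally in both tissues and gets specificity 0 (housekeeping-like), whereas the two genes confined to a single tissue reach the maximum possible value, the base-2 logarithm of the number of tissues.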
Background: Meiosis is a form of specialized cell division that marks the transition from diploid meiocyte to haploid gamete, and provides an opportunity for genetic reassortment through recombination. Experimental data indicate that, relative to their wild ancestors, cultivated sunflower varieties show a higher recombination rate during meiosis. To better understand the molecular basis for this difference, we compared gene expression in male sunflower meiocytes in prophase I isolated from a domesticated line, a wild relative, and an F1 hybrid of the two. Results: Of the genes that showed differential expression between the wild and domesticated genotypes, 63.62% could not be identified as protein-coding genes, and of these genes, 70.98% passed stringent filters to be classified as long non-coding RNAs (lncRNAs). Compared to the sunflower somatic transcriptome, meiocytes express a higher proportion of lncRNAs, and the majority of genes with exclusive expression in meiocytes were lncRNAs. Around 40% of the lncRNAs showed sequence similarity with small RNAs (sRNA), while 1.53% were predicted to be sunflower natural antisense transcripts (NATs), and 9.18% contained transposable elements (TE). We identified 6895 lncRNAs that are exclusively expressed in meiocytes; these lncRNAs appear to have higher conservation, a greater degree of differential expression, a higher proportion of sRNA similarity, and higher TE content relative to lncRNAs that are also expressed in the somatic transcriptome. Conclusions: lncRNAs play important roles in plant meiosis and may participate in chromatin modification processes, although other regulatory functions cannot be excluded. lncRNAs could also be related to the different recombination rates seen for domesticated and wild sunflowers. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2776-1) contains supplementary material, which is available to authorized users.
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals through genetic markers. The application of this strategy for discrimination among germplasm sources was analyzed through information theory, considering the case of polymorphic alleles scored as present or absent in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed of two terms: diversity and noise. The diversity term is the entropy of bulk genotypes, whereas the noise term is measured through the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they allowed optimization of bulk size for SSR genotyping in maize, from allele frequencies estimated in a sample of 56 maize populations. It was found that a sample of 30 plants from a random mating population is adequate for maize germplasm SSR characterization. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools, and concluded that samples of 30 plants divided into three bulks of 10 plants are efficient to characterize maize germplasm sources through SSR with good control of the dilution problem. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations, and found a wide variation in marker informativeness, which correlated positively with the number of alleles per locus.
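The decomposition of marker information into diversity minus noise, described in the abstract above, can be sketched for a single allele scored as present or absent in a bulk. This is an illustration under stated assumptions rather than the authors' formulas: plants are diploid and drawn from random mating populations, an allele is detected whenever at least one copy is in the bulk (no dilution effect), populations are equally likely, and the allele frequencies are invented. All function names are hypothetical.

```python
import math

def binary_entropy(q):
    """Entropy (bits) of a presence/absence event with presence probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def presence_probability(allele_freq, bulk_size):
    """Chance that a bulk of `bulk_size` diploid plants carries the allele.

    Assumption: 2 * bulk_size independent gene copies (random mating),
    and detection of the allele if at least one copy is present.
    """
    return 1.0 - (1.0 - allele_freq) ** (2 * bulk_size)

def allele_information(allele_freqs, bulk_size):
    """Mutual information between bulk genotype and population identity.

    `allele_freqs` holds the allele frequency in each population.
    information = diversity - noise
    """
    q = [presence_probability(f, bulk_size) for f in allele_freqs]
    diversity = binary_entropy(sum(q) / len(q))         # entropy of bulk genotypes
    noise = sum(binary_entropy(x) for x in q) / len(q)  # entropy given the source
    return diversity - noise

# Invented allele frequencies in three populations.
freqs = [0.0, 0.02, 0.3]
for n in (1, 10, 30):
    print(n, round(allele_information(freqs, n), 3))
```

Running the loop over several bulk sizes shows how the information carried by an allele depends on the number of plants pooled, which is the kind of trade-off the abstract refers to when it discusses optimizing bulk size.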
[...] donor genome in the resulting generation. The approach of Hanson (1959), which is based on average donor chromosome lengths, ignores the presence of donor chromosome segments in places of the genome that are non-adjacent to the gene to be introgressed. The advent of DNA markers opens many possibilities for backcross-based introgression. For instance, with markers linked to specific quantitative trait loci (QTLs), it is possible to introgress specific regions of the genome that confer desirable quantitative characteristics to an elite variety (Tanksley et al., 1989; Paterson et al., 1991; Dudley, 1993). In tomato (Lycopersicon esculentum Mill.), lines have been created that contain QTLs from the wild species Lycopersicon hirsutum Hub. & Bonpl. Such lines outperform the original elite variety in yield, soluble solids content, and fruit color (Tanksley and McCouch, 1997). This result was accomplished by the "advanced backcross QTL analysis," developed by Tanksley and Nelson (1996), and marker-assisted selection. DNA markers can be useful as well to select for maximum similarity to the recipient line and minimum similarity to the donor line (Hillel et al., 1990). This [...]
Marker-based breeding can be useful to expedite introgression of specific genetic material from a donor parent into the background of an elite variety, through backcrossing. A model is proposed to predict the probability of donor parent genetic material being present in specific regions of the genome, and its proportion at the chromosome-specific or whole genome levels, as a result of marker-based introgression. Furthermore, formulas are provided to calculate the variance of the predicted values. Two kinds of markers are considered: donor parent specific and recurrent parent specific. The first type serves to introgress the desired fraction of donor genome, and the second one to recover the recurrent parent background genome. In all cases, the probabilities and genomic proportions are calculated on a genetic map basis. This model permits any localization of markers through the genome, but requires knowledge of their map positions and the map lengths of the chromosomes. It is robust to mapping functions, and admits any one based on the assumption of coincidence being equal to the kth power of twice the recombination fraction. Two widely used mapping functions gave fairly different predictions of global chromosome introgression. Monte Carlo simulations for several circumstances allowed the testing of the model, and no significant statistical deviations from the theoretical predictions were found. The results indicate that the formulas presented herein can be useful for planning and prediction in a backcross breeding program.
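The mapping-function assumption in the abstract above (coincidence equal to the kth power of twice the recombination fraction) covers two classical cases: k = 0 gives Haldane's function (no interference) and k = 1 gives Kosambi's function. The abstract does not name the two functions it compared, so treating them as Haldane and Kosambi is an assumption here. The sketch below only converts map distance to recombination fraction under each function; it is not the introgression model itself, and the distances are arbitrary.

```python
import math

def haldane(d_morgans):
    """Recombination fraction under Haldane's function (coincidence = 1, k = 0)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def kosambi(d_morgans):
    """Recombination fraction under Kosambi's function (coincidence = 2r, k = 1)."""
    return 0.5 * math.tanh(2.0 * d_morgans)

# Compare the two functions at a few arbitrary map distances (in centimorgans).
for cm in (5, 20, 50, 100):
    d = cm / 100.0  # convert centimorgans to Morgans
    print(cm, round(haldane(d), 4), round(kosambi(d), 4))
```

The two functions nearly coincide at short distances and separate as distance grows, which is one way to see why predictions that integrate over whole chromosomes can differ between mapping functions.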
Chili pepper (Capsicum spp.) is an important crop, as well as a model for fruit development studies and domestication. Here, we performed a time-course experiment to estimate standardized gene expression profiles with respect to fruit development for six domesticated chili pepper accessions and four wild ancestors. We sampled the transcriptomes every 10 days from flowering to fruit maturity, and found that the mean standardized expression profiles for domesticated and wild accessions differed significantly. The mean standardized expression was higher and peaked earlier for domesticated vs. wild genotypes, particularly for genes involved in the cell cycle that ultimately control fruit size. We postulate that these gene expression changes are driven by selection pressures during domestication and show a robust network of cell cycle genes with a time shift in expression, which explains some of the differences between domesticated and wild phenotypes.
Germplasm banks are growing in importance, number of accessions, and amount of characterization data, with a strong emphasis on molecular genetic markers. In this work, we offer an integrated view of accessions and marker data in an information theory framework. The basis of this development is the mutual information between accessions and allele frequencies for molecular marker loci, which can be decomposed into allele specificities, as well as into the rarity and divergence of accessions. In this way, formulas are provided to calculate the specificity of the different marker alleles with reference to their distribution across accessions; accession rarity, defined as the weighted average of the specificity of its alleles; and divergence, defined by the Kullback-Leibler formula. Although these are different measures, it is demonstrated that average rarity and average divergence are equal for any collection. These parameters can contribute to the knowledge of the structure of a germplasm collection and help in making decisions about the preservation of rare variants. The concepts developed herein served as the basis for a strategy for core subset selection called HCore, implemented in a publicly available R script. As a proof of concept, the mathematical view and tools developed in this research were applied to a large collection of Mexican wheat accessions, widely characterized by SNP markers. The most specific alleles were found to be private to a single accession, and the distribution of this parameter had its highest frequencies at low levels of specificity. Accession rarity and divergence had largely symmetrical distributions, and had a positive, albeit not strictly linear, relationship. Comparison of the HCore approach for core subset selection with three state-of-the-art methods showed it to be superior for average divergence and rarity, mean genetic distance, and diversity. The proposed approach can be used for knowledge extraction and decision making in germplasm collections of diploid, inbred or outbred species.
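The claim that average rarity equals average divergence can be checked numerically on a small example. The sketch below is an illustration under assumptions, not the HCore script: it uses a single locus, equally weighted accessions, base-2 logarithms, and the collection-wide mean allele frequencies as the reference distribution for the Kullback-Leibler divergence. The frequencies are invented and the function names are hypothetical.

```python
import math

def mean_frequencies(accessions):
    """Average frequency of each allele across equally weighted accessions."""
    t = len(accessions)
    return [sum(acc[i] for acc in accessions) / t for i in range(len(accessions[0]))]

def allele_specificity(accessions):
    """Specificity of each allele with reference to its distribution across accessions."""
    t = len(accessions)
    means = mean_frequencies(accessions)
    spec = []
    for i, m in enumerate(means):
        total = 0.0
        for acc in accessions:
            if m > 0 and acc[i] > 0:
                ratio = acc[i] / m
                total += ratio * math.log2(ratio)
        spec.append(total / t)
    return spec

def rarity(acc, spec):
    """Accession rarity: frequency-weighted average specificity of its alleles."""
    return sum(p * s for p, s in zip(acc, spec))

def divergence(acc, means):
    """Kullback-Leibler divergence of an accession from the collection average."""
    return sum(p * math.log2(p / m) for p, m in zip(acc, means) if p > 0)

# Invented allele frequencies: three accessions, three alleles at one locus.
accessions = [
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],  # the third allele is private to this accession
]
spec = allele_specificity(accessions)
means = mean_frequencies(accessions)
mean_rarity = sum(rarity(a, spec) for a in accessions) / len(accessions)
mean_divergence = sum(divergence(a, means) for a in accessions) / len(accessions)
print(mean_rarity, mean_divergence)
```

Under these definitions the two averages agree up to floating-point error, even though rarity and divergence differ for individual accessions. The private allele also reaches the maximum specificity, the base-2 logarithm of the number of accessions.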
RNA-Seq experiments allow genome-wide estimation of relative gene expression. Estimation of gene expression at different time points generates time expression profiles of phenomena of interest, for example, fruit development. However, such profiles can be complex to analyze and interpret. We developed a methodology that transforms original RNA-Seq data from time course experiments into standardized expression profiles, which can be easily interpreted and analyzed. To exemplify this methodology we used RNA-Seq data obtained from 12 accessions of chili pepper (Capsicum annuum L.) during fruit development. All relevant data, as well as functions to perform analyses and interpretations from this experiment, were gathered into a publicly available R package: "Salsa". Here we explain the rationale of the methodology and exemplify the use of the package to obtain valuable insights into the multidimensional time expression changes that occur during chili pepper fruit development. We hope that this tool will be of interest to researchers studying fruit development in chili pepper as well as in other angiosperms.
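The abstract above does not spell out how a profile is standardized. One plausible reading, assumed here purely for illustration, is a z-score across time points, so that every gene's profile has mean 0 and standard deviation 1 and only its shape over time remains. The sketch is in Python rather than R, does not use or reproduce the Salsa package, and works on invented expression values.

```python
from statistics import mean, stdev

def standardize_profile(expression):
    """Z-score a time-course expression profile (assumed standardization).

    `expression` holds one mean expression value per time point.
    A flat profile has no shape to standardize and is returned as zeros.
    """
    m = mean(expression)
    s = stdev(expression)  # sample standard deviation across time points
    if s == 0:
        return [0.0 for _ in expression]
    return [(x - m) / s for x in expression]

# Invented profiles: same shape over time, very different expression levels.
low_gene = [1.0, 2.0, 3.0]
high_gene = [10.0, 20.0, 30.0]
print(standardize_profile(low_gene))
print(standardize_profile(high_gene))
```

After standardization the two invented genes yield the same profile, which is what makes profiles comparable across genes and accessions regardless of absolute expression level.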
The tomatillo, Physalis ixocarpa Brot. (2n = 2x = 24), is an important crop in Mexico, and it is becoming appreciated in other countries. Polyploidy induction is expected to increase its breeding potential. The objective of this work was to develop and characterize tomatillo autotetraploids through colchicine-based induction. Young seedlings of the Rendidora cultivar were treated for 24 h with colchicine in concentrations ranging from 0.04% to 0.20%, and ploidy levels were tested by cytological and flow cytometry techniques. Autotetraploidy was induced with colchicine concentrations of 0.12% and 0.16%, with success rates of 67% and 65%, respectively. Presence of univalents, bivalents and multivalents was observed in prophase I and metaphase I. The basic genome size was not altered in the third generation progeny from treated plants. Autotetraploid plants were fertile and productive, but their pollen development was lower than that of their diploid counterparts. The polyploid plants showed higher values for life cycle length, plant height, fruit weight and equatorial diameter, fruits per plant, and soluble solid concentration. This is the first report of an autopolyploid cultivated tomatillo. Its genome duplication is readily induced with production of fertile plants, and may be valuable to introduce genetic plasticity in this crop.