The transcriptome is a set of genes transcribed in a given tissue under specific conditions and can be characterized by a list of genes with their corresponding frequencies of transcription. Transcriptome changes can be measured by counting gene tags from mRNA libraries or by measuring light signals in DNA microarrays. In any case, it is difficult to completely comprehend the global changes that occur in the transcriptome, given that thousands of gene expression measurements are involved. We propose an approach to define and estimate the diversity and specialization of transcriptomes and gene specificity. We define transcriptome diversity as the Shannon entropy of its frequency distribution. Gene specificity is defined as the mutual information between the tissues and the corresponding transcript, allowing detection of either housekeeping or highly specific genes and clarifying the meaning of these concepts in the literature. Tissue specialization is measured by average gene specificity. We introduce the formulae using a simple example and show their application in two datasets of gene expression in human tissues. Visualization of the positions of transcriptomes in a system of diversity and specialization coordinates makes it possible to understand at a glance their interrelations, summarizing in a powerful way which transcriptomes are richer in diversity of expressed genes, or which are relatively more specialized. The framework presented enlightens the relation among transcriptomes, allowing a better understanding of their changes through the development of the organism or in response to environmental stimuli.
Keywords: biological complexity | gene expression | microarrays | serial analysis of gene expression (SAGE) | Shannon entropy
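The three quantities named in this abstract can be illustrated with a small sketch. The abstract does not state the formulae, so the ones below are an assumption: diversity of tissue j as H_j = -Σ_i p_ij log2 p_ij, specificity of gene i as S_i = (1/t) Σ_j (p_ij / p_i) log2(p_ij / p_i) with p_i the mean frequency of gene i over the t tissues, and specialization of tissue j as δ_j = Σ_i p_ij S_i. The function name `transcriptome_metrics` and the input layout are hypothetical.

```python
import math


def transcriptome_metrics(counts):
    """Return (diversity, specificity, specialization) from a count table.

    counts[j][i] is the tag count of gene i in tissue j (hypothetical layout).
    The formulae are assumed from the abstract's verbal definitions.
    """
    t = len(counts)
    g = len(counts[0])

    # Relative frequency p_ij of gene i within tissue j.
    freqs = []
    for row in counts:
        total = sum(row)
        freqs.append([c / total for c in row])

    # Diversity: Shannon entropy (in bits) of each tissue's distribution.
    diversity = [-sum(p * math.log2(p) for p in row if p > 0) for row in freqs]

    # Mean frequency of each gene across tissues.
    mean_freq = [sum(freqs[j][i] for j in range(t)) / t for i in range(g)]

    # Specificity: 0 for a uniformly expressed gene, log2(t) for a gene
    # expressed in a single tissue.
    specificity = []
    for i in range(g):
        s = 0.0
        if mean_freq[i] > 0:
            for j in range(t):
                ratio = freqs[j][i] / mean_freq[i]
                if ratio > 0:
                    s += ratio * math.log2(ratio)
            s /= t
        specificity.append(s)

    # Specialization: frequency-weighted average specificity per tissue.
    specialization = [
        sum(freqs[j][i] * specificity[i] for i in range(g)) for j in range(t)
    ]
    return diversity, specificity, specialization


# Toy data: gene 0 is shared by both tissues, genes 1 and 2 are tissue-specific.
toy_counts = [[50, 50, 0], [50, 0, 50]]
diversity, specificity, specialization = transcriptome_metrics(toy_counts)
```

In this toy table the shared gene gets specificity 0 (a housekeeping-like pattern) and each tissue-specific gene gets log2(2) = 1 bit, which is the behaviour the abstract describes for the two extremes.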
Background: Meiosis is a form of specialized cell division that marks the transition from diploid meiocyte to haploid gamete, and provides an opportunity for genetic reassortment through recombination. Experimental data indicate that, relative to their wild ancestors, cultivated sunflower varieties show a higher recombination rate during meiosis. To better understand the molecular basis for this difference, we compared gene expression in male sunflower meiocytes in prophase I isolated from a domesticated line, a wild relative, and an F1 hybrid of the two. Results: Of the genes that showed differential expression between the wild and domesticated genotypes, 63.62 % could not be identified as protein-coding genes, and of these genes, 70.98 % passed stringent filters to be classified as long non-coding RNAs (lncRNAs). Compared to the sunflower somatic transcriptome, meiocytes express a higher proportion of lncRNAs, and the majority of genes with exclusive expression in meiocytes were lncRNAs. Around 40 % of the lncRNAs showed sequence similarity with small RNAs (sRNA), while 1.53 % were predicted to be sunflower natural antisense transcripts (NATs), and 9.18 % contained transposable elements (TE). We identified 6895 lncRNAs that are exclusively expressed in meiocytes. These lncRNAs appear to have higher conservation, a greater degree of differential expression, a higher proportion of sRNA similarity, and higher TE content relative to lncRNAs that are also expressed in the somatic transcriptome. Conclusions: lncRNAs play important roles in plant meiosis and may participate in chromatin modification processes, although other regulatory functions cannot be excluded. lncRNAs could also be related to the different recombination rates seen for domesticated and wild sunflowers. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2776-1) contains supplementary material, which is available to authorized users.
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals through genetic markers. The application of this strategy for discrimination among germplasm sources was analyzed through information theory, considering the case of polymorphic alleles scored binarily for their presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed of two terms: diversity and noise. The first term is the entropy of bulk genotypes, whereas the noise term is measured through the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they allowed optimization of bulk size for SSR genotyping in maize, from allele frequencies estimated in a sample of 56 maize populations. It was found that a sample of 30 plants from a random mating population is adequate for maize germplasm SSR characterization. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools, and concluded that samples of 30 plants divided into three bulks of 10 plants are efficient to characterize maize germplasm sources through SSR with a good control of the dilution problem. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations, and found a wide variation of marker informativeness, which positively correlated with the number of alleles per locus.
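The decomposition of marker information into diversity minus noise can be sketched for a single allele scored as present or absent in a bulk. The abstract does not give the paper's formulas, so this is only an illustration under stated assumptions: populations are equally likely, each is random mating, a bulk of n diploid plants carries 2n independently sampled gametes, and any copy of the allele in the bulk is detected (no dilution effect). All function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a presence/absence variable with P(present) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def presence_probability(allele_freq, bulk_size):
    """P(allele appears in a bulk), assuming 2 * bulk_size independent gametes."""
    return 1.0 - (1.0 - allele_freq) ** (2 * bulk_size)


def allele_information(allele_freqs, bulk_size):
    """Return (information, diversity, noise) for one allele.

    allele_freqs holds the allele's frequency in each population.
    diversity = entropy of the bulk score over all populations pooled
    noise     = mean entropy of the bulk score within each population
    """
    probs = [presence_probability(q, bulk_size) for q in allele_freqs]
    mean_prob = sum(probs) / len(probs)
    diversity = binary_entropy(mean_prob)
    noise = sum(binary_entropy(p) for p in probs) / len(probs)
    return diversity - noise, diversity, noise


# An allele fixed in one population and absent from the other is fully
# informative; an allele with the same frequency everywhere carries none.
fixed_info, _, _ = allele_information([0.0, 1.0], bulk_size=10)
shared_info, _, _ = allele_information([0.5, 0.5], bulk_size=10)
```

Under these assumptions, raising the bulk size pushes the presence probability of any non-rare allele toward 1 in every population, which lowers noise but can also lower diversity; that trade-off is what makes bulk size something to optimize.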
Chili pepper (Capsicum spp.) is an important crop, as well as a model for fruit development studies and domestication. Here, we performed a time-course experiment to estimate standardized gene expression profiles with respect to fruit development for six domesticated and four wild chili pepper ancestors. We sampled the transcriptomes every 10 days from flowering to fruit maturity, and found that the mean standardized expression profiles for domesticated and wild accessions significantly differed. The mean standardized expression was higher and peaked earlier for domesticated vs. wild genotypes, particularly for genes involved in the cell cycle that ultimately control fruit size. We postulate that these gene expression changes are driven by selection pressures during domestication and show a robust network of cell cycle genes with a time shift in expression, which explains some of the differences between domesticated and wild phenotypes.
Marker-based breeding can be useful to expedite introgression of specific genetic material from a donor parent into the background of an elite variety, through backcrossing. A model is proposed to predict the probability of donor parent genetic material being present in specific regions of the genome, and its proportion at the chromosome-specific or whole genome levels, as a result of marker-based introgression. Furthermore, formulas are provided to calculate the variance of the predicted values. Two kinds of markers are considered: donor parent specific and recurrent parent specific. The first type serves to introgress the desired fraction of donor genome, and the second one to recover the recurrent parent background genome. In all cases, the probabilities and genomic proportions are calculated on a genetic map basis. This model permits any localization of markers through the genome, but requires knowledge of their map positions and the map lengths of the chromosomes. It is robust to mapping functions, and admits any one based on the assumption of coincidence being equal to the kth power of twice the recombination fraction. Two widely used mapping functions gave fairly different predictions of global chromosome introgression. Monte Carlo simulations for several circumstances allowed the testing of the model, and no significant statistical deviations from the theoretical predictions were found. The results indicate that the formulas presented herein can be useful for planning and prediction in a backcross breeding program.
donor genome in the resulting generation. The approach of Hanson (1959), which is based on average donor chromosome lengths, ignores the presence of donor chromosome segments in places of the genome that are non-adjacent to the gene to be introgressed. The advent of DNA markers opens many possibilities for backcross-based introgression. For instance, with markers linked to specific quantitative trait loci (QTLs), it is possible to introgress specific regions of the genome that confer desirable quantitative characteristics to an elite variety (Tanksley et al., 1989; Paterson et al., 1991; Dudley, 1993). In tomato (Lycopersicon esculentum Mill.), lines have been created that contain QTLs from the wild species Lycopersicon hirsutum Hub. & Bonpl. Such lines outperform the original elite variety in yield, soluble solids content, and fruit color (Tanksley and McCouch, 1997). This result was accomplished by the "advanced backcross QTL analysis," developed by Tanksley and Nelson (1996), and marker-assisted selection. DNA markers can be useful as well to select for maximum similarity to the recipient line and minimum similarity to the donor line (Hillel et al., 1990). This
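The family of mapping functions mentioned in the backcross abstract, where coincidence equals the kth power of twice the recombination fraction, can be sketched numerically. The sketch relies on the standard relation dr/dd = 1 - 2cr between recombination fraction r, map distance d (in Morgans) and coincidence c; with c = (2r)^k this becomes dr/dd = 1 - (2r)^(k+1). Setting k = 0 gives the Haldane function and k = 1 the Kosambi function. These two are used here only as familiar members of the family; the text does not name which two functions were compared, and the helper names are hypothetical.

```python
import math


def haldane(d):
    """Haldane mapping function: no interference, coincidence = 1 (k = 0)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def kosambi(d):
    """Kosambi mapping function: coincidence = 2r (k = 1)."""
    return 0.5 * math.tanh(2.0 * d)


def recombination_fraction(d, k, steps=1000):
    """Recombination fraction at map distance d for coincidence (2r)**k.

    Integrates dr/dd = 1 - (2r)**(k + 1) from r(0) = 0 with a classical
    fourth-order Runge-Kutta scheme.
    """
    def slope(r):
        return 1.0 - (2.0 * r) ** (k + 1)

    h = d / steps
    r = 0.0
    for _ in range(steps):
        s1 = slope(r)
        s2 = slope(r + 0.5 * h * s1)
        s3 = slope(r + 0.5 * h * s2)
        s4 = slope(r + h * s3)
        r += h * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return r


# The numerical solution should agree with the closed forms for k = 0 and k = 1.
d = 0.3  # 30 cM
r_k0 = recombination_fraction(d, k=0)
r_k1 = recombination_fraction(d, k=1)
```

For the same map distance the k = 1 function returns a larger recombination fraction than the k = 0 function, which gives a sense of why the choice of mapping function can shift predictions made on a genetic map basis.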