The human genome encodes over 1800 microRNAs (miRNAs), which are short non-coding RNA molecules that regulate gene expression post-transcriptionally. Because one miRNA can target multiple gene transcripts, miRNAs are recognized as a major mechanism for regulating gene expression and mRNA translation. Computational prediction of miRNA targets is a critical initial step in identifying miRNA:mRNA target interactions for experimental validation. The available tools for miRNA target prediction encompass a range of computational approaches, from the modeling of physical interactions to the incorporation of machine learning. This review provides an overview of the major computational approaches to miRNA target prediction. Our discussion highlights three tools, DIANA-microT-CDS, miRanda-mirSVR, and TargetScan, for their ease of use, reliance on relatively updated versions of miRBase, and range of capabilities. Across all miRNA target prediction tools, four main aspects of the miRNA:mRNA target interaction emerge as common features on which most target prediction is based: seed match, conservation, free energy, and site accessibility. This review explains these features and identifies how they are incorporated into currently available target prediction tools. MiRNA target prediction is a dynamic field with increasing attention on the development of new analysis tools. This review attempts to provide a comprehensive assessment of these tools in a manner that is accessible across disciplines. Understanding the basis of these prediction methodologies will aid users in selecting the appropriate tools and interpreting the tool output.
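As a rough illustration of the seed-match feature named in this abstract, the sketch below scans a 3' UTR for perfect complements to miRNA positions 2 to 8. The function names, the seed window, and the toy sequences are assumptions made for illustration; none of the reviewed tools is claimed to work this way, and real predictors also weigh conservation, free energy, and site accessibility.

```python
def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


def find_seed_sites(mirna, utr, seed_start=2, seed_end=8):
    """Return 0-based UTR positions of perfect seed-complementary sites.

    seed_start and seed_end are 1-based, inclusive miRNA positions
    (positions 2-8 are an assumed default, not a setting of any tool).
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Toy sequences, chosen only to exercise the function.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCUU"
sites = find_seed_sites(mirna, utr)
```

A perfect-match scan like this over-predicts targets on its own, which is one reason the other three features are usually layered on top of it.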
The genetic base of soybean [Glycine max (L.) Merr.] breeding in North America is very limited. The focus of this research was to assess the diversity of 18 soybean ancestors and 17 selected plant introductions (PIs) maintained in the USDA Soybean Germplasm Collection. Estimates of genetic relationships among the 35 genotypes were calculated from 281 random amplified polymorphic DNA (RAPD) markers using the simple matching coefficient (SMC) expressed as Euclidean distances. Two forms of hierarchical and nonhierarchical cluster analysis as well as multidimensional scaling (MDS) were employed to reveal associations among the genotypes. The average genetic distance among all genotypes was 0.56. All methods of cluster analysis identified distinct groups of ancestors or PIs. Grouping of the ancestors generally agreed with known pedigree, origin, and maturity data. The four methods of clustering produced similar results, and genotypes were assigned to the same cluster 87% of the time. The MDS plots displayed relationships among the genotypes and may be a useful method of selecting genetically distinct individuals. The genotypes within the distinct PI clusters may possess useful genetic diversity that could be exploited by soybean breeders to increase yield.
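The simple matching coefficient mentioned above can be sketched for binary band scores (1 = fragment present, 0 = absent). The conversion d = sqrt(1 - SMC) used here is an assumption: it equals the Euclidean distance between two 0/1 profiles divided by the square root of the number of markers, but the abstract does not state which scaling the authors applied. The genotype labels and band scores are hypothetical.

```python
import math


def simple_matching(a, b):
    """Fraction of markers scored identically (both present or both absent)."""
    if len(a) != len(b):
        raise ValueError("marker profiles must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def smc_distance(a, b):
    """Assumed conversion of SMC to a Euclidean-type distance: sqrt(1 - SMC)."""
    return math.sqrt(1.0 - simple_matching(a, b))


def pairwise_distances(profiles):
    """Return {(name_i, name_j): distance} for every unordered pair."""
    names = list(profiles)
    return {
        (names[i], names[j]): smc_distance(profiles[names[i]], profiles[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical band scores for three genotypes across four RAPD fragments.
profiles = {
    "ancestor_1": [1, 0, 1, 1],
    "ancestor_2": [1, 1, 1, 0],
    "pi_1": [0, 1, 0, 0],
}
distances = pairwise_distances(profiles)
```

A distance table of this kind is the usual input to hierarchical clustering or multidimensional scaling.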
Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language.
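Two of the baseline transformations compared in this abstract can be sketched in a few lines. This is not TDM itself, whose details the abstract does not give; it is a simplified quantile normalization that maps one sample onto a reference distribution of the same length, plus a log2 transform with an assumed pseudocount of 1. Function names and values are illustrative.

```python
import math


def log2_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount of 1 is an assumption."""
    return [math.log2(c + pseudocount) for c in counts]


def quantile_normalize_to_reference(values, reference):
    """Give each value the reference value that holds the same rank.

    Simplifications: both lists must have the same length, and ties are
    broken by position rather than averaged.
    """
    if len(values) != len(reference):
        raise ValueError("values and reference must have equal length")
    ref_sorted = sorted(reference)
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = ref_sorted[rank]
    return result


# Hypothetical RNA-seq counts for three genes and a microarray-like reference.
rnaseq_counts = [5, 100, 20]
array_reference = [1.0, 3.0, 2.0]
normalized = quantile_normalize_to_reference(rnaseq_counts, array_reference)
logged = log2_transform([0, 1, 3])
```

After normalization the RNA-seq sample keeps its gene ranking but takes on the value distribution of the reference platform, which is the property a model trained on legacy data relies on.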
Introgression of diverse germplasm into the current soybean [Glycine max (L.) Merr.] genetic base may increase genetic variability and lead to greater gains from selection. The objective of this research was to evaluate the genetic diversity and agronomic performance of experimental lines derived from plant introductions (PIs) maintained in the USDA Soybean Germplasm Collection. These PIs are known to be genetically distinct from the ancestors of the modern North American soybean cultivars. Experimental lines containing 25 to 100% PI germplasm (based on pedigrees), their parents, and recently released public cultivars were evaluated for yield in seven environments in 1994 and 1995. Data from random amplified polymorphic DNA (RAPD) fragments were collected and genetic relationships among all genotypes were estimated using hierarchical and nonhierarchical cluster analyses. Experimental lines were identified that yielded significantly more than their domestic parent. Comparisons of pairwise distances revealed that many of the high‐yielding experimental lines were more diverse than their domestic parents from the nonparental cultivars. The increased genetic diversity and yield provide evidence that exotic germplasm can contribute genes for high yield.
A large number of random primers are available to generate random amplified polymorphic DNA (RAPD) markers; however, not all primers are equally informative. The focus of this research was to identify a small set of RAPD primers that can adequately describe the relationships among major North American soybean [Glycine max (L.) Merr.] ancestors and selected plant introductions (PIs). Two hundred eighty‐one polymorphic RAPD fragments evaluated on 35 ancestors and PIs were screened for reproducibility and levels of diversity. Principal‐components analysis (PCA) was employed to identify RAPD fragments associated with the largest sources of variation. Hierarchical and nonhierarchical cluster analyses were used to depict the relationships among the 35 genotypes. One hundred twenty RAPD fragments from 64 random primers were highly reproducible and had polymorphism information content (PIC) scores ≥0.30. Principal‐components analysis revealed that eight components explained 60% of the total variation. Stepwise removal of fragments from individual primers revealed that fragments from only 35 primers were critical to the analysis. The product‐moment correlation of pairwise distances estimated from the complete RAPD fragment data set and this core data set was 0.86 (P < 0.0001). Results from cluster analysis confirmed that this set of 35 primers provides an accurate measurement of the relationships among the 35 genotypes. These primers will be useful for estimating relationships between exotic accessions and the current North American genetic base.
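The PIC screen described above can be sketched under one common definition for a band scored as present or absent, PIC = 1 - (p² + q²), where p is the frequency of the band and q = 1 - p. Whether the authors used exactly this formula is an assumption; the abstract gives only the 0.30 cutoff. Fragment names and scores below are hypothetical.

```python
def pic_score(band_calls):
    """PIC for one present/absent fragment, assuming PIC = 1 - (p^2 + q^2)."""
    if not band_calls:
        raise ValueError("at least one genotype is required")
    p = sum(band_calls) / len(band_calls)  # frequency of band presence
    q = 1.0 - p
    return 1.0 - (p * p + q * q)


def select_informative(fragments, threshold=0.30):
    """Return names of fragments whose PIC meets the threshold."""
    return [name for name, calls in fragments.items()
            if pic_score(calls) >= threshold]


# Hypothetical band scores (1 = present) across ten genotypes.
fragments = {
    "frag_a": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # p = 0.5
    "frag_b": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # p = 0.9
}
kept = select_informative(fragments)
```

Under this definition a two-state marker peaks at 0.5 when the band is present in half the genotypes, so a 0.30 cutoff excludes fragments that are nearly fixed or nearly absent.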
Background: When designing an epigenome-wide association study (EWAS) to investigate the relationship between DNA methylation (DNAm) and some exposure(s) or phenotype(s), it is critically important to assess the sample size needed to detect a hypothesized difference with adequate statistical power. However, the complex and nuanced nature of DNAm data makes direct assessment of statistical power challenging. To circumvent these challenges and to address the outstanding need for a user-friendly interface for EWAS power evaluation, we have developed pwrEWAS. Results: The current implementation of pwrEWAS accommodates power estimation for two-group comparisons of DNAm (e.g. case vs control, exposed vs non-exposed), where methylation assessment is carried out using the Illumina Human Methylation BeadChip technology. Power is calculated using a semi-parametric simulation-based approach in which DNAm data is randomly generated from beta-distributions using CpG-specific means and variances estimated from one of several different existing DNAm data sets, chosen to cover the most common tissue-types used in EWAS. In addition to specifying the tissue type to be used for DNAm profiling, users are required to specify the sample size, number of differentially methylated CpGs, effect size(s) (Δβ), target false discovery rate (FDR) and the number of simulated data sets, and have the option of selecting from several different statistical methods to perform differential methylation analyses. pwrEWAS reports the marginal power, marginal type I error rate, marginal FDR, and false discovery cost (FDC). Here, we demonstrate how pwrEWAS can be applied in practice using a hypothetical EWAS. In addition, we report its computational efficiency across a variety of user settings. Conclusion: Both under- and overpowered studies unnecessarily deplete resources and even risk failure of a study. With pwrEWAS, we provide a user-friendly tool to help researchers circumvent these risks and to assist in the design and planning of EWAS. Availability: The web interface is written in the R statistical programming language using Shiny (RStudio Inc., 2016) and is available at https://biostats-shinyr.kumc.edu/pwrEWAS/. The R package for pwrEWAS is publicly available at GitHub (https://github.com/stefangraw/pwrEWAS). Electronic supplementary material: The online version of this article (10.1186/s12859-019-2804-7) contains supplementary material, which is available to authorized users.
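The simulation-based idea in this abstract can be illustrated with a stripped-down sketch: draw methylation values from beta distributions with given means and variances, test each CpG, control the FDR, and count how many truly shifted CpGs are detected. This is not the pwrEWAS code or API. The function names, the Welch test with a normal approximation, the Benjamini-Hochberg step, and all inputs are assumptions made for illustration.

```python
import math
import random
import statistics


def beta_params(mean, var):
    """Convert a mean and variance into beta-distribution shape parameters."""
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0:
        raise ValueError("variance too large for a beta distribution")
    return mean * common, (1.0 - mean) * common


def welch_p_value(x, y):
    """Two-sided p-value for Welch's statistic via a normal approximation."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    if se == 0:
        return 1.0
    t = (statistics.mean(x) - statistics.mean(y)) / se
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(p_values, fdr):
    """Return the set of indices rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    return set(order[:cutoff])


def estimate_power(means, variances, n_per_group, n_dm, delta,
                   fdr=0.05, n_sims=20, seed=0):
    """Share of truly shifted CpGs detected, averaged over simulations.

    The first n_dm CpGs have their group-2 mean shifted by delta.
    """
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_sims):
        p_values = []
        for j, (mu, var) in enumerate(zip(means, variances)):
            shift = delta if j < n_dm else 0.0
            a1, b1 = beta_params(mu, var)
            a2, b2 = beta_params(mu + shift, var)
            g1 = [rng.betavariate(a1, b1) for _ in range(n_per_group)]
            g2 = [rng.betavariate(a2, b2) for _ in range(n_per_group)]
            p_values.append(welch_p_value(g1, g2))
        rejected = benjamini_hochberg(p_values, fdr)
        detected += sum(1 for j in rejected if j < n_dm)
    return detected / (n_sims * n_dm)


# Hypothetical design: 50 CpGs, 10 of them shifted by 0.2, 30 samples per group.
power = estimate_power([0.5] * 50, [0.01] * 50, n_per_group=30,
                       n_dm=10, delta=0.2, n_sims=5)
```

Rerunning the function over a grid of sample sizes and effect sizes gives the kind of power curve a study designer would inspect. The normal approximation is anticonservative for small groups, so a t-distribution or a moderated test would be preferable in practice.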
O-linked N-acetylglucosamine, better known as O-GlcNAc, is a sugar post-translational modification participating in a diverse range of cell functions. Disruptions in O-GlcNAc cycling, which is mediated by O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA), are a driving force for aberrant cell signaling in disease pathologies, such as diabetes, obesity, Alzheimer's disease, and cancer. Production of UDP-GlcNAc, the metabolic substrate for OGT, by the Hexosamine Biosynthetic Pathway (HBP) is controlled by the input of amino acids, fats, and nucleic acids, making O-GlcNAc a key nutrient-sensor for fluctuations in these macromolecules. The mammalian target of rapamycin (mTOR) and AMP-activated protein kinase (AMPK) pathways also participate in nutrient-sensing as a means of controlling cell activity and are significant factors in a variety of pathologies. Research into the individual nutrient-sensitivities of the HBP, AMPK, and mTOR pathways has revealed a complex regulatory dynamic, where their unique responses to macromolecule levels coordinate cell behavior. Importantly, cross-talk between these pathways fine-tunes the cellular response to nutrients. Strong evidence demonstrates that AMPK negatively regulates the mTOR pathway, but O-GlcNAcylation of AMPK lowers enzymatic activity and promotes growth. On the other hand, AMPK can phosphorylate OGT, leading to changes in OGT function. Complex sets of interactions between the HBP, AMPK, and mTOR pathways integrate nutritional signals to respond to changes in the environment. In particular, examining these relationships using systems biology approaches might prove a useful method of exploring the complex nature of cell signaling. Overall, understanding the complex interactions of these nutrient pathways will provide novel mechanistic insight into how nutrients influence health and disease.