Connecting genetic variation (genotype) to trait variation (phenotype) is a critical but often difficult step in genetic research. A genome-wide association study (GWAS) is a common approach to connecting underlying genetic variation to complex phenotypic traits, allowing for phenotypic prediction. GWAS is important in many disciplines, with applications that include identifying genetic risk factors for common, complex diseases, identifying genes underlying important traits, and predicting phenotypes from genotypes. GWAS is limited, though, in that the variations typically studied are single nucleotide polymorphisms (SNPs) identified relative to a single reference genome. These limitations lead to bias and preclude the use of GWAS in studies across related species. The advent of next-generation sequencing has brought an exponential growth in DNA sequence data. This has led to the more comprehensive pangenomics approach, where the entire sequence content and variation of a population are succinctly represented independent of a reference. In prior work, we developed a method for identifying genomic regions that characterize complex variations within pangenomic data and showed that these regions provide a more general way to study genetic variation than existing approaches. This work describes our initial results in developing methods for a new branch of genomic analysis, pangenome-wide association studies (PWAS), which generalizes GWAS to pangenomic datasets both within and across species. We make use of recently developed algorithms for fast compressed De Bruijn graph construction and for identifying frequented regions in these graphs. These regions can be used as machine-learning features to identify pangenomic regions, overlaid with gene annotations, that relate to complex phenotypic traits. Initial results on a pangenome composed of 100 yeast genomes indicate that frequented region features provide better machine-learning regression models than SNPs for predicting phenotypic traits.
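To make the idea of graph-derived features concrete, the following is a minimal sketch in Python. It is not the authors' compressed De Bruijn graph construction or their frequented-region algorithm: it builds a plain, uncompressed De Bruijn graph and uses the presence of shared nodes as a much simpler stand-in feature. The function names, the `k` and `min_support` parameters, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(sequences, k):
    """Build an uncompressed De Bruijn graph over all input sequences.

    Nodes are (k-1)-mers; each k-mer contributes one edge from its prefix
    to its suffix. Also records which sequences (by index) touch each node.
    """
    successors = defaultdict(set)  # node -> set of successor nodes
    support = defaultdict(set)     # node -> indices of sequences containing it
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            successors[left].add(right)
            support[left].add(idx)
            support[right].add(idx)
    return successors, support


def node_presence_features(sequences, k, min_support):
    """Return (feature_names, matrix) with one row per sequence.

    A node becomes a feature if at least `min_support` sequences contain it.
    This is a simplified stand-in for frequented regions (an assumption of
    this sketch), not the method described in the abstract.
    """
    _, support = de_bruijn_graph(sequences, k)
    names = sorted(n for n, seqs in support.items() if len(seqs) >= min_support)
    matrix = [[1 if i in support[n] else 0 for n in names]
              for i in range(len(sequences))]
    return names, matrix


# Toy, made-up sequences standing in for genomes.
toy_genomes = ["ACGTA", "ACGTT", "TTGCA"]
feature_names, feature_matrix = node_presence_features(toy_genomes, k=3, min_support=2)
```

Each row of `feature_matrix` could then be paired with a measured phenotype and passed to any regression model, alongside a SNP-based matrix for comparison. That comparison step is left out here because the abstract does not specify the model used.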
Background: MicroRNAs (miRNAs) play a vital role as post-transcriptional regulators in gene expression. Experimental determination of miRNA sequence and structure is both expensive and time-consuming. The next-generation sequencing revolution, which facilitated the rapid accumulation of biological data, has brought biology into the “big data” domain. As such, developing computational methods to predict miRNAs has become an active area of interdisciplinary research.
Objective: The objective of this systematic review is to focus on the developments of ab initio plant miRNA identification methods over the last decade.
Data sources: Five databases were searched for relevant articles, according to a well-defined review protocol.
Study selection: The search results were further filtered using selection criteria that included only studies on novel plant miRNA identification using machine learning.
Data extraction: Relevant data from each study were extracted in order to analyze their methodologies and findings.
Results: In the last decade, 20 articles were published on novel miRNA identification methods in plants, of which only 11 focused primarily on plant microRNA identification. Our findings suggest a need for more stringent plant-focused miRNA identification studies.
Conclusion: Overall, the study accuracies are of a satisfactory level, although the methods may generate a considerable number of false negatives. In future, attention must be paid to the biological plausibility of computationally identified miRNAs to prevent further propagation of biologically questionable miRNA sequences.