<p><b>Genome-wide association studies (GWAS) based on frequentist statistics have often proven ineffective in deriving biological insights from sequencing data. These GWAS lack the machinery to safeguard against the technical noise inherent to high-throughput sequencing platforms and are not conceptually designed for processing large sets of high-dimensional genomic data. However, such shortcomings are not peculiar to GWAS and have long been studied in other fields, such as signal processing and computer science. In particular, machine learning techniques, especially deep learning models, have proven highly successful in dealing with noisy high-dimensional data. Recently it has been shown that these techniques can be effective for handling genomic data even when directly transferred from modern computer vision and natural language processing applications.</b></p> <p>This thesis builds on the existing suites of such methodologies and presents a robust computational pipeline to functionally annotate whole-genome sequencing data. Moreover, it discusses and presents a data solution to efficiently process the large, heterogeneous datasets required for such analyses. The main objective of this thesis is to put forward a solution to identify variants that modify disease-causing mutations of complex heritable diseases. This is not a trivial problem: the current gold-standard approach, GWAS methodology, not only suffers from the drawbacks just described but also loses statistical power under multiple testing (which makes it unsuitable for rare diseases) and fails to account for the epistatic nature of the genetic interactions responsible for the onset and manifestation of complex diseases.</p> <p>Here, a set of cell-specific Gene Regulatory Networks (GRNs) inferred from dynamic genomic data was constructed. 
Most previous attempts to construct GRNs delineating such complex interactions relied on combining non-standardized, high-throughput static datasets that contained false-positive interactions and missing data points and offered no insight into cell developmental states. To illuminate these intricate dynamic regulatory interconnections of the genome, specific to a tissue or a cell type, the Non-Stiff Dynamic Invertible Model of CO-Regulatory Networks (NS-DIMCORN) was developed. The model allows unrestricted neural network architectures (to accommodate arbitrary increases in depth for larger sets of genes) and training without partitioning the data dimensions. NS-DIMCORN was trained on non-homogenized bulk tissue-specific RNA-seq and single-cell RNA-seq data as a surrogate for cells’ continuous developmental states and modeled these highly dynamic systems with a set of ordinary differential equations. NS-DIMCORN yielded a continuous-time invertible generative model with unbiased density estimation from RNA-seq read-count data alone and allowed time-flexible sampling of each gene’s expression level for ab initio assembly of the gene regulatory networks of specific cells.</p> <p>Secondly, Precise Graph-based Genome-Wide Annotation Software (PG-GWAS) was developed. For this purpose, embedding was used to map genomic variables to a vector of continuous numbers. Thus, each genomic variant was assigned a unique contextualized score that encoded the likelihood of effects on its respective gene products. These scores were made pan-genomic by constructing a k-mer representation of all the haplotypes, independent of any “reference genome,” and were based only on each variant’s evolutionary constraints. Next, a graph representation of individuals’ genomes was constructed that integrated genomic variation scores, tissue-specific gene-gene interactions, and regulatory networks (assembled from GRNs), allowing the genomic variants to be studied in aggregate while accounting for epistasis. 
A graph attention mechanism then identified the most critical interactions in these networks and allowed annotation of entire whole-genome graphs to determine the most prominent genomic features (i.e., groups of interacting genes) within each genome that could be responsible for differences in symptoms and onset among patients with the same disease-causing mutations. Finally, to demonstrate the efficacy of this approach, PG-GWAS was tested on new sets of sequencing data, where its results improved on those of standard GWAS and provided insight into disease epistasis.</p>
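The graph attention step described above can be illustrated with a minimal sketch of one single-head attention layer over a small gene graph. This is not the PG-GWAS implementation, whose architecture the abstract does not specify; it assumes the commonly used formulation in which attention logits come from a LeakyReLU applied to a learned vector acting on the concatenated projected features of two connected nodes, followed by a softmax over each node's neighbors. The names `graph_attention`, `W`, and `a`, and the toy four-gene graph, are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """LeakyReLU nonlinearity applied to the raw attention logits."""
    return np.where(x > 0, x, slope * x)


def graph_attention(H, A, W, a):
    """One single-head graph attention layer (illustrative sketch).

    H: (n, f_in) node features, e.g. per-gene variant scores (assumed input)
    A: (n, n) adjacency matrix, nonzero where two genes interact
    W: (f_in, f_out) projection matrix (hypothetical learned parameter)
    a: (2 * f_out,) attention vector (hypothetical learned parameter)
    Returns the attention matrix alpha and the updated node features.
    """
    Z = H @ W
    f_out = Z.shape[1]
    # a^T [z_i || z_j] splits into a source term plus a destination term.
    src = Z @ a[:f_out]
    dst = Z @ a[f_out:]
    logits = leaky_relu(src[:, None] + dst[None, :])
    # Attend only over neighbors; self-loops keep every row non-empty.
    mask = (A != 0) | np.eye(A.shape[0], dtype=bool)
    logits = np.where(mask, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    alpha = weights / weights.sum(axis=1, keepdims=True)
    return alpha, alpha @ Z


# Toy chain graph of four genes: g0 - g1 - g2 - g3 (made-up data).
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = rng.standard_normal((3, 2))
a = rng.standard_normal(4)
alpha, H_out = graph_attention(H, A, W, a)
```

Each row of `alpha` sums to one and is zero wherever two genes are not connected, so large entries can be read as the interactions the layer weights most heavily for that gene.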
Single-cell RNA-seq (scRNA-seq) transcriptomics can elucidate the gene regulatory networks (GRNs) of complex phenotypes, but raw sequencing observations provide only "snapshots" of the data and are inherently noisy. scRNA-seq trajectory inference has been used to fill in the missing observations, yet disentangling the complex dynamics of gene-gene interactions at different time points from aggregated data is computationally expensive and non-trivial. Here we developed the Non-Stiff Dynamic Invertible Model of CO-Regulatory Networks (NS-DIMCORN) to model the genetic nexus underpinning cellular functions using invertible warping of flexible multivariate Gaussian distributions by neural ordinary differential equations. Our results yield a generative model with unbiased density estimation from RNA-seq read-count data alone, which allows scaled, time-flexible sampling of each gene's expression level for ab initio assembly of the gene regulatory networks of specific cells. We further demonstrated that our proposed methodology is superior to state-of-the-art algorithms in accurately recovering functional interactions in genome-wide GRNs, whether from synthetic or empirical data, and showed that optimization of our algorithm and its GPU-based implementation further enhances the utility of the proposed technique.
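The idea of warping a Gaussian invertibly with an ordinary differential equation can be sketched as a small continuous normalizing flow. The code below is not NS-DIMCORN: the abstract does not give its architecture, solver, or trace estimator. It assumes a one-hidden-layer tanh vector field with random, untrained weights, a fixed-step RK4 integrator, and an exact Jacobian trace (practical only in low dimension). The names `make_params`, `dynamics`, `integrate`, and `log_prob` are hypothetical. The density follows the instantaneous change-of-variables rule, d log p(z(t))/dt = -tr(∂f/∂z).

```python
import numpy as np


def make_params(rng, dim, hidden, scale=0.3):
    """Random (untrained) weights for the vector field f(z) = W2 tanh(W1 z + b1)."""
    W1 = scale * rng.standard_normal((hidden, dim))
    b1 = scale * rng.standard_normal(hidden)
    W2 = scale * rng.standard_normal((dim, hidden))
    return W1, b1, W2


def dynamics(z, params):
    """Return dz/dt and the exact trace of the Jacobian of f at z."""
    W1, b1, W2 = params
    h = np.tanh(W1 @ z + b1)
    dz = W2 @ h
    # Jacobian is W2 diag(1 - h^2) W1; its trace is sum_k (1 - h_k^2) (W1 W2)_kk.
    trace = np.sum((1.0 - h ** 2) * np.einsum("kj,jk->k", W1, W2))
    return dz, trace


def integrate(z, params, t0=0.0, t1=1.0, steps=200):
    """Fixed-step RK4 from t0 to t1; also accumulates the integral of the trace."""
    z = np.array(z, dtype=float)
    acc = 0.0
    dt = (t1 - t0) / steps  # negative dt runs the flow backward
    for _ in range(steps):
        k1, s1 = dynamics(z, params)
        k2, s2 = dynamics(z + 0.5 * dt * k1, params)
        k3, s3 = dynamics(z + 0.5 * dt * k2, params)
        k4, s4 = dynamics(z + dt * k3, params)
        z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        acc += dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
    return z, acc


def log_prob(x, params):
    """Log-density of x under the flow with a standard normal base at t = 0."""
    z0, acc = integrate(x, params, t0=1.0, t1=0.0)  # acc = -integral_0^1 of trace
    base = -0.5 * (z0 @ z0) - 0.5 * z0.size * np.log(2.0 * np.pi)
    return base + acc


rng = np.random.default_rng(0)
params = make_params(rng, dim=3, hidden=8)
z0 = rng.standard_normal(3)                  # sample from the Gaussian base
x, fwd = integrate(z0, params)               # push forward to "expression" space
z_back, bwd = integrate(x, params, 1.0, 0.0)  # invert by integrating backward
```

Integrating the same vector field backward in time recovers the base sample up to solver error, which is what makes the model invertible without restricting the network architecture. In higher dimensions a stochastic trace estimator would typically replace the exact trace used here.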
Article
Unsupervised Ship Detection in SAR Imagery Based on Energy Density-Induced Clustering
Zifeng Yuan 1, Yu Li 1,*, Yu Liu 1, Jiale Liang 1, and Yuanzhi Zhang 2,3
1 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
2 School of Astronomy and Space Science, University of Chinese Academy of Sciences, Beijing 100049, China
3 Key Laboratory of Lunar and Deep Space Exploration, National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100101, China
* Correspondence: yuli@bjut.edu.cn
Received: 6 March 2023 Accepted: 24 April 2023 Published: 26 September 2023
Abstract: Intelligent recognition of maritime ship targets from synthetic aperture radar (SAR) imagery is a hot research issue. However, interferences such as the strong sea clutter, sidelobe, small ship size and weak backscattered signal continually affect the detection results. To address this problem, a novel unsupervised machine learning-based ship detection algorithm, named energy density-induced clustering (EDIC), is proposed in this paper. It is discovered that the singular values between ship targets and interference signals are significantly different in a local region because of their various concentration degrees of signal energy intensity. Accordingly, in this study, two novel energy density features are proposed based on the singular value decomposition in order to effectively highlight the ship targets and suppress the interference. The proposed novel energy density features have the advantage of clearly distinguishing ship targets from sea surfaces regardless of the effects of interferences. To test the performance of the proposed features, unsupervised K-means clustering is conducted for obtaining ship detection results. 
Compared with the classical and state-of-the-art SAR ship detectors, the proposed EDIC method generally yields the best performance in almost all tested sea sample areas with different kinds of interferences, in terms of both detection accuracy and processing efficiency. The proposed energy density-based feature extraction method also has great potential for supervised classification using neural networks, random forests, etc.
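The general pattern in this abstract, a local feature derived from singular values followed by unsupervised K-means, can be sketched as follows. The abstract does not define the two EDIC features, so the code substitutes a single hypothetical stand-in: the energy of the leading singular value of each image patch divided by the patch area. The function names, the 8-pixel patch size, and the synthetic scene (Rayleigh-distributed clutter with one bright block) are all assumptions made for illustration.

```python
import numpy as np


def patch_energy_density(img, patch=8):
    """Leading singular-value energy per pixel for each non-overlapping patch.

    Hypothetical stand-in feature, not the EDIC features from the paper.
    """
    rows, cols = img.shape[0] // patch, img.shape[1] // patch
    feat = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = img[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            s = np.linalg.svd(block, compute_uv=False)  # singular values, descending
            feat[r, c] = s[0] ** 2 / block.size
    return feat


def two_means(values, iters=50):
    """1-D K-means with K = 2, initialised at the min and max for determinism.

    Returns a boolean mask that is True for the higher-valued cluster.
    """
    v = values.ravel()
    centers = np.array([v.min(), v.max()], dtype=float)
    for _ in range(iters):
        labels = np.abs(v[:, None] - centers[None, :]).argmin(axis=1)
        new = np.array([
            v[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.abs(v[:, None] - centers[None, :]).argmin(axis=1)
    return labels.reshape(values.shape) == centers.argmax()


# Synthetic scene: Rayleigh sea clutter plus one bright 8x8 "ship" (made-up data).
rng = np.random.default_rng(0)
scene = rng.rayleigh(1.0, size=(64, 64))
scene[24:32, 40:48] += 10.0
features = patch_energy_density(scene)
ship_mask = two_means(features)  # True where a patch is assigned to the ship cluster
```

On this toy scene the ship occupies exactly one patch of the 8 x 8 patch grid (row 3, column 5), so that patch falls in the high-energy cluster while the clutter patches fall in the other. Real SAR data would need features that also handle sidelobes and weak targets, which this stand-in does not attempt.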