Understanding the genetic regulatory code governing gene expression is an important challenge in molecular biology. However, how individual coding and non-coding regions of the gene regulatory structure interact and contribute to mRNA expression levels remains unclear. Here we apply deep learning to over 20,000 mRNA datasets to examine the genetic regulatory code controlling mRNA abundance in 7 model organisms ranging from bacteria to human. In all organisms, we can predict mRNA abundance directly from DNA sequence, with up to 82% of the variation of transcript levels encoded in the gene regulatory structure. By searching for DNA regulatory motifs across the gene regulatory structure, we discover that motif interactions could explain the whole dynamic range of mRNA levels. Co-evolution across coding and non-coding regions suggests that it is not single motifs or regions, but the entire gene regulatory structure and specific combination of regulatory elements that define gene expression levels.
Understanding the genetic regulatory code that governs gene expression is a primary, yet challenging aspiration in molecular biology that opens up possibilities to cure human diseases and solve biotechnology problems. However, the fundamental question of how each of the individual coding and non-coding regions of the gene regulatory structure interact and contribute to the mRNA expression levels remains unanswered. Considering that all the information for gene expression regulation is already present in living cells, here we applied deep learning on over 20,000 mRNA datasets to learn the genetic regulatory code controlling mRNA expression in 7 model organisms ranging from bacteria to human. We show that in all organisms, mRNA abundance can be predicted directly from the DNA sequence with high accuracy, demonstrating that up to 82% of the variation of gene expression levels is encoded in the gene regulatory structure. Coding and non-coding regions carry both overlapping and orthogonal information and additively contribute to gene expression levels. By searching for DNA regulatory motifs present across the whole gene regulatory structure, we discover that motif interactions can regulate gene expression levels in a range of over three orders of magnitude. The uncovered co-evolution of coding and non-coding regions challenges the current paradigm that single motifs or regions are solely responsible for gene expression levels. Instead, we propose a holistic system that spans all regions of the gene structure and is required to analyse, understand, and design any future gene expression systems.
Theoretical models of the strong nuclear interaction contain unknown coupling constants (parameters) that must be determined using a pool of calibration data. In cases where the models are complex, leading to time-consuming calculations, it is particularly challenging to systematically search the corresponding parameter domain for the best fit to the data. In this paper, we explore the prospect of applying Bayesian optimization to constrain the coupling constants in chiral effective field theory descriptions of the nuclear interaction. We find that Bayesian optimization performs rather well with low-dimensional parameter domains and foresee that it can be particularly useful for optimization of a smaller set of coupling constants. A specific example could be the determination of leading three-nucleon forces using data from finite nuclei or three-nucleon scattering experiments.
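To make the idea of Bayesian optimization over a low-dimensional parameter domain concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hypothetical coupling constant `c`, a toy quadratic misfit `toy_chi2` standing in for an expensive model-versus-data calculation, a Gaussian-process surrogate with a fixed RBF kernel, and the expected-improvement acquisition function evaluated on a grid. All names and numerical settings are illustrative assumptions.

```python
import math
import numpy as np

def toy_chi2(c):
    # Hypothetical stand-in for an expensive fit objective; minimum at c = 0.3.
    return (c - 0.3) ** 2

def rbf(a, b, length=0.3):
    # Squared-exponential kernel with unit prior variance (assumed, not tuned).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Zero-mean Gaussian-process posterior mean and standard deviation.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Expected improvement for minimization.
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(objective, lo, hi, n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, 201)          # candidate parameter values
    x = rng.uniform(lo, hi, n_init)          # random initial design
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        y_centered = y - y.mean()            # match the zero-mean GP prior
        mu, sigma = gp_posterior(x, y_centered, grid)
        ei = expected_improvement(mu, sigma, y_centered.min())
        x_next = grid[np.argmax(ei)]         # most promising candidate
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    i = int(np.argmin(y))
    return float(x[i]), float(y[i])

best_c, best_chi2 = bayes_opt(toy_chi2, -1.0, 1.0)
print(best_c, best_chi2)
```

Each iteration spends one objective evaluation where the surrogate expects the largest gain, which is the property that makes the approach attractive when every evaluation is costly. A realistic application would replace `toy_chi2` with the actual model calculation and would likely fit the kernel hyperparameters instead of fixing them.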
The digital transformation of manufacturing industries is expected to yield increased productivity. Companies collect large volumes of real-time machine data and are seeking new ways to use it in furthering data-driven decision making. A challenge for these companies is identifying throughput bottlenecks using the real-time machine data they collect. This paper proposes a data-driven algorithm to better identify bottleneck groups and provide diagnostic insights. The algorithm is based on the active period theory of throughput bottleneck analysis. It integrates available manufacturing execution systems (MES) data from the machines and tests the statistical significance of any bottlenecks detected. The algorithm can be automated to allow data-driven decision making on the shop floor, thus improving throughput. Real-world MES datasets were used to develop and test the algorithm, producing research outcomes useful to manufacturing industries. This research pushes standards in throughput bottleneck analysis, using an interdisciplinary approach based on production and data sciences.
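The core of the active period idea can be sketched in a few lines. The example below is an illustrative simplification, not the algorithm proposed in the paper: it assumes a hypothetical log format in which each machine has an ordered list of (state, duration) records, treats a hypothetical `"waiting"` state (starved or blocked) as the only inactive state, and flags the machine with the longest average uninterrupted active period as the bottleneck candidate. The statistical significance test and bottleneck grouping described above are omitted.

```python
from statistics import mean

# Hypothetical MES-style log: machine -> ordered (state, duration in seconds).
# "waiting" stands for starved or blocked time; all other states count as active.
LOG = {
    "M1": [("producing", 40), ("waiting", 20), ("producing", 30), ("waiting", 10)],
    "M2": [("producing", 50), ("repair", 30), ("waiting", 5), ("producing", 60)],
    "M3": [("producing", 20), ("waiting", 30), ("producing", 25), ("waiting", 25)],
}

def active_periods(records, inactive=("waiting",)):
    """Merge consecutive active records into uninterrupted active periods."""
    periods, current = [], 0
    for state, duration in records:
        if state in inactive:
            if current > 0:
                periods.append(current)
            current = 0
        else:
            current += duration
    if current > 0:
        periods.append(current)
    return periods

def average_active_period(log):
    """Average active-period length per machine (machines never active are skipped)."""
    result = {}
    for machine, records in log.items():
        periods = active_periods(records)
        if periods:
            result[machine] = mean(periods)
    return result

def bottleneck(log):
    """Machine with the longest average active period."""
    averages = average_active_period(log)
    return max(averages, key=averages.get)

print(average_active_period(LOG))
print(bottleneck(LOG))
```

Note that the repair time of `M2` is merged with the preceding production time, since in this sketch any non-waiting state extends the current active period.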
Suppose that we are given a set of n elements, d of which have a property called defective. A group test can check for any subset, called a pool, whether it contains a defective. It is known that a nearly optimal number of O(d log(n/d)) pools in two stages (where tests within a stage are done in parallel) are sufficient, but then the searcher must know d in advance. Here we explore group testing strategies that use a nearly optimal number of pools and a few stages although d is not known beforehand. We prove a lower bound of Ω(log d / log log d) stages and a more general pools versus stages tradeoff. This is almost tight, since O(log d) stages are sufficient for a strategy with O(d log n) pools. As opposed to this negative result, we devise a randomized strategy using O(d log(n/d)) pools in three stages, with any desired success probability 1 − ε. With some additional measures even two stages are enough. Open questions concern the optimal constant factors and practical implications. A related problem motivated by biological network analysis is to learn hidden vertex covers of a small size k in unknown graphs by edge group tests. (Does a given subset of vertices contain an edge?) We give a one-stage strategy using O(k³ log n) pools, with any parameterized algorithm for vertex cover enumeration as a decoder. During the course of this work we also provide a classification of types of randomized search strategies in general.
P. Damaschke & A. S. Muhammad. Discrete Math. Algorithm. Appl. 2010.02:291-311.
... choose arbitrary subsets Q ⊂ X called pools, and ask whether Q contains at least one defective. Nondefective elements are called negative. A positive pool is a pool containing some defective, thus responding Yes to a group test.
A negative pool is a pool without defectives, thus responding No to a group test.
Group testing has several applications, most notably in biological and chemical testing, but also in communication networks, information gathering, compression, streaming algorithms, etc., see for instance [9, 10, 15, 20, 21, 22] and further pointers therein.
Throughout this paper, log means log₂ if no other base is mentioned. By the information-theoretic lower bound, at least log (n choose d) ≈ d log(n/d) pools are needed to find d defectives even if the number d is known in advance, and it is an easy exercise to devise an adaptive query strategy using O(d log(n/d)) pools. Here, a strategy is called adaptive if queries are asked sequentially, that is, every pool can be prepared based on the outcomes of all earlier queries. For many applications however, the time consumption of adaptive strategies is hardly acceptable, and strategies that work in a few stages are strongly preferred: The pools for every stage must be prepared in advance, depending on the outcomes of earlier stages, and then they are queried in parallel.
Any one-stage strategy needs Ω(d² log n / log d) pools, as a consequence of ...