Significance Self-assembling RNA molecules play critical roles throughout biology and bioengineering. To accelerate progress in RNA design, we present EteRNA, the first internet-scale citizen science “game” scored by high-throughput experiments. A community of 37,000 nonexperts leveraged continuous remote laboratory feedback to learn new design rules that substantially improve the experimental accuracy of RNA structure designs. These rules, distilled by machine learning into a new automated algorithm EteRNABot, also significantly outperform prior algorithms in a gauntlet of independent tests. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.
This paper studies the problem of estimating the covariance of a collection of vectors using only highly compressed measurements of each vector. An estimator based on back-projections of these compressive samples is proposed and analyzed. A distribution-free analysis shows that by observing just a single linear measurement of each vector, one can consistently estimate the covariance matrix, in both infinity and spectral norm, and this same analysis leads to precise rates of convergence in both norms. Via informationtheoretic techniques, lower bounds showing that this estimator is minimax-optimal for both infinity and spectral norm estimation problems are established. These results are also specialized to give matching upper and lower bounds for estimating the population covariance of a collection of Gaussian vectors, again in the compressive measurement model. The analysis conducted in this paper shows that the effective sample complexity for this problem is scaled by a factor of m 2 /d 2 where m is the compression dimension and d is the ambient dimension. Applications to subspace learning (Principal Components Analysis) and learning over distributed sensor networks are also discussed. *
No abstract
Semisupervised methods are techniques for using labeled data $(X_1,Y_1),\ldots,(X_n,Y_n)$ together with unlabeled data $X_{n+1},\ldots,X_N$ to make predictions. These methods invoke some assumptions that link the marginal distribution $P_X$ of X to the regression function f(x). For example, it is common to assume that f is very smooth over high density regions of $P_X$. Many of the methods are ad-hoc and have been shown to work in specific examples but are lacking a theoretical foundation. We provide a minimax framework for analyzing semisupervised methods. In particular, we study methods based on metrics that are sensitive to the distribution $P_X$. Our model includes a parameter $\alpha$ that controls the strength of the semisupervised assumption. We then use the data to adapt to $\alpha$.Comment: Published in at http://dx.doi.org/10.1214/13-AOS1092 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.