Information, such as text printed on paper or images projected onto microfilm, can survive for over 500 years. However, the storage of digital information for time frames exceeding 50 years is challenging. Here we show that digital information can be stored on DNA and recovered without errors for considerably longer time frames. To allow for the perfect recovery of the information, we encapsulate the DNA in an inorganic matrix and employ error-correcting codes to correct storage-related errors. Specifically, we translated 83 kB of information into 4991 DNA segments, each 158 nucleotides long, which were encapsulated in silica. Accelerated aging experiments were performed to measure DNA decay kinetics, which show that data can be archived on DNA for millennia under a wide range of conditions. The original information could be recovered error free, even after treating the DNA in silica at 70 °C for one week. This is thermally equivalent to storing information on DNA in central Europe for 2000 years.

Prehistoric information set down by our ancestors in cave drawings, texts engraved in gold, and medieval manuscripts forms some of the strongest links with our past. An example is the Archimedes Palimpsest, which originates from the tenth century. It contains the single known copy of "The Method of Mechanical Theorems" and represents a cornerstone in the development of geometry and modern calculus. The book has survived more than 1000 years and in 1998 was valued at more than two million USD. In view of this valuation of information, it may seem surprising that current efforts to guarantee the longevity of digital information are scarce (e.g.
MDisc, Syylex) and that the storage half-life of information has dropped drastically since the transition from analog to digital storage systems.[1]

Traditional storage technologies such as optical and magnetic devices are not reliable for long-term (> 50 years) data storage.[2] Furthermore, the development of reliable systems requires long-term testing, which is well beyond current device-development timelines. DNA is the only data-storage medium for which real long-term data are available from archeology. Most recently, 300,000-year-old mitochondrial DNA from bears and humans has been sequenced.[3] DNA has also previously been utilized as a coding language, with applications in forensics,[4] product tagging,[5] and DNA computing.[6] As a consequence, several approaches to storing information on DNA have been proposed.[7] However, these approaches are not reliable, as they cannot handle errors efficiently and do not suggest how to (physically) store the DNA to maintain its stability over time.

To overcome these issues, we combined an error-correcting information-encoding scheme tailored to DNA (Scheme 1) with a previously established chemical method for storing DNA in "synthetic fossils". The corresponding experiments show that only by the combination of the two concepts could digital information be recovered from DNA stored at the Global Seed Vault (at −18 °C) after over 1 million years.
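The "one week at 70 °C equals 2000 years in central Europe" equivalence follows from Arrhenius-type decay kinetics: the decay rate depends exponentially on inverse temperature, so equal amounts of decay at two temperatures correspond to times related by an exponential factor. A minimal sketch, assuming first-order decay and an illustrative activation energy of roughly 155 kJ/mol chosen to be consistent with the stated equivalence (the paper's fitted value may differ):

```python
import math

R = 8.314  # gas constant, J/(mol K)

def equivalent_time(t_hot, T_hot_K, T_cold_K, Ea):
    """Time at T_cold producing the same first-order decay as t_hot at T_hot.

    Arrhenius: k(T) ~ exp(-Ea / (R*T)), and equal decay means k_hot*t_hot = k_cold*t_cold.
    """
    ratio = math.exp((Ea / R) * (1.0 / T_cold_K - 1.0 / T_hot_K))
    return t_hot * ratio

Ea = 155e3                 # J/mol, illustrative assumption
t_hot_weeks = 1.0          # one week at 70 °C (343.15 K)
T_cold = 283.15            # ~10 °C, a rough central-European mean temperature
weeks = equivalent_time(t_hot_weeks, 343.15, T_cold, Ea)
years = weeks / 52.18
```

With these assumed numbers, `years` comes out on the order of 2000, matching the equivalence quoted in the abstract; the strong sensitivity of the result to `Ea` is exactly why the accelerated aging experiments are needed.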
We demonstrate a compact and easy-to-build computational camera for single-shot 3D imaging. Our lensless system consists solely of a diffuser placed in front of a standard image sensor. Every point within the volumetric field of view projects a unique pseudorandom pattern of caustics on the sensor. By using a physical approximation and a simple calibration scheme, we solve the large-scale inverse problem in a computationally efficient way. The caustic patterns enable compressed sensing, which exploits sparsity in the sample to solve for more 3D voxels than there are pixels on the 2D sensor. Our 3D voxel grid is chosen to match the experimentally measured two-point optical resolution across the field of view, resulting in 100 million voxels being reconstructed from a single 1.3 megapixel image. However, the effective resolution varies significantly with scene content. Because this effect is common to a wide range of computational cameras, we provide new theory for analyzing resolution in such systems.
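Recovering more voxels than pixels is possible because the scene is sparse and each point maps to a pseudorandom sensor pattern, turning reconstruction into sparse recovery from an underdetermined linear system. A toy sketch of that principle (not the authors' solver) using ISTA, a standard proximal-gradient method for the lasso, with a random matrix standing in for the caustic forward model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined system: 80 "sensor pixels", 200 "voxels", only 5 voxels lit.
m, n, k = 80, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # stand-in for the caustic forward model
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true                                 # the single 2D measurement

# ISTA: proximal gradient descent on  0.5*||Ax - y||^2 + lam*||x||_1
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1 / Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(2000):
    g = x - step * A.T @ (A @ x - y)           # gradient step on the data term
    x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-thresholding

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
```

Despite having 2.5x more unknowns than measurements, the sparse signal is recovered to small relative error; the real system swaps the dense random matrix for a structured, calibrated caustic model so the problem stays tractable at 100 million voxels.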
The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm exhibits robustness to additive noise and succeeds even when the subspaces intersect. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed with the number of missing entries allowed to be (up to a log-factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.
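The core construction above is simple enough to sketch directly. The following is an illustrative reduction of the thresholding step, not the paper's full pipeline: normalize each point to the unit sphere, keep the q largest absolute correlations per point as its neighbors, and hand the resulting adjacency matrix to any spectral clustering routine:

```python
import numpy as np

def tsc_adjacency(X, q):
    """Adjacency from the q nearest neighbors of each column of X in spherical distance."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)  # project points onto the unit sphere
    C = np.abs(Xn.T @ Xn)                              # |cos| of pairwise angles
    np.fill_diagonal(C, 0.0)                           # a point is not its own neighbor
    A = np.zeros_like(C)
    for i in range(C.shape[1]):
        nbrs = np.argsort(C[i])[-q:]                   # threshold: keep q largest correlations
        A[i, nbrs] = C[i, nbrs]
    return np.maximum(A, A.T)                          # symmetrize

# Toy example: 30 points from each of two orthogonal 2-D subspaces of R^6.
rng = np.random.default_rng(1)
U1, U2 = np.eye(6)[:, :2], np.eye(6)[:, 2:4]
X = np.hstack([U1 @ rng.standard_normal((2, 30)), U2 @ rng.standard_normal((2, 30))])
A = tsc_adjacency(X, q=5)
cross_block = float(A[:30, 30:].sum())
```

For orthogonal subspaces the cross-subspace correlations vanish, so every selected neighbor lies in the correct subspace and spectral clustering trivially separates the two blocks; the paper's analysis quantifies how far this survives as the subspaces begin to intersect and noise is added.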
Owing to its longevity and enormous information density, DNA, the molecule encoding biological information, has emerged as a promising archival storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules that are stored in an unordered way, and can only be read by sampling from this DNA pool. Moreover, imperfections in writing (synthesis), reading (sequencing), storage, and handling of the DNA, in particular amplification via PCR, lead to a loss of DNA molecules and induce errors within the molecules. In order to design DNA storage systems, a qualitative and quantitative understanding of the errors and the loss of molecules is crucial. In this paper, we characterize those error probabilities by analyzing data from our own experiments as well as from experiments of two different groups. We find that errors within molecules are mainly due to synthesis and sequencing, while imperfections in handling and storage lead to a significant loss of sequences. The aim of our study is to help guide the design of future DNA data storage systems by providing a quantitative and qualitative understanding of the DNA data storage channel.
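One quantifiable consequence of unordered storage and sampled reading is that, even with error-free chemistry, a molecule is lost whenever it is simply never drawn from the pool. A toy simulation of this effect, assuming an idealized uniform pool sampled with replacement (so the unseen fraction approaches exp(-c) at coverage c, by the Poisson approximation):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

n_seqs = 100_000   # distinct sequences in the pool
coverage = 5       # average number of reads per sequence

# Uniform sampling with replacement: each read picks one sequence from the pool.
reads = rng.integers(0, n_seqs, size=coverage * n_seqs)
unseen = n_seqs - np.unique(reads).size
loss_fraction = unseen / n_seqs

# Poisson approximation: P(a given sequence is never read) = exp(-coverage)
predicted = math.exp(-coverage)
```

In practice the pool is not uniform: synthesis and PCR skew sequence frequencies, so real losses exceed this idealized floor, which is one reason the empirical characterization in the paper matters for choosing coverage and redundancy.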