Abstract. A simple DNA-based data storage scheme is demonstrated in which information is written using "addressing" oligonucleotides. Unlike other methods that allow arbitrary code to be stored, this scheme produces DNA that is suitable for downstream enzymatic and biological processing. This capability is crucial for DNA computers and may allow a diverse array of computational operations to be carried out on this DNA. Although we use gel-based methods for information readout here, we also propose more advanced readout methods involving protein/DNA complexes and atomic force microscopy or nanopore schemes.
INTRODUCTION. DNA is attractive for storing digital information, partly because of its potentially ultra-high density;1 in theory, 1 gram of DNA could store about the same amount of data as 10^12 CD-ROMs.2 DNA also offers the possibility of creating extremely durable information archives, e.g., by introducing the DNA into reproducing organisms such as radiation-tolerant bacteria.3 As the organism replicates its genome, the information is carried into the next generation. Combined with some form of selection pressure to reduce mutation rates, such information could be preserved for thousands, perhaps millions, of years. Under appropriate conditions, even in vitro storage of DNA could remain secure for hundreds of years.
Another key attraction of DNA as a memory medium is the novel classes of computational problems it can address. For example, in his seminal paper, Adleman demonstrated that DNA could be used to solve an instance of the directed Hamiltonian path problem, which is closely related to the "Traveling Salesman" problem.4 Such problems require large amounts of conventional computing time even at modest sizes, because the number of candidate paths grows factorially with the number of vertices. By exploiting DNA's ability to search a large solution space rapidly and in parallel, DNA computing offers the possibility of solving these problems on a practical timescale. The DNA memory scheme presented here might further enable DNA computation by allowing a DNA computer to be "programmed" faster and more efficiently, which could save enormous time and expense compared with synthesizing each unique molecule one by one.
Previous groups have encoded meaningful information in DNA directly as a sequence of base pairs.3,5 However, chemical DNA synthesis is slow and expensive, and a new molecule must be created each time new data is to be written. To overcome this limitation, methods have been developed that exploit complementary base pairing in DNA to generate molecules representing information.
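The density figure of roughly 10^12 CD-ROMs per gram can be sanity-checked with back-of-envelope arithmetic. The sketch below is only an order-of-magnitude estimate, and its constants are assumptions rather than values from this work: an average mass of about 330 g/mol per nucleotide of single-stranded DNA, 2 bits per nucleotide (four possible bases), and 700 MB per CD-ROM. The function name `cd_roms_per_gram` is hypothetical.

```python
# Back-of-envelope check of the "1 gram of DNA ~ 10^12 CD-ROMs" estimate.
# Assumptions (not taken from the original text):
#   - ~330 g/mol average mass per nucleotide of single-stranded DNA
#   - 2 bits per nucleotide (four possible bases: log2(4) = 2)
#   - 700 MB (7e8 bytes) per CD-ROM
import math

AVOGADRO = 6.022e23          # molecules per mole
NT_MASS_G_PER_MOL = 330.0    # assumed average mass of one nucleotide
BITS_PER_NT = math.log2(4)   # 2 bits per base (A, C, G or T)
CD_ROM_BYTES = 700e6         # assumed CD-ROM capacity in bytes


def cd_roms_per_gram(grams: float = 1.0) -> float:
    """Hypothetical helper: CD-ROM equivalents stored in `grams` of ssDNA."""
    nucleotides = grams / NT_MASS_G_PER_MOL * AVOGADRO  # number of bases
    total_bytes = nucleotides * BITS_PER_NT / 8         # bits -> bytes
    return total_bytes / CD_ROM_BYTES


if __name__ == "__main__":
    n = cd_roms_per_gram(1.0)
    # prints "6.52e+11 CD-ROMs per gram (order of magnitude 10^12)"
    print(f"{n:.2e} CD-ROMs per gram (order of magnitude 10^{round(math.log10(n))})")
```

If the DNA is instead counted as double-stranded (about 650 g/mol per base pair, each pair still carrying two bits), the estimate roughly halves to about 3 × 10^11, which remains within an order of magnitude of the quoted figure.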
Notably, Shin and Pierce developed a 3-bit system that can encode up to eight (2^3) distinct states in a DNA molecule and is even rewritable.6 This method has some important limitations, however, which may hinder its development into a fully fledged DNA memory device. It was not possible to unambiguously read out the information using simple gel-based methods; in this case only the total number of memory bits that were "1" or "0" could be d...