David Ward scite author profile

As global demand for digital storage capacity grows, storage technologies based on synthetic DNA have emerged as a dense and durable alternative to traditional media. Existing approaches leverage robust error correcting codes and precise molecular mechanisms to reliably retrieve specific files from large databases. Typically, files are retrieved using a pre-specified key, analogous to a filename. However, these approaches lack the ability to perform more complex computations over the stored data, such as similarity search: e.g., finding images that look similar to an image of interest without prior knowledge of their file names. Here we demonstrate a technique for executing similarity search over a DNA-based database of 1.6 million images. Queries are implemented as hybridization probes, and a key step in our approach was to learn an image-to-sequence encoding ensuring that queries preferentially bind to targets representing visually similar images. Experimental results show that our molecular implementation performs comparably to state-of-the-art in silico algorithms for similarity search.

A Content-Addressable DNA Database with Learned Sequence Encodings

Stewart

Chen

Ward

et al. 2018

DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access

et al. 2023

DNA has emerged as an attractive medium for archival data storage due to its durability and high information density. Scalable parallel random access to information is a desirable property of any storage system. For DNA-based storage systems, however, this still needs to be robustly established. Here we report on a thermoconfined polymerase chain reaction, which enables multiplexed, repeated random access to compartmentalized DNA files. The strategy is based on localizing biotin-functionalized oligonucleotides inside thermoresponsive, semipermeable microcapsules. At low temperatures, microcapsules are permeable to enzymes, primers and amplified products, whereas at high temperatures, membrane collapse prevents molecular crosstalk during amplification. Our data show that the platform outperforms non-compartmentalized DNA storage with respect to repeated random access and reduces amplification bias tenfold during multiplex polymerase chain reaction. Using fluorescent sorting, we also demonstrate sample pooling and data retrieval by microcapsule barcoding. Therefore, the thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access to archival DNA files.

Content-Based Similarity Search in Large-Scale DNA Data Storage Systems

Bee

Chen

Ward

et al. 2020

Preprint

Synthetic DNA has the potential to store the world’s continuously growing amount of data in an extremely dense and durable medium. Current proposals for DNA-based digital storage systems include the ability to retrieve individual files by their unique identifier, but not by their content. Here, we demonstrate content-based retrieval from a DNA database by learning a mapping from images to DNA sequences such that an encoded query image will retrieve visually similar images from the database via DNA hybridization. We encoded and synthesized a database of 1.6 million images and queried it with a variety of images, showing that each query retrieves a sample of the database in which visually similar images appear at a rate much greater than chance. We compare our results with several algorithms for similarity search in electronic systems, and demonstrate that our molecular approach is competitive with state-of-the-art electronics.

One Sentence Summary: Learned encodings enable content-based image similarity search from a database of 1.6 million images encoded in synthetic DNA.
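For readers unfamiliar with the electronic baselines the abstract compares against, the following is a minimal sketch of similarity search in silico. It is not the authors' method: the function names, the three-dimensional feature vectors, and the choice of cosine similarity are all assumptions made for illustration. In the molecular system described above, DNA hybridization between an encoded query and the stored sequences plays the role that the similarity score plays here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k):
    """Return the names of the k database entries most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical image feature vectors (made up for this example).
database = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "car_1": [0.0, 0.1, 0.9],
    "tree_1": [0.1, 0.9, 0.1],
}
query = [1.0, 0.1, 0.0]

print(top_k_similar(query, database, 2))
```

With these toy vectors the two "cat" entries rank first, which is the behavior a content-based query is meant to show: results are selected by resemblance to the query rather than by a known identifier.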

Combinatorial PCR Method for Efficient, Selective Oligo Retrieval from Complex Oligo Pools

et al. 2022

With the rapidly decreasing cost of array-based oligo synthesis, large-scale oligo pools offer significant benefits for advanced applications including gene synthesis, CRISPR-based gene editing, and DNA data storage. The selective retrieval of specific oligos from these complex pools traditionally uses polymerase chain reaction (PCR). Designing a large number of primers to use in PCR presents a serious challenge, particularly for DNA data storage, where the size of an oligo pool is orders of magnitude larger than in other applications. Although a nested primer address system was recently developed to increase the number of accessible files for DNA storage, it requires more complicated lab protocols and more expensive reagents to achieve high specificity, as well as more DNA address space. Here, we present a new combinatorial PCR method that has none of those drawbacks and outperforms the nested approach in retrieval specificity. In experiments, we accessed three files that each comprised 1% of a DNA prototype database that contained 81 different files and enriched them to over 99.9% using our combinatorial primer method. Our method provides a viable path for scaling up DNA data storage systems and has broader utility whenever one must access a specific target oligo and can design their own primer regions.
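To put the reported enrichment in perspective, the short calculation below compares the starting and final fractions quoted in the abstract (1% and 99.9%). The helper name and the use of an odds ratio as the enrichment measure are assumptions made here for illustration; the abstract does not say how the authors quantified enrichment, and since it reports "over 99.9%" the result is best read as a lower bound.

```python
def fold_enrichment(initial_fraction, final_fraction):
    """Ratio of final odds to initial odds of drawing a target oligo.

    Hypothetical helper: odds (target : non-target) are used rather than raw
    fractions because a fraction cannot exceed 1, which would cap the ratio.
    """
    initial_odds = initial_fraction / (1.0 - initial_fraction)
    final_odds = final_fraction / (1.0 - final_fraction)
    return final_odds / initial_odds


# Fractions taken from the abstract: 1% of the pool before, 99.9% after.
print(round(fold_enrichment(0.01, 0.999)))
```

Under this measure, going from 1% to 99.9% corresponds to an increase of roughly five orders of magnitude in the target-to-background odds.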

DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access

Bögels

Nguyen

Ward

et al. 2023

Preprint

Owing to its longevity and extremely high information density, DNA has emerged as an attractive medium for archival data storage. Scalable parallel random access of information is a desirable property of any storage system. For DNA-based storage systems, however, this has yet to be robustly established. Here we develop thermoconfined PCR, a novel method that enables multiplexed, repeated random access of compartmentalized DNA files. Our strategy is based on stable localization of biotin-functionalized oligonucleotides inside microcapsules with temperature-dependent membrane permeability. At low temperatures, microcapsules are permeable to enzymes, primers, and amplified products, while at high temperatures membrane collapse prevents molecular crosstalk during amplification. We demonstrate that our platform outperforms non-compartmentalized DNA storage with respect to repeated random access and reducing amplification bias during multiplex PCR. Using fluorescent sorting, we additionally demonstrate sample pooling and data retrieval by barcoding of microcapsules. Our thermoresponsive microcapsule technology offers a scalable, sequence-agnostic approach for repeated random access of archival DNA files.

Advances in Data Mining and Machine Learning for Chat Sentiment and Library Account Based-Recommendations

Hahn

Ward

2022


David Ward

Scaling DNA data storage with nanoscale electrode wells

Molecular-level similarity search brings computing to DNA data storage
