SUMMARY
Molecular encoding in sequence-defined polymers shows promise as a new paradigm for data storage. Here, we report what is, to our knowledge, the first use of self-immolative oligourethanes for storing and reading encoded information. As a proof of principle, we describe how a text passage from Jane Austen’s Mansfield Park was encoded in sequence-defined oligourethanes and reconstructed via self-immolative sequencing. We develop Mol.E-coder, a software tool that uses a Huffman encoding scheme to convert the character table to hexadecimal. The oligourethanes are then generated by high-throughput parallel synthesis. The oligourethanes are sequenced in parallel by self-immolation, and the liquid chromatography-mass spectrometry (LC-MS) data are decoded by our Mol.E-decoder software. The passage can be reproduced wholly intact by a third party, without any purification or the use of tandem MS (MS/MS), despite multiple rounds of compression, encoding, and synthesis.
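To make the encoding step concrete, the following is a minimal sketch of how a Huffman code can turn a text passage into a string of hexadecimal digits and back. It is not the Mol.E-coder implementation: the function names (`build_huffman_codes`, `encode_to_hex`, `decode_from_hex`), the zero-padding to a multiple of four bits, and the one-hex-digit-per-position framing are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(text):
    """Return a dict mapping each character to its Huffman bit string (hypothetical helper)."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dict payloads
    heap = [(n, next(tiebreak), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in codes0.items()}
        merged.update({ch: "1" + c for ch, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def encode_to_hex(text, codes):
    """Encode text as hex digits; returns (hex_string, number_of_pad_bits)."""
    bits = "".join(codes[ch] for ch in text)
    pad = (-len(bits)) % 4  # assumption: pad with zeros to a whole number of hex digits
    bits += "0" * pad
    hex_string = "".join(
        format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4)
    )
    return hex_string, pad


def decode_from_hex(hex_string, pad, codes):
    """Invert encode_to_hex using the same code table."""
    bits = "".join(format(int(h, 16), "04b") for h in hex_string)
    if pad:
        bits = bits[:-pad]
    inverse = {c: ch for ch, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # Huffman codes are prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

In this sketch, frequent characters receive shorter bit strings, so the passage needs fewer hexadecimal digits than a fixed-width character encoding would. The decoder must be given the same code table and pad count that the encoder used.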
The DNA polymerase I from Geobacillus stearothermophilus (also known as Bst DNAP) is widely used in isothermal amplification reactions, where its strand displacement ability is prized. More robust versions of this enzyme would benefit diagnostic applications, especially higher-temperature reactions that might proceed more quickly. To this end, we appended a short fusion domain from the actin-binding protein villin that improved both the stability and the purification of the enzyme. In parallel, we developed a machine learning algorithm that assesses the relative fit of individual amino acids to their chemical microenvironments at any position in a protein, and we applied this algorithm to predict sequence substitutions in Bst DNAP. The top predicted variants had greatly improved thermotolerance (measured by heating prior to assay), and upon combination, the mutations showed additive thermostability, with denaturation temperatures up to 2.5 °C higher than that of the parental enzyme. The increased thermostability of the enzyme allowed faster loop-mediated isothermal amplification assays to be carried out at 73 °C, a temperature at which both Bst DNAP and its improved commercial counterpart Bst 2.0 are inactivated. Overall, this is one of the first examples of the application of machine learning approaches to the thermostabilization of an enzyme.
Molecular encoding in abiotic sequence-defined polymers (SDPs) has recently emerged as a versatile platform for information and data storage. However, the storage capacity of SDPs remains underwhelming compared to that of the information-storing biopolymer DNA. In an effort to increase their information storage capacity, herein we describe the synthesis and simultaneous sequencing of eight sequence-defined 10-mer oligourethanes. Importantly, we demonstrate the use of different isotope labels, such as halogen tags, as a tool to deconvolute the complex sequence information found within a heterogeneous mixture of at least 96 unique molecules, with as little as four micromoles of total material. In doing so, relatively high-capacity data storage was achieved: 256 bits in this example, the most information stored in a single sample of abiotic SDPs without the use of long strands. Within the sequence information, a 256-bit cipher key was stored and retrieved. The key was used to encrypt and decrypt a plain text document containing The Wonderful Wizard of Oz. To validate this platform as a medium of molecular steganography and cryptography, the cipher key was hidden in the ink of a personal letter, mailed to a third party, extracted, sequenced, and deciphered successfully on the first try, thereby revealing the encrypted document.
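The capacity arithmetic above can be sketched as follows: a 256-bit key spread evenly over eight strands gives 32 bits per strand, which is eight hexadecimal digits each. The even split, the hexadecimal representation, and the helper names `split_key` and `join_key` are assumptions for illustration only. The abstract does not state how the key bits map onto the monomers of each 10-mer, or how the remaining positions and isotope tags are used.

```python
KEY_BITS = 256   # size of the cipher key reported in the abstract
N_STRANDS = 8    # number of oligourethanes reported in the abstract


def split_key(key: bytes, n_strands: int = N_STRANDS) -> list[str]:
    """Split a key into equal hex chunks, one per strand (hypothetical helper)."""
    hex_key = key.hex().upper()
    if len(hex_key) % n_strands:
        raise ValueError("key length is not divisible by the number of strands")
    size = len(hex_key) // n_strands
    return [hex_key[i:i + size] for i in range(0, len(hex_key), size)]


def join_key(chunks: list[str]) -> bytes:
    """Reassemble the key from chunks listed in their original order."""
    return bytes.fromhex("".join(chunks))
```

Because the chunks must be rejoined in order, any real scheme also needs a way to identify which strand carries which chunk. In the work described above, that role would presumably fall to the labels used to deconvolute the mixture, though this sketch does not model them.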