BackgroundIn recent times, the application of deoxyribonucleic acid (DNA) has diversified with the emergence of fields such as DNA computing and DNA data embedding. DNA data embedding, also known as DNA watermarking or DNA steganography, aims to develop robust algorithms for encoding non-genetic information in DNA. Inherently DNA is a digital medium whereby the nucleotide bases act as digital symbols, a fact which underpins all bioinformatics techniques, and which also makes trivial information encoding using DNA straightforward. However, the situation is more complex in methods which aim at embedding information in the genomes of living organisms. DNA is susceptible to mutations, which act as a noisy channel from the point of view of information encoded using DNA. This means that the DNA data embedding field is closely related to digital communications. Moreover it is a particularly unique digital communications area, because important biological constraints must be observed by all methods. Many DNA data embedding algorithms have been presented to date, all of which operate in one of two regions: non-coding DNA (ncDNA) or protein-coding DNA (pcDNA).ResultsThis paper proposes two novel DNA data embedding algorithms jointly called BioCode, which operate in ncDNA and pcDNA, respectively, and which comply fully with stricter biological restrictions. Existing methods comply with some elementary biological constraints, such as preserving protein translation in pcDNA. However there exist further biological restrictions which no DNA data embedding methods to date account for. Observing these constraints is key to increasing the biocompatibility and in turn, the robustness of information encoded in DNA.ConclusionThe algorithms encode information in near optimal ways from a coding point of view, as we demonstrate by means of theoretical and empirical (in silico) analyses. Also, they are shown to encode information in a robust way, such that mutations have isolated effects. Furthermore, the preservation of codon statistics, while achieving a near-optimum embedding rate, implies that BioCode pcDNA is also a near-optimum first-order steganographic method.
A solution to the problem of asymptotically optimum perfect universal steganography of finite memoryless sources with a passive warden is provided, which is then extended to contemplate a distortion constraint. The solution rests on the fact that Slepian's Variant I permutation coding implements firstorder perfect universal steganography of finite host signals with optimum embedding rate. The duality between perfect universal steganography with asymptotically optimum embedding rate and lossless universal source coding with asymptotically optimum compression rate is evinced in practice by showing that permutation coding can be implemented by means of adaptive arithmetic coding. Next, a distortion constraint between the host signal and the information-carrying signal is considered. Such a constraint is essential whenever real-world host signals with memory (e.g., images, audio, or video) are decorrelated to conform to the memoryless assumption. The constrained version of the problem requires trading off embedding rate and distortion. Partitioned permutation coding is shown to be a practical way to implement this trade-off, performing close to an unattainable upper bound on the rate-distortion function of the problem.
PublisherThe The UCD community has made this article openly available. Please share how this access benefits you. Your story matters! (@ucd_oa) Some rights reserved. For more information, please see the item record link above.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.