2020
DOI: 10.1038/s41467-020-16958-3

Quantifying molecular bias in DNA data storage

Abstract: DNA has recently emerged as an attractive medium for archival data storage. Recent work has demonstrated proof-of-principle prototype systems; however, very uneven (biased) sequencing coverage has been reported, which indicates inefficiencies in the storage process. Deviations from the average coverage in the sequence copy distribution can either cause wasteful provisioning in sequencing or excessive number of missing sequences. Here, we use millions of unique sequences from a DNA-based digital data archival s…

Cited by 59 publications (57 citation statements)
References 28 publications
“…Indeed, it may increase the cost significantly to make a single copy DNA data since additional efforts are required. Efforts on reducing the "multiple copy" feature by serial dilutions observed massive strand dropouts when reducing the average copy number under ten, making reliable data retrieval impractical [17]. Additionally, DNA molecules, as highly degradable polymers, break easily.…”
Section: Introduction (mentioning)
confidence: 99%
“…This variation is possibly a result of using low quantity input material which in this case was the highly diluted cDNA. Previous studies have discussed HTS data variation to be mainly associated with PCR stochasticity, primer and library preparation biases, phasing and prephasing during the sequencing process [43][44][45][46]. In this study, and particularly for the tripartite genome of CMV, the variability of RNA copy numbers within the virus genome, as reported by [47] might explain why there are more RNA1 reads than RNA3 reads generated by TG-Seq.…”
Section: Discussion (mentioning)
confidence: 66%
“…This limited sensitivity of gel electrophoresis to detect low concentration [35,41] and/or the reported "optical error" associated with gel electrophoresis especially when visualizing very low amplified products [13] might have caused some viruses amplicons not to be picked by gel electrophoresis. The TG-Seq approach sequences the multiple amplicons generated in an mPCR reaction overcoming these limitations and the viral sequences generated from these multiple amplicons provide a further immediate homology confirmation of the present targets. Although high fidelity DNA polymerase was used in amplifying low concentrated nucleic material (serial diluted library), there was variation in the amplicon datasets (number of reads mapping to the virus specific amplicon), as commonly observed in shotgun HTS [42][43][44][45][46]. This variation is possibly a result of using low quantity input material which in this case was the highly diluted cDNA.…”
Section: Discussion (mentioning)
confidence: 99%
“…If the copy number of each form of DNA molecule is 2, then the probability of complete successful amplification for the 50-kb DNA molecule (~99.75%) will be much higher compared to a hundred 500-mer DNA molecules (~75.36%), which contain an equal quantity of binary information. In real experiments, the synthesized DNA molecules as a mixture follow a Poisson distribution (32), which makes the probability of successful amplification of all DNA molecules even lower. Therefore, we transcoded a portion of one of the text files (Shakespeare Sonnet.txt) into a 54,240-bp DNA fragment using YYC and evaluated its potential in data robustness and information density for the application of in vivo DNA data storage.…”
Section: Experimental Validation of the Compatibility of YYC (mentioning)
confidence: 99%
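
The citation statements above refer to strand dropouts at low average copy number and to Poisson-distributed copy counts. As a minimal sketch of that general idea, assuming each distinct sequence's copy count is an independent Poisson draw with the same mean, the chance that a sequence is missing entirely is exp(-mean). The function names below are hypothetical, and this is not the calculation used in any of the quoted papers; it does not reproduce their ~99.75% or ~75.36% figures.

```python
import math


def poisson_dropout_probability(mean_copies: float) -> float:
    """Probability that one sequence has zero copies.

    Assumption (illustrative only): copies ~ Poisson(mean_copies).
    """
    return math.exp(-mean_copies)


def prob_all_present(num_sequences: int, mean_copies: float) -> float:
    """Probability that every sequence has at least one copy.

    Assumption (illustrative only): sequences are independent and share
    the same mean copy number.
    """
    return (1.0 - poisson_dropout_probability(mean_copies)) ** num_sequences


if __name__ == "__main__":
    # Compare a few mean copy numbers for a pool of 100 distinct sequences.
    for mean in (2, 5, 10):
        dropout = poisson_dropout_probability(mean)
        complete = prob_all_present(100, mean)
        print(f"mean={mean}: P(dropout)={dropout:.6f}, P(all present)={complete:.6f}")
```

Under these assumptions the probability of recovering every sequence falls quickly as the pool grows or the mean copy number shrinks, which is consistent in spirit with the dropout behaviour the statements describe.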