2023
DOI: 10.1186/s12859-023-05237-9

Study of the error correction capability of multiple sequence alignment algorithm (MAFFT) in DNA storage

Abstract: Synchronization (insertions–deletions) errors are still a major challenge for reliable information retrieval in DNA storage. Unlike traditional error correction codes (ECC) that add redundancy to the stored information, multiple sequence alignment (MSA) solves this problem by searching for conserved subsequences. In this paper, we conduct a comprehensive simulation study on the error correction capability of a typical MSA algorithm, MAFFT. Our results reveal that its capability exhibits a phase transition when…
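A simulation study of this kind needs a channel that produces insertions, deletions, and substitutions. The sketch below is a minimal i.i.d. IDS channel in Python, offered only as an illustration under stated assumptions: it is not the channel model used in the paper, and the function name `ids_channel`, the per-base event probabilities `p_ins`, `p_del`, `p_sub`, and the rule that an insertion places one random base before the current base are all hypothetical choices.

```python
import random

BASES = "ACGT"


def ids_channel(strand: str, p_ins: float, p_del: float, p_sub: float,
                rng: random.Random) -> str:
    """Pass a DNA strand through a hypothetical i.i.d. IDS channel.

    For each input base exactly one event is drawn:
      - with probability p_ins a random base is inserted before it (the base is kept),
      - with probability p_del the base is deleted,
      - with probability p_sub the base is replaced by a different base,
      - otherwise the base is copied unchanged.
    """
    if p_ins + p_del + p_sub > 1:
        raise ValueError("event probabilities must sum to at most 1")
    out = []
    for base in strand:
        r = rng.random()  # uniform in [0, 1)
        if r < p_ins:
            out.append(rng.choice(BASES))  # inserted random base
            out.append(base)
        elif r < p_ins + p_del:
            continue  # base lost: a deletion
        elif r < p_ins + p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))  # substitution
        else:
            out.append(base)  # error-free copy
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    original = "".join(rng.choice(BASES) for _ in range(60))
    # Draw a few noisy reads of the same strand (hypothetical 1% rate per event type).
    for _ in range(3):
        print(ids_channel(original, 0.01, 0.01, 0.01, rng))
```

The noisy copies can differ in length from one another and from the original, which is why reads cannot simply be compared position by position.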

Cited by 12 publications (4 citation statements)
References: 31 publications (42 reference statements)
“…Meiser et al. (2020) have used a Reed-Solomon code for storing a full album of music in DNA [33]. Recently, Xie et al. (2023) conducted an analysis showing the value of the sequencing depth for retrieving the right string of data [34]. Sufficiently deep sequencing allows the use of MSA (multiple sequence alignment) methods to establish a consensus sequence and correct errors that may appear on the DNA strands.…”
Section: New Storage Medium Old Problems and Solutions (mentioning)
confidence: 99%
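Once reads have been aligned by an MSA tool, a consensus can be read off column by column. The sketch below assumes the alignment is already available as equal-length rows with `-` as the gap character, as is common in FASTA-formatted alignments; the function name `consensus_from_msa` and the plurality rule (drop a column when the gap symbol wins, break ties by first occurrence) are illustrative assumptions, not the procedure of the cited works.

```python
from collections import Counter

GAP = "-"


def consensus_from_msa(aligned_reads: list[str]) -> str:
    """Column-wise plurality consensus of an existing multiple sequence alignment.

    Each row is one read after alignment, padded with GAP characters so that all
    rows have the same length. A column whose most common symbol is GAP is treated
    as a spurious insertion and is dropped from the consensus.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for col in range(width):
        counts = Counter(row[col] for row in aligned_reads)
        # most_common(1) returns the first-seen symbol among tied counts.
        symbol = counts.most_common(1)[0][0]
        if symbol != GAP:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical alignment of four reads of "ACGTACGT":
    # read 2 carries an inserted T, read 3 lost its first G.
    msa = [
        "ACGT-ACGT",
        "ACGTTACGT",
        "AC-T-ACGT",
        "ACGT-ACGT",
    ]
    print(consensus_from_msa(msa))  # ACGTACGT
```

The hard part, producing the alignment itself, is what a tool such as MAFFT does; the voting step afterwards is simple.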
“…A further approach is the exploitation of sequencing depth by using a form of majority voting for sequencing reads, which allows compensation for indels that occurred during sequencing [14]. The DNA data storage channel consists of multiple steps: the writing of data into DNA (synthesis), the amplification of the synthesis product using PCR, the storage process itself, and the reading of the DNA back into a digital format (sequencing). Each component, including the various options for a component (for example, different sequencing machines), exhibits unique error profiles and error patterns.…”
Section: Introduction (mentioning)
confidence: 99%
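The quoted passage does not say which voting scheme reference [14] uses. One well-known example of majority voting that tolerates indels is bitwise majority alignment (BMA) from the trace-reconstruction literature; the sketch below is a symbol-wise variant over the DNA alphabet, swapped in purely for illustration. It assumes a deletion-only channel, a known original length `n`, and that most reads agree at each step; the function name `majority_align` is hypothetical.

```python
from collections import Counter


def majority_align(reads: list[str], n: int) -> str:
    """Symbol-wise variant of bitwise majority alignment (BMA) for deletion-only reads.

    Each read keeps its own pointer. At every output position, the plurality of the
    symbols currently under the pointers is emitted. Reads that agree advance their
    pointer; reads that disagree are assumed to have lost that symbol (a deletion)
    and keep their pointer where it is, which re-synchronizes them.
    """
    pointers = [0] * len(reads)
    estimate = []
    for _ in range(n):
        current = [(i, read[pointers[i]]) for i, read in enumerate(reads)
                   if pointers[i] < len(read)]
        if not current:
            break  # every read is exhausted
        winner = Counter(sym for _, sym in current).most_common(1)[0][0]
        estimate.append(winner)
        for i, sym in current:
            if sym == winner:
                pointers[i] += 1
    return "".join(estimate)


if __name__ == "__main__":
    original = "ACGTACGT"
    # Three hypothetical reads, each with at most one deletion at a different position.
    reads = ["AGTACGT", "ACGTAGT", "ACGTACGT"]
    print(majority_align(reads, len(original)))  # ACGTACGT
```

Handling insertions as well requires a more elaborate rule or a prior alignment step, so this should be read only as an illustration of how voting can re-synchronize reads.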
“…Synthetic DNA has now been proved to be a new potential storage medium for exponentially growing data. The total amount of data stored in synthetic DNA has reached the GB level, and various practical automated read/write technologies for DNA storage have been proposed. Unlike traditional electric/optical/magnetic storage media, DNA storage is characterized by a large number of insertions, deletions, and substitutions (IDSs) due to highly error-prone DNA synthesis and sequencing processes. The difficulty of data recovery mainly comes from the synchronization problem caused by random insertions and deletions.…”
Section: Introduction (mentioning)
confidence: 99%
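The synchronization problem can be seen in a small calculation: a single deletion near the start of a strand shifts every later base, so a position-by-position comparison reports almost every position as wrong, while the edit (Levenshtein) distance, which allows insertions and deletions, reports a single error. The helper names `positionwise_mismatches` and `levenshtein` are illustrative.

```python
def positionwise_mismatches(a: str, b: str) -> int:
    """Count mismatching positions, comparing only up to the shorter length."""
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance allowing insertions, deletions, and substitutions (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    stored = "ACGTACGTAC"
    read = stored[1:]  # the first base was deleted
    print(positionwise_mismatches(stored, read))  # 9
    print(levenshtein(stored, read))  # 1
```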