2001
DOI: 10.1016/s0166-6851(01)00388-7

Discovering patterns in Plasmodium falciparum genomic DNA

Abstract: A method has been developed for discovering patterns in DNA sequences. Loosely based on the well-known Lempel Ziv model for text compression, the model detects repeated sequences in DNA. The repeats can be forward or inverted, and they need not be exact. The method is particularly useful for detecting distantly related sequences, and for finding patterns in sequences of biased nucleotide composition, where spurious patterns are often observed because the bias leads to coincidental nucleotide matches. We show h…
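
To make the idea of forward and inverted repeats concrete, here is a minimal sketch of a greedy, Lempel-Ziv-style scan. It is not the authors' model: it finds exact copies only (the abstract says the published method also tolerates inexact repeats), it treats "inverted" as reverse-complement, and the names `reverse_complement`, `longest_repeat_at`, `parse_repeats`, and `min_len` are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: exact forward / reverse-complement repeats,
# found by a greedy left-to-right scan. Not the method from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_repeat_at(seq, i):
    """Longest word starting at position i that already occurs, either as-is
    (forward) or reverse-complemented (inverted), in the prefix seq[:i].
    Returns (kind, start_in_prefix, length); length is 0 if nothing matches."""
    prefix = seq[:i]
    best = ("none", -1, 0)
    for kind in ("forward", "inverted"):
        start, length = -1, 0
        while i + length < len(seq):
            word = seq[i:i + length + 1]
            probe = word if kind == "forward" else reverse_complement(word)
            pos = prefix.find(probe)
            if pos < 0:
                break
            start, length = pos, length + 1
        if length > best[2]:
            best = (kind, start, length)
    return best


def parse_repeats(seq, min_len=4):
    """Scan left to right and report repeats of at least min_len bases as
    (position, kind, earlier_start, length) tuples."""
    repeats = []
    i = 0
    while i < len(seq):
        kind, start, length = longest_repeat_at(seq, i)
        if length >= min_len:
            repeats.append((i, kind, start, length))
            i += length  # skip past the copied region
        else:
            i += 1       # no usable repeat here; move on one base
    return repeats


if __name__ == "__main__":
    # GATTACA, a forward copy, then its reverse complement TGTAATC.
    print(parse_repeats("GATTACACCGATTACACCTGTAATC", min_len=5))
```

The `min_len` threshold is a crude stand-in for the statistical test a compression model provides: in a composition-biased genome, short matches occur by chance, which is the problem the abstract describes.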

Cited by 29 publications (21 citation statements)
References 32 publications (50 reference statements)
“…As a statistical compressor, the expert model is able to produce the information content sequence from DNA or protein. This is important when we want to analyze areas of interest [21,9,10]. For example, figure 1 shows a graph of information content along the HUMHBB sequence.…”
Section: Sequence
mentioning
confidence: 99%
“…Compression of biological sequences is useful, not primarily for managing the genome database, but for modelling and learning about sequences. Work by Stern et al [21] recognizes the importance of mutual compressibility for discovering patterns of interest from genomes. Chen et al [6] and Powell et al [19] show that compressibility is a good measurement of relatedness between sequences and can be effectively used in sequence alignment and evolutionary tree construction.…”
Section: Introduction
mentioning
confidence: 99%
“…Since this pioneeristic work, several other papers have been produced where linguistic approaches have been applied to understand a wide variety of characteristics in genomes, from the identification of active genes to the large scale comparison [18], [22], [10].…”
Section: Introduction
mentioning
confidence: 99%
“…Dictionary-based compression algorithms, like those of the Lempel-Ziv family have been already used in the past to have an automatic selector of over-represented words, in order to select repeats along a genomic sequence [22] [18] [16], or to classify coding/non-coding sequences on the basis of the compression factor or similar indexes [19].…”
Section: Introduction
mentioning
confidence: 99%
“…The importance of common compressibility for identifying patterns of interest from genomes is recognized [2] and it is also established that compressibility is a well dimension of relatedness among sequences [3]. Many compression algorithms use the characteristics of DNA like point mutation [4] or reverse complement to achieve a good compression rate.…”
mentioning
confidence: 99%