2011
DOI: 10.1109/tkde.2011.69

Efficient and Accurate Discovery of Patterns in Sequence Data Sets

Abstract: Existing sequence mining algorithms mostly focus on mining for subsequences. However, a large class of applications, such as biological DNA and protein motif mining, require efficient mining of "approximate" patterns that are contiguous. The few existing algorithms that can be applied to find such contiguous approximate patterns have drawbacks like poor scalability, lack of guarantees in finding the pattern, and difficulty in adapting to other applications. In this paper, we present a new algorithm calle…

Cited by 43 publications (20 citation statements)
References 46 publications
“…Let D be a function that measures similarity between two sequences. Following the previous work [8,9], in this paper we assume D is the Hamming distance (i.e., number of mismatches). A maximal motif must be right maximal and left maximal [8]. m is right maximal if L(mα) has less occurrences or more mismatches than L(m), where α ∈ Σ.…”
Section: Motifs (mentioning)
confidence: 99%
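The excerpt defines D as the Hamming distance and states right maximality in terms of L(m), which it does not define here. The Python sketch that follows illustrates one possible reading, under loudly stated assumptions: L(m) is taken to be the list of (start, mismatches) pairs for every contiguous window of a sequence s within Hamming distance d of m, and "more mismatches" is read as a larger total mismatch count across occurrences. The threshold d and all function names are hypothetical and are not taken from the cited papers.

```python
from typing import Iterable, List, Tuple


def hamming(a: str, b: str) -> int:
    """D(a, b): number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def occurrences(s: str, m: str, d: int) -> List[Tuple[int, int]]:
    """Assumed meaning of L(m): (0-based start, mismatches) for every contiguous
    window of s whose Hamming distance to m is at most d (hypothetical threshold)."""
    k = len(m)
    result = []
    for i in range(len(s) - k + 1):
        dist = hamming(s[i:i + k], m)
        if dist <= d:
            result.append((i, dist))
    return result


def total_mismatches(occ: List[Tuple[int, int]]) -> int:
    """Sum of mismatches over all occurrences (an assumed aggregate)."""
    return sum(dist for _, dist in occ)


def is_right_maximal(s: str, m: str, d: int, alphabet: Iterable[str]) -> bool:
    """True if every one-symbol right extension m + a (a in the alphabet) has fewer
    occurrences or more total mismatches than m. Reading 'more mismatches' as a
    total over occurrences is an assumption, not taken from the cited work."""
    base = occurrences(s, m, d)
    base_mm = total_mismatches(base)
    for a in alphabet:
        ext = occurrences(s, m + a, d)
        if len(ext) >= len(base) and total_mismatches(ext) <= base_mm:
            return False  # m + a keeps every occurrence at no extra cost, so m can grow
    return True


if __name__ == "__main__":
    s = "ACGTACGTACGA"  # made-up DNA string for illustration
    print(is_right_maximal(s, "AC", 0, "ACGT"))   # False: every "AC" extends to "ACG"
    print(is_right_maximal(s, "ACG", 0, "ACGT"))  # True
```

A window within distance d of m + a always has a prefix within distance d of m, so a one-symbol extension can only keep or lose occurrences; that is why the check compares counts and mismatch totals rather than positions.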
“…For example, node 1.2 in Fig. 3 is annotated with f = 2 because its path label TGC appears in S at (9,11) and (20,22). For clarity, we do not show the left-diversity annotation in the figure.…”
Section: Trie-based Search Space and Suffix Trees (mentioning)
confidence: 99%
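The excerpt annotates each trie node with a frequency f, the number of occurrences of the node's path label in S, and reports occurrences as 1-based inclusive (start, end) positions. A minimal sketch of that annotation, assuming a naive uncompressed suffix trie (O(n²) construction, unlike a compressed suffix tree, which can be built in linear time). The sequence S here is a made-up example, not the S of the cited Fig. 3, and the class and function names are hypothetical. The left-diversity annotation is omitted, as in the excerpt.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """Node of a naive suffix trie; the path from the root spells its label."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.freq = 0                # f: number of occurrences of the path label
        self.starts: List[int] = []  # 1-based start positions of those occurrences


def build_suffix_trie(s: str) -> TrieNode:
    """Insert every suffix of s; each node on a suffix's path gains one occurrence."""
    root = TrieNode()
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
            node.freq += 1
            node.starts.append(i + 1)  # store 1-based positions
    return root


def find(root: TrieNode, label: str) -> Optional[TrieNode]:
    """Walk down from the root along label; None if label does not occur in s."""
    node = root
    for ch in label:
        nxt = node.children.get(ch)
        if nxt is None:
            return None
        node = nxt
    return node


def occurrence_spans(node: TrieNode, length: int) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) span of each occurrence of the node's label."""
    return [(p, p + length - 1) for p in node.starts]


if __name__ == "__main__":
    S = "GATGCATTGCA"  # hypothetical sequence, not the S from the cited figure
    node = find(build_suffix_trie(S), "TGC")
    print(node.freq)                  # 2
    print(occurrence_spans(node, 3))  # [(3, 5), (8, 10)]
```

Because every suffix starting at position i passes through the node for each prefix of that suffix, f counts all (possibly overlapping) occurrences of the label.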