2014
DOI: 10.1007/s10618-014-0362-1

On measuring similarity for sequences of itemsets

Abstract: Computing the similarity between sequences is a very important challenge for many different data mining tasks. There is a plethora of similarity measures for sequences in the literature, most of them being designed for sequences of items. In this work, we study the problem of measuring the similarity between sequences of itemsets. We focus on the notion of common subsequences as a way to measure similarity between a pair of sequences composed of a list of itemsets. We present new combinat…

Citations: cited by 19 publications (22 citation statements)
References: 23 publications (20 reference statements)
“…This model is called stochastic finite-state transducer. It has resulted being very useful for sequence problems, such as pattern recognition, segmentation, DNA alignment and sequence classifications [7, 11, 12]. …”
Section: Methods (mentioning)
confidence: 99%
“…exactly same pathway) Symmetry: $\mathrm{sim}(\mathcal{X},\mathcal{Y}) = \mathrm{sim}(\mathcal{Y},\mathcal{X})$. Furthermore, we can observe that $0 \le \mathrm{sim}(\mathcal{X},\mathcal{Y}) \le 1$. Considerations on the similarity measure: Recent works have been proposed to improve similarity measures over sequential data [39–41]. Among them, the work by Egho et al. is the most related to our similarity notion.…”
Section: Appendix 1 Pathways Similarity (mentioning)
confidence: 99%
“…On the other hand, in Egho et al., only the distinct subsequences are used in defining the similarity measure [39]. As a result, the information about the multiplicity of each subsequence is lost. Second, enumerating all the distinct subsequences commonly shared across two sequences is computationally intense.…”
Section: Appendix 1 Pathways Similarity (mentioning)
confidence: 99%
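
The statement above contrasts counting distinct common subsequences with counting their multiplicities, and notes that explicit enumeration is expensive. A minimal brute-force sketch of the distinct-subsequence view follows. It assumes a subsequence of a sequence of itemsets is obtained by picking itemsets in order and keeping a non-empty subset of each; the function names, the exclusion of the empty sequence, and the normalisation by the larger subsequence count are illustrative assumptions, not the definitions used by Egho et al. or by the citing paper.

```python
from itertools import combinations


def nonempty_subsets(itemset):
    """All non-empty subsets of one itemset, as frozensets."""
    items = sorted(itemset)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def distinct_subsequences(seq):
    """Set of distinct non-empty subsequences of a sequence of itemsets.

    Each subsequence is a tuple of frozensets. Storing them in a set
    collapses duplicates, so multiplicity is deliberately discarded.
    """
    found = {()}  # start from the empty sequence, dropped at the end
    for itemset in seq:
        subsets = nonempty_subsets(itemset)
        # Either skip this itemset (keep what we have) or append one of
        # its non-empty subsets to any subsequence built so far.
        found |= {prefix + (s,) for prefix in found for s in subsets}
    found.discard(())
    return found


def common_subsequence_similarity(s, t):
    """Distinct common subsequences, normalised by the larger count.

    The normalisation is an assumption chosen so that the result is
    symmetric and lies in [0, 1].
    """
    phi_s, phi_t = distinct_subsequences(s), distinct_subsequences(t)
    denom = max(len(phi_s), len(phi_t))
    return len(phi_s & phi_t) / denom if denom else 1.0


s = [{"a", "b"}, {"c"}]
t = [{"a"}, {"b", "c"}]
common = distinct_subsequences(s) & distinct_subsequences(t)
print(len(common))                          # 4
print(common_subsequence_similarity(s, t))  # 4/7
```

The set of subsequences grows exponentially with the number and size of the itemsets, which is why this enumeration is only practical for very short inputs.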
“…Different from existing methods that use users’ ratings on the common items, this paper utilizes users’ IS to analyze users’ unique preferences because IS carries more semantics than standalone ratings so that it can not only show people’s dynamic interests but also indicate their evolution patterns. To calculate similarities between users’ IS, we take into account the length of the longest common sub-IS and the count of all common sub-IS, which have been verified as effective in classification problems [33][39][40][41]. To achieve the above tasks, we provide some additional definitions as follows:…”
Section: Problem Statement (mentioning)
confidence: 99%