On measuring similarity for sequences of itemsets

Egho, Elias; Raïssi, Chedy; Calders, Toon; Jay, Nicolas; Napoli, Amedeo

doi:10.1007/s10618-014-0362-1

Cited by 19 publications (22 citation statements)

References 23 publications (20 reference statements)


“…This model is called stochastic finite-state transducer. It has resulted being very useful for sequence problems, such as pattern recognition, segmentation, DNA alignment and sequence classifications [7, 11, 12]. …”

Section: Methods (mentioning)

confidence: 99%

Learning stochastic finite-state transducer to predict individual patient outcomes

et al. 2016


The high frequency data in intensive care unit is flashed on a screen for a few seconds and never used again. However, this data can be used by machine learning and data mining techniques to predict patient outcomes. Learning finite-state transducers (FSTs) have been widely used in problems where sequences need to be manipulated and insertions, deletions and substitutions need to be modeled. In this paper, we learned the edit distance costs of a symbolic univariate time series representation through a stochastic finite-state transducer to predict patient outcomes in intensive care units. The Nearest-Neighbor method with these learned costs was used to classify the patient status within an hour after 10 h of data. Several experiments were developed to estimate the parameters that better fit the model regarding the prediction metrics. Our best results are compared with published works, where most of the metrics (i.e., Accuracy, Precision and F-measure) were improved.
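The approach described in this abstract, an edit distance with per-symbol costs fed into a nearest-neighbor classifier, can be sketched as follows. This is only an illustration under stated assumptions: the cost table, the symbol alphabet, and the labels are hypothetical, and the costs here are hand-set rather than learned by a stochastic finite-state transducer as in the cited paper.

```python
def weighted_edit_distance(s, t, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Edit distance between sequences s and t with per-pair substitution costs.

    sub_cost maps (symbol_in_s, symbol_in_t) to a cost; pairs missing from the
    table default to 1.0 (an assumption made for this sketch).
    """
    prev = [j * ins_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        cur = [i * del_cost]
        for j, b in enumerate(t, start=1):
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            cur.append(min(prev[j] + del_cost,      # delete a from s
                           cur[j - 1] + ins_cost,   # insert b into s
                           prev[j - 1] + sub))      # substitute a with b
        prev = cur
    return prev[-1]


def nearest_neighbor_label(query, training, sub_cost):
    """Return the label of the training sequence closest to query (1-NN)."""
    closest = min(training,
                  key=lambda item: weighted_edit_distance(query, item[0], sub_cost))
    return closest[1]


# Hypothetical symbolic series and labels, for illustration only.
costs = {("c", "d"): 0.2}
training = [("aab", "stable"), ("ccd", "critical")]
print(weighted_edit_distance("abc", "abd", costs))
print(nearest_neighbor_label("aac", training, costs))
```

Lowering the cost of a particular substitution makes sequences that differ only by that substitution count as near neighbors, which is the effect that learning the costs from data is meant to exploit.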


“…exactly same pathway) Symmetry: $\mathrm{sim}(\mathcal{X},\mathcal{Y}) = \mathrm{sim}(\mathcal{Y},\mathcal{X})$. Furthermore, we can observe that $0 \le \mathrm{sim}(\mathcal{X},\mathcal{Y}) \le 1$. Considerations on the similarity measure: Recent works have been proposed to improve similarity measures over sequential data [39–41]. Among them, the work by Egho et al. is the most related to our similarity notion.…”

Section: Appendix 1, Pathways Similarity (mentioning)

confidence: 99%

“…On the other hand, in Egho et al., only the distinct subsequences are used in defining the similarity measure [39]. As a result, the information about the multiplicity of each subsequence is lost. Second, enumerating all the distinct subsequences commonly shared across two sequences is computationally intense.…”

Section: Appendix 1, Pathways Similarity (mentioning)

confidence: 99%


Linking temporal medical records using non-protected health information data

Bonomi, Jiang. 2017. Stat Methods Med Res


Modern medical research relies on multi-institutional collaborations which enhance the knowledge discovery and data reuse. While these collaborations allow researchers to perform analytics otherwise impossible on individual datasets, they often pose significant challenges in the data integration process. Due to the lack of a unique identifier, data integration solutions often have to rely on patient’s protected health information (PHI). In many situations, such information cannot leave the institutions or must be strictly protected. Furthermore, the presence of noisy values for these attributes may result in poor overall utility. While much research has been done to address these challenges, most of the current solutions are designed for a static setting without considering the temporal information of the data (e.g. EHR). In this work, we propose a novel approach that uses non-PHI for linking patient longitudinal data. Specifically, our technique captures the diagnosis dependencies using patterns which are shown to provide important indications for linking patient records. Our solution can be used as a standalone technique to perform temporal record linkage using non-protected health information data or it can be combined with Privacy Preserving Record Linkage solutions (PPRL) when protected health information is available. In this case, our approach can solve ambiguities in results. Experimental evaluations on real datasets demonstrate the effectiveness of our technique.
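The citation statement above observes that a measure built on distinct common subsequences discards how often each subsequence occurs, and that enumerating those subsequences is expensive. A minimal brute-force sketch makes both points concrete. It assumes plain sequences of single items rather than itemsets, and the helper names are hypothetical; it is not the algorithm of either paper.

```python
from collections import Counter
from itertools import combinations


def subsequence_counts(seq):
    """Count every non-empty subsequence of seq, keeping multiplicity.

    Brute force over all index subsets, so the cost grows as 2**len(seq).
    """
    counts = Counter()
    for r in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), r):
            counts[tuple(seq[i] for i in idx)] += 1
    return counts


def common_distinct_subsequences(x, y):
    """Distinct subsequences shared by x and y (multiplicity ignored)."""
    return set(subsequence_counts(x)) & set(subsequence_counts(y))


x = ["a", "b", "a"]
y = ["a", "a", "b"]
shared = common_distinct_subsequences(x, y)
print(len(shared))                                # number of distinct shared subsequences
print(subsequence_counts(x)[("a", "b")],          # occurrences of (a, b) in x
      subsequence_counts(y)[("a", "b")])          # occurrences of (a, b) in y
```

Here the subsequence (a, b) is shared once as a distinct pattern even though it occurs a different number of times in each sequence, which is the information a set-based measure drops.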


“…Different from existing methods that use users’ ratings on the common items, this paper utilizes users’ IS to analyze users’ unique preferences because IS carries more semantics than standalone ratings so that it can not only show people’s dynamic interests but also indicate their evolution patterns. To calculate similarities between users’ IS, we take into account the length of the longest common sub-IS and the count of all common sub-IS, which have been verified as effective in classification problems [33][39][40][41]. To achieve the above tasks, we provide some additional definitions as follows:…”

Section: Problem Statement (mentioning)

confidence: 99%

Collaborative Filtering Recommendation on Users’ Interest Sequences

et al. 2016


As an important factor for improving recommendations, time information has been introduced to model users’ dynamic preferences in many papers. However, the sequence of users’ behaviour is rarely studied in recommender systems. Due to the users’ unique behavior evolution patterns and personalized interest transitions among items, users’ similarity in sequential dimension should be introduced to further distinguish users’ preferences and interests. In this paper, we propose a new collaborative filtering recommendation method based on users’ interest sequences (IS) that rank users’ ratings or other online behaviors according to the timestamps when they occurred. This method extracts the semantics hidden in the interest sequences by the length of users’ longest common sub-IS (LCSIS) and the count of users’ total common sub-IS (ACSIS). Then, these semantics are utilized to obtain users’ IS-based similarities and, further, to refine the similarities acquired from traditional collaborative filtering approaches. With these updated similarities, transition characteristics and dynamic evolution patterns of users’ preferences are considered. Our new proposed method was compared with state-of-the-art time-aware collaborative filtering algorithms on datasets MovieLens, Flixster and Ciao. The experimental results validate that the proposed recommendation method is effective and outperforms several existing algorithms in the accuracy of rating prediction.
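One of the two quantities this abstract relies on, the length of the longest common sub-sequence between two users' interest sequences, can be computed with the standard dynamic program. The sketch below assumes each interest sequence is a time-ordered list of item identifiers; the function name and the sample data are hypothetical, and the paper's own definitions of LCSIS and ACSIS may differ in detail.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))   # skip one item from a or b
        prev = cur
    return prev[-1]


# Hypothetical interest sequences: items ordered by rating timestamp.
user_u = ["m1", "m2", "m3", "m4"]
user_v = ["m2", "m1", "m3", "m4"]
print(lcs_length(user_u, user_v))
```

The table is filled row by row and only two rows are kept, so memory stays linear in the length of the second sequence while time is proportional to the product of the two lengths.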

