LCSk: A refined similarity measure

Benson, Gary; Levy, Avivit; Maimoni, S.; Noifeld, D.; Shalom, B. Riva

doi:10.1016/j.tcs.2015.11.026

Cited by 12 publications

(19 citation statements)

References 16 publications

“…We showed that both the LCSk+ problem and the op-LCSk+ problem can be solved in O(mn) time. Our result on the LCSk+ problem gives a better worst-case running time than previous algorithms [2,15], while the experimental results showed that the previous algorithms run faster than ours on average. Although the op-LCSk+ problem looks much more challenging than the LCSk+, since the former cannot be solved by a simple dynamic programming due to the properties of order-isomorphisms, the proposed algorithm achieves the same time complexity as the one for the LCSk+.…”

Section: Results (mentioning)

confidence: 49%
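
As a rough illustration of how an O(mn) bound can be reached for the LCSk+ problem described in the statement above, here is a minimal Python sketch of a dynamic program over three tables. It is not taken from the cited paper; the function name `lcs_k_plus` and the table names `suf`, `ext`, and `best` are hypothetical, and the sketch assumes the usual reading of LCSk+ (a common subsequence built from non-overlapping common substrings, each of length at least k, taken in the same order in both strings).

```python
def lcs_k_plus(x: str, y: str, k: int) -> int:
    """Length of a longest common subsequence made of common substrings
    of length >= k (illustrative sketch, O(mn) time and space)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    m, n = len(x), len(y)
    # suf[i][j]: length of the longest common suffix of x[:i] and y[:j]
    suf = [[0] * (n + 1) for _ in range(m + 1)]
    # ext[i][j]: best value whose last block ends exactly at (i, j); -1 if none
    ext = [[-1] * (n + 1) for _ in range(m + 1)]
    # best[i][j]: best value for the prefixes x[:i] and y[:j]
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                suf[i][j] = suf[i - 1][j - 1] + 1
            cand = -1
            if suf[i][j] >= k:
                # Start a new block of exactly k characters ending here.
                cand = best[i - k][j - k] + k
                if suf[i][j] > k:
                    # Or extend a block that already ended at (i-1, j-1).
                    cand = max(cand, ext[i - 1][j - 1] + 1)
            ext[i][j] = cand
            best[i][j] = max(best[i - 1][j], best[i][j - 1], cand)
    return best[m][n]
```

Each cell does constant work, which is where the O(mn) bound in this sketch comes from. With k = 1 the recurrence reduces to the ordinary LCS.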

“…gov/nuccore/U38845.1, with k = 1, 2, 3, 4, 5. The experimental results under the conditions (1), (2) and (3) The proposed algorithm in Section 3 runs faster than PŽŠ for small k or small alphabets. This is due to that PŽŠ strongly depends on the total number of matching k length substring pairs between input strings, and for small k or small alphabets there are many matching pairs.…”

Section: Results (mentioning)

confidence: 99%

“…We assume that all strings are over an alphabet Σ. The length of a string X = (X[1], X[2], …, X[n]) is denoted by |X| = n. A substring of X beginning at i and ending at j is denoted by X[i : j] = (X[i], X[i+1], …, X[j−1], X[j]). We denote X_{i,+l} = X[i : i+l−1] and X_{j,−l} = X[j−l+1 : j].…”

Section: Preliminaries (mentioning)

confidence: 99%
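
The 1-indexed, inclusive substring notation in the statement above is easy to misread against 0-indexed, half-open slicing. The following small Python sketch maps one onto the other; the helper names `substring`, `forward`, and `backward` are hypothetical and only mirror X[i : j], X_{i,+l}, and X_{j,−l} from the quoted text.

```python
def substring(x: str, i: int, j: int) -> str:
    """X[i : j] in the quoted notation: positions i..j, 1-indexed, inclusive."""
    return x[i - 1:j]

def forward(x: str, i: int, l: int) -> str:
    """X_{i,+l}: the length-l substring starting at position i."""
    return substring(x, i, i + l - 1)

def backward(x: str, j: int, l: int) -> str:
    """X_{j,-l}: the length-l substring ending at position j."""
    return substring(x, j - l + 1, j)

x = "abcdef"
print(substring(x, 2, 4), forward(x, 2, 3), backward(x, 4, 3))
```

All three calls in the usage line select the same three characters, positions 2 through 4 of `x`.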

“…The reverse of a string X is denoted by X^R, and the operator · denotes the concatenation. We simply denote a string X = (X[1], X[2], …, X[n]) as X = X[1]X[2] ⋯ X[n] when clear from the context.…”

Section: Preliminaries (mentioning)

confidence: 99%

SOFSEM 2017: Theory and Practice of Computer Science

Steffen, Baier, Brand et al. 2017

Lecture Notes in Computer Science

We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of at least k length substrings. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are lengths of the input strings. Furthermore, we mainly consider the LCS in at least k length order-isomorphic substrings problem. We show that the problem can also be solved in O(mn) worst-case time by an easy-to-implement algorithm. * The final publication is available at Springer via http://dx.
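
The abstract above refers to order-isomorphic substrings. As a minimal sketch, and assuming the usual definition (two equal-length sequences are order-isomorphic when every pair of positions compares the same way in both), one way to test the property is to compare dense rank arrays. The function names `dense_ranks` and `order_isomorphic` are hypothetical, and this check is only an illustration of the definition, not the algorithm of the cited work.

```python
from typing import List, Sequence

def dense_ranks(seq: Sequence[int]) -> List[int]:
    """Replace each value by its rank among the distinct values of seq."""
    rank_of = {value: rank for rank, value in enumerate(sorted(set(seq)))}
    return [rank_of[value] for value in seq]

def order_isomorphic(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if a and b have equal length and the same relative order,
    including the same pattern of ties."""
    return len(a) == len(b) and dense_ranks(a) == dense_ranks(b)

print(order_isomorphic([32, 40, 4, 16, 27], [28, 32, 12, 20, 25]))
print(order_isomorphic([1, 2, 3], [1, 3, 2]))
```

Sorting makes this check O(l log l) for sequences of length l, so it is a convenience for small examples rather than a building block for an O(mn) method.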

“…Non-metric based similarity approach is an alternative solution to find the similarity indexes of MDSs specifically in presence of outliers. A dynamic programming based LCSS computation algorithm, specifically for the k-length substring problems, was presented in literature to address the aforementioned issue i.e., outliers sensitivity [26], [27]. Zhu et al [28] presented two different approaches to solve the LCSS problem with minimum possible time and space complexities iff n = m, where n and m represent sequence length.…”

Section: Literature Review (mentioning)

confidence: 99%

A Heuristic Approach for Finding Similarity Indexes of Multivariate Data Sets

et al. 2020

Multivariate data sets (MDSs), with enormous size and a certain ratio of noise/outliers, are generated routinely in various application domains. A major issue, tightly coupled with these MDSs, is how to compute their similarity indexes with available resources in the presence of noise/outliers, which is addressed with the development of both classical and non-metric based approaches. However, classical techniques are sensitive to outliers and most of the non-classical approaches are either problem/application specific or overly complex. Therefore, the development of an efficient and reliable algorithm for MDSs, with minimum time and space complexity, is highly encouraged by the research community. In this paper, a non-metric based similarity measure algorithm for MDSs is presented that addresses the aforementioned issues, particularly noise and computational time. This technique finds the similarity indexes of noisy MDSs, of both equal and variable sizes, using the minimum possible resources, i.e., space and time. Experiments were conducted with both benchmark and real-time MDSs to evaluate the proposed algorithm's performance against its rival algorithms, which are traditional dynamic programming based and sequential similarity measure algorithms. Experimental results show that the proposed scheme outperforms its counterpart algorithms in terms of time and space and effectively tolerates a considerable portion of noisy data.

INDEX TERMS: Similarity index, multivariate data set, outliers, the longest common subsequence.

The associate editor coordinating the review of this manuscript and approving it for publication was Chongsheng Zhang.

I. INTRODUCTION

Recent technological advancements, particularly in sensors and actuators, lead to the generation of enormous multivariate data sets (MDSs) in different application areas, e.g., wireless sensor networks, internet of things (IoT), scientific experiments, industrial control processes, educational purpose testbeds, web and databases [1]. An MDS is defined as a set of related numbers or values associated with a specific entity in an organization. In other words, a group of univariate data sets in column form is known as an MDS [2]. Mathematically, it is represented as a matrix X_{m,n}, where m and n correspond to the rows and columns respectively. These MDSs are thoroughly examined, using various classical and non-classical approaches, to discover valuable information that is used to determine the correlating or distinguishing factor of entities. One of the major issues closely linked with MDSs is finding their similarity indexes in the presence of noise/outliers, which is not possible with existing techniques. Generally, two MDSs, X_{i,j} and Y_{m,n}, are believed similar if most of their elements are highly correlated [3]. The MDS similarity problem is an active research area, both in computer science and mathematics, due to its existence in different real world application environments, e.g., DNA analysis, sensors-based real...

Exploiting Pseudo-locality of Interchange Distance

Levy

2021

String Processing and Information Retrieval
