Abstract. The Longest Common Subsequence (LCS) is a well-studied problem with a wide range of implementations. Its motivation is in comparing strings. It has long been of interest to devise a similar measure for comparing higher-dimensional objects and more complex structures. In this paper we study the Longest Common Substructure of two matrices and show that this problem is NP-hard. We also study the Longest Common Subforest problem for multiple trees, including a constrained version. We show NP-hardness for k > 2 unordered trees in the constrained LCS. We also give polynomial-time algorithms for ordered trees and prove a lower bound for any decomposition strategy for k trees.
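For orientation, the classical string LCS that this abstract generalizes can be computed with a textbook dynamic program. The sketch below is only that standard two-string baseline, not the matrix or tree algorithms of the paper; the function name `lcs_length` is an illustrative choice.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic program, keeping only one
    previous row of the table to save memory.
    """
    prev = [0] * (len(b) + 1)  # row for the empty prefix of a
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last character of a or of b, whichever is better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4 (one such subsequence is "BCBA").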
In this paper a deterministic algorithm for the length reduction problem is presented. This algorithm provides a new tool for performing fast convolution on sparse data. While the regular fast convolution of vectors $V_1$, $V_2$, whose sizes are $N_1$, $N_2$ respectively, takes $O(N_1 \log N_2)$ using FFT, the proposed algorithm performs the convolution in $O(n_1 \log^3 n_1)$, where $n_1$ is the number of non-zero values in $V_1$. The algorithm assumes that $V_1$ is given in advance and that $V_2$ is given at run time. This running time is achieved using a preprocessing phase on $V_1$, which takes $O(n_1^2)$ if $N_1$ is polynomial in $n_1$, and $O(n_1^4)$ if $N_1$ is exponential in $n_1$ (which is rarely the case in practical applications). This tool is used to obtain faster results for several well-known problems, such as d-Dimensional Point Set Matching and Searching in Music Archives.
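To make the sparse setting concrete, the following sketch convolves two vectors stored as index-to-value maps, touching only their non-zero entries. It is a naive quadratic baseline in the number of non-zeros, shown only to illustrate why the cost can be made independent of the vector lengths; it is not the length reduction algorithm described above, and the name `sparse_convolve` and the dictionary representation are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict


def sparse_convolve(v1: Dict[int, float], v2: Dict[int, float]) -> Dict[int, float]:
    """Convolve two sparse vectors given as {index: non-zero value} maps.

    Naive baseline: O(n1 * n2) multiplications, where n1 and n2 are the
    numbers of non-zero entries, regardless of the nominal lengths N1, N2.
    """
    out = defaultdict(float)
    for i, x in v1.items():
        for j, y in v2.items():
            # The product x * y contributes to output position i + j.
            out[i + j] += x * y
    # Drop entries that cancelled to zero so the result stays sparse.
    return {k: v for k, v in out.items() if v != 0}
```

With `v1 = {0: 1, 1000: 2}` and `v2 = {5: 3}`, only two multiplications are performed even though a dense representation of `v1` would have more than a thousand entries.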
Automatic word segmentation is a basic requirement for unsupervised learning in morphological analysis. In this paper, we formulate a novel recursive method for minimum description length (MDL) word segmentation, whose basic operation is resegmenting the corpus on a prefix (equivalently, a suffix). We derive a local expression for the change in description length under resegmentation, i.e., one which depends only on properties of the specific prefix (not on the rest of the corpus). Such a formulation permits use of a new and efficient algorithm for greedy morphological segmentation of the corpus in a recursive manner. In particular, our method does not restrict words to be segmented only once, into a stem+affix form, as do many extant techniques. Early results for English and Turkish corpora are promising.
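As a rough illustration of the quantities involved, the sketch below computes a generic two-part description length (lexicon cost plus corpus cost) and applies one resegmentation step on a prefix. The cost model, the `bits_per_char` parameter, and both function names are assumptions chosen for this example; the paper's own local expression for the change in description length is not reproduced here.

```python
import math
from collections import Counter


def description_length(segmented_corpus, bits_per_char=5.0):
    """Two-part description length, in bits, of a segmented corpus.

    segmented_corpus is a list of words, each a list of morph strings.
    Lexicon cost: each distinct morph is spelled out once, plus one
    end-of-morph marker. Corpus cost: each morph token is coded with
    -log2 of its relative frequency.
    """
    counts = Counter(m for word in segmented_corpus for m in word)
    total = sum(counts.values())
    lexicon_bits = sum((len(m) + 1) * bits_per_char for m in counts)
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


def resegment_on_prefix(segmented_corpus, prefix):
    """Split `prefix` off the first morph of every word that starts with it."""
    out = []
    for word in segmented_corpus:
        first = word[0]
        if first.startswith(prefix) and len(first) > len(prefix):
            out.append([prefix, first[len(prefix):]] + list(word[1:]))
        else:
            out.append(list(word))
    return out
```

A greedy procedure in this spirit would accept a resegmentation only when `description_length` decreases, and could then apply the same step again to the remaining morphs, so a word is not limited to a single stem+affix split.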