Semantic textual similarity (STS) is the task of assessing the degree of similarity between two texts in terms of meaning. Several approaches have been proposed in the literature to determine the semantic similarity between texts. The most promising work recently presented in the literature was supervised approaches. Unsupervised STS approaches are characterized by the fact that they do not require learning data, but they still suffer from some limitations. Word alignment has been widely used in the state-ofthe-art approaches. From this point, this paper has three contributions. First, a new synset-oriented word aligner is presented, which relies on a huge multilingual semantic network named BabelNet. Second, three unsupervised STS approaches are proposed: string kernel-based (SK), alignment-based (AL), and weighted alignment-based (WAL). Third, some limitations of the state-of-the-art approaches are tackled, and different similarity methods are demonstrated to be complementary with each other by proposing an unsupervised ensemble STS (UESTS) approach. The UESTS incorporates the merits of four similarity measures: proposed alignment-based, surface-based, corpus-based, and enhanced edit distance. The experimental results proved that the participation of the proposed aligner in STS is effective. Over all the evaluation data sets, the proposed UESTS outperforms the state-of-the-art unsupervised approaches, which is a promising result. INDEX TERMS Semantic textual similarity, word alignment, string kernel, BabelNet, SemEval, text processing, unsupervised learning, natural language processing.
This paper presents a new technique for clone detection using sequential pattern mining titled EgyCD. Over the last decade many techniques and tools for software clone detection have been proposed such as textual approaches, lexical approaches, syntactic approaches, semantic approaches …, etc. In this paper, we explore the potential of data mining techniques in clone detection. In particular, we developed a clone detection technique based on sequential pattern mining (SPM). The source code is treated as a sequence of transactions processed by the SPM algorithm to find frequent itemsets. We run three experiments to discover code clones of Type I, Type II and Type III and for plagiarism detection. We compared the results with other established code clone detectors. Our technique discovers all code clones in the source code and hence it is slower than the compared code clone detectors since they discover few code clones compared with EgyCD.
The composition of human milk during the first year of lactation was studied. The effect of age, parity and the length of inter-delivery interval on the composition is described.
This paper describes FCICU team systems that participated in SemEval-2017 Semantic Textual Similarity task (Task1) for monolingual and cross-lingual sentence pairs. A sense-based language independent textual similarity approach is presented, in which a proposed alignment similarity method coupled with new usage of a semantic network (BabelNet) is used. Additionally, a previously proposed integration between sense-based and surface-based semantic textual similarity approach is applied together with our proposed approach. For all the tracks in Task1, Run1 is a string kernel with alignments metric and Run2 is a sense-based alignment similarity method. The first run is ranked 10th, and the second is ranked 12th in the primary track, with correlation 0.619 and 0.617 respectively.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.