This paper proposes using distributed representations of words (word embeddings) for cross-language textual similarity detection. The main contributions are the following: (a) we introduce new cross-language similarity detection methods based on distributed representations of words; (b) we combine the proposed methods to verify their complementarity, obtaining an overall F1 score of 89.15% for English-French similarity detection at chunk level (88.5% at sentence level) on a very challenging corpus.
This paper is an in-depth investigation of cross-language plagiarism detection methods on a recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs at 2 granularities of text units in order to draw robust conclusions about the best methods, while analyzing in depth the correlations across document styles and languages.
We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity with a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised ways. Our best run ranked 1st on track 4a with a correlation of 83.02% with human annotations.
Since 2005, Compilatio has been offering tools to help detect and prevent plagiarism. Users of similarity detection software were initially attracted by the ability to track down cheaters. They are now more aware of the tools and services offered to create an environment that encourages the adoption of integrity and citizenship values, especially digital ones. They are aware that plagiarism is not a passing evil to be eradicated, but a deep-seated temptation that each individual must learn to overcome. The technology used to help teachers spot cheating has also evolved. The approach was initially syntactic, comparing texts formally to detect similarities. It then became semantic, using so-called artificial intelligence techniques to find similarities between different words with the same meaning. The issues related to plagiarism prevention illustrate how technology and pedagogy can be used together to train individuals for their future professional and civic life.