Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance and document levels. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity that uses tree kernels to pair the most similar dependency parse trees between two documents and averages their similarities. FastKASSIM is more robust to syntactic dissimilarities and differences in length, and runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.

Example utterance pairs (figure):
Utterance 1: When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless.
Utterance 2: How can you be skiing if you are already swimming?
FastKASSIM Score: 0.219; CASSIM Score: 0.838; LSM Score: 0.623

Utterance 1: I like swimming because it is cool.
Utterance 2: I love running because it is fun.
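The pairing-and-averaging idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the FastKASSIM implementation. It assumes that parse trees have already been produced by some dependency parser and are encoded as hypothetical `(label, children)` tuples. It uses a Collins-Duffy-style subset tree kernel with a hypothetical decay factor `lam`, normalizes that kernel into a similarity in [0, 1], and pairs each tree in the first document with its best match in the second. The helper names (`subset_tree_kernel`, `normalized_kernel`, `document_similarity`) are illustrative, and the paper's exact kernel and pairing scheme may differ.

```python
from functools import lru_cache
from itertools import product
from math import sqrt

# Hypothetical tree encoding: (label, children), where children is a tuple of trees.
# Labels could be POS tags or dependency relations; that choice is an assumption.


def production(node):
    """A node's label together with the ordered labels of its children."""
    label, children = node
    return (label, tuple(child[0] for child in children))


def nodes(tree):
    """Yield every node of a tree in pre-order."""
    yield tree
    for child in tree[1]:
        yield from nodes(child)


def subset_tree_kernel(t1, t2, lam=0.5):
    """Collins-Duffy-style subset tree kernel; lam down-weights larger fragments."""

    @lru_cache(maxsize=None)
    def common(n1, n2):
        # Nodes contribute only when their productions match exactly.
        if production(n1) != production(n2):
            return 0.0
        if not n1[1]:  # matching leaves
            return lam
        score = lam
        for c1, c2 in zip(n1[1], n2[1]):
            score *= 1.0 + common(c1, c2)
        return score

    # Sum the shared-fragment scores over every pair of nodes.
    return sum(common(a, b) for a, b in product(nodes(t1), nodes(t2)))


def normalized_kernel(t1, t2, lam=0.5):
    """Cosine-normalize the kernel so identical trees score 1.0."""
    k12 = subset_tree_kernel(t1, t2, lam)
    denom = sqrt(subset_tree_kernel(t1, t1, lam) * subset_tree_kernel(t2, t2, lam))
    return k12 / denom if denom else 0.0


def document_similarity(doc_a, doc_b, lam=0.5):
    """Pair each tree in doc_a with its most similar tree in doc_b, then average."""
    if not doc_a or not doc_b:
        return 0.0
    best = [max(normalized_kernel(ta, tb, lam) for tb in doc_b) for ta in doc_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    t1 = ("VERB", (("NOUN", ()), ("NOUN", ())))
    t2 = ("VERB", (("NOUN", ()), ("ADV", ())))
    print(document_similarity([t1], [t2]))  # partial overlap: between 0 and 1
```

This variant is one-directional: it averages over the trees of `doc_a` only. A symmetric score could average both directions. Because each tree is matched to its single best partner, documents of very different lengths are not penalized for unmatched extra sentences, which is one plausible reading of the abstract's robustness claim.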