William I. Grosky scite author profile

Multi-sense embeddings through a word sense disambiguation process

Natural Language Understanding has seen an increasing number of publications in recent years, especially after robust word embedding models became popular. These models gained a special place in the spotlight when they proved able to capture and represent semantic relations underlying huge amounts of data. Nevertheless, traditional models often fall short on intrinsic linguistic issues such as polysemy and homonymy. Multi-sense word embeddings were devised to alleviate these and other problems by representing each word sense separately, but studies in this area are still in their infancy and much remains to be explored. We address this gap by proposing an unsupervised technique that disambiguates and annotates words by their specific sense, taking the influence of their context into account. The annotated words are then used to train a word embeddings model that produces more accurate vector representations. We test our approach on 6 different benchmarks for the word similarity task, showing that it sustains good results and often outperforms current state-of-the-art systems.

Detecting Machine-Obfuscated Plagiarism

Foltýnek, Ruas, Scharpf, et al. 2020

Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best-performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best-performing classification approach to indicate whether a text underwent machine paraphrasing. The data and code of our study are openly available.

Efficient Continuous Skyline Computation

Morse, Patel, Grosky 2006

SenseWeb: An Infrastructure for Shared Sensing

et al. 2007

Bridging the Semantic Gap in Image Retrieval

Zhao, Grosky
