Lexicon relation extraction from distributional representations of words is an important topic in NLP. We observe that state-of-the-art projection-based methods cannot generalize to unseen hypernyms. We analyze this limitation from the perspective of pollution, i.e., the predicted hypernyms are limited to those that appear in the training set. We propose a word relation autoencoder (WRAE) model to address this challenge and construct a corresponding indicator to measure the pollution. Experiments on several hypernym-like lexicon datasets show that our model significantly outperforms the competitors.
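One plausible reading of the pollution indicator described above is the fraction of predicted hypernyms that already occur as hypernyms in the training data; a value of 1.0 would mean the model never predicts an unseen hypernym. The abstract does not define the indicator precisely, so the sketch below is an assumption, not the authors' formula:

```python
def pollution_rate(predicted, train_hypernyms):
    """Hypothetical pollution indicator: the fraction of predicted
    hypernyms that also appear in the training hypernym vocabulary.
    1.0 means every prediction is 'polluted' (seen in training)."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(train_hypernyms)) / len(predicted)

# All three predictions were already training hypernyms -> 1.0
print(pollution_rate(["animal", "plant", "entity"],
                     {"animal", "plant", "entity", "object"}))  # 1.0
```

Under this reading, a projection-based model that only "retrieves" training hypernyms would score near 1.0, while a model that generalizes to unseen hypernyms would score lower.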
We focus on a recently deployed system that summarizes academic articles by concept tagging. The system has shown broad coverage and high accuracy of concept identification, which can be attributed to the knowledge acquired from millions of publications. Given the interpretable concepts and the knowledge encoded in a pre-trained neural model, we investigate whether the tagged concepts can be applied to a broader class of applications. We propose transforming the tagged concepts into sparse vectors that serve as representations of academic documents. We analyze the effectiveness of these representations theoretically within a proposed framework, and we show empirically that they offer advantages in academic topic discovery and paper recommendation. In these applications, we find that the knowledge encoded in the tagging system can be effectively utilized and can help infer additional features from data with limited information.
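A minimal sketch of the idea of sparse concept vectors, assuming the simplest possible construction (concept counts as a sparse dictionary, compared with cosine similarity for recommendation-style retrieval); the actual system's weighting scheme is not specified in the abstract:

```python
from collections import Counter
from math import sqrt

def concept_vector(tagged_concepts):
    # Sparse representation of a document: concept -> count.
    # (A real system might use TF-IDF or confidence weights instead.)
    return dict(Counter(tagged_concepts))

def cosine(u, v):
    # Cosine similarity between two sparse dict vectors.
    dot = sum(w * v.get(c, 0) for c, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

a = concept_vector(["neural network", "attention", "nlp"])
b = concept_vector(["nlp", "attention", "parsing"])
print(round(cosine(a, b), 3))  # 0.667
```

Because the dimensions are named concepts, such vectors stay interpretable: the overlapping coordinates directly explain why two papers are considered similar.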
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers around the world and is funded in part by the National Science Foundation and by the National Institute on Drug Abuse of the National Institutes of Health.