Neural network models are often constrained by a limited number of labeled instances and resort to advanced architectures and features to reach cutting-edge performance. We propose to build a recurrent neural network with multiple semantically heterogeneous embeddings within a self-training framework. Our framework makes use of labeled, unlabeled, and social media data, operates on basic features, and is scalable and generalizable. With this method, we establish state-of-the-art results in both in-domain and cross-domain settings for a clinical temporal relation extraction task.
We present a pairwise context-sensitive autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms state-of-the-art models on two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach, which merely ranks inputs by the cosine similarity between their hidden representations, performs comparably to state-of-the-art supervised models and in some cases outperforms them.
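The retrieval step described above, ranking inputs by the cosine similarity between hidden representations, can be illustrated with a minimal sketch. The vectors here are toy stand-ins for the model's hidden representations, and the function names `cosine` and `rank_by_cosine` are hypothetical; the encoder itself is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, candidate_vecs):
    """Return (candidate_index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(candidate_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for hidden representations (assumed, not from the paper).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_by_cosine(query, candidates)
```

In this toy case the candidate identical to the query ranks first and the orthogonal one ranks last; no training signal is involved in the ranking itself.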
Current opinion lexicons contain most of the common opinion words, but they miss slang and so-called urban opinion words and phrases (e.g., delish, cozy, yummy, nerdy, and yuck). These subjectivity clues are frequently used in community questions and are useful for opinion question analysis. This paper introduces a principled approach to constructing an opinion lexicon for community-based question answering (cQA) services. We formulate opinion lexicon induction as a semi-supervised learning task in the graph context. Our method makes use of existing opinion words to extract new opinion entities (slang and urban words/phrases) from community questions. It then models the opinion entities in a graph context to learn the polarity of the new opinion entities based on graph connectivity information. In contrast to previous approaches, our method learns such polarities not only from labeled data but also from unlabeled data, and it is more feasible in the web context, where the dictionary-based relations (such as synonym, antonym, or hyponym) between most words are not available for constructing a high-quality graph. The experiments show that our approach is effective in terms of both the quality of the discovered opinion entities and its ability to infer their polarity. Furthermore, since the value of opinion lexicons lies in their usefulness in applications, we show the utility of the constructed lexicon in a sentiment classification task.
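The idea of learning polarity from graph connectivity can be sketched with a generic label-propagation loop. This is an illustrative stand-in, not the paper's algorithm: the graph, the edge weights, the seed words "good" and "bad", and the function name `propagate_polarity` are all assumptions made for the example.

```python
def propagate_polarity(edges, seeds, iterations=50):
    """Spread seed polarities over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) with positive weights.
    seeds: dict mapping labeled nodes to +1.0 (positive) or -1.0 (negative).
    Seed scores stay clamped; every other node repeatedly takes the
    weighted average of its neighbors' current scores.
    """
    neighbors = {}
    for a, b, weight in edges:
        neighbors.setdefault(a, []).append((b, weight))
        neighbors.setdefault(b, []).append((a, weight))

    # Unlabeled nodes start neutral; seeds start at their known polarity.
    scores = {node: 0.0 for node in neighbors}
    scores.update(seeds)

    for _ in range(iterations):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in seeds:
                continue  # labeled nodes are never overwritten
            total = sum(weight for _, weight in nbrs)
            updated[node] = sum(scores[n] * weight for n, weight in nbrs) / total
        scores = updated
    return scores


# Hypothetical toy graph: seed words linked to slang terms mentioned in the abstract.
edges = [
    ("good", "delish", 1.0),
    ("delish", "yummy", 1.0),
    ("bad", "yuck", 1.0),
]
seeds = {"good": 1.0, "bad": -1.0}
scores = propagate_polarity(edges, seeds)
```

Here "yummy" has no direct link to a seed and receives its positive score only through "delish", which is the sense in which unlabeled nodes contribute to the result.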