Luís Marujo scite author profile

We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly, the arbitrary nature of the form-function relationship in language, our "composed" word representations yield state-of-the-art results in language modeling and part-of-speech tagging. Benefits over traditional baselines are particularly pronounced in morphologically rich languages (e.g., Turkish).

show abstract

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Wang¹,

Luís²,

Marujo³

et al. 2015

Preprint

View full text Add to dashboard Cite

Improving Multi-label Emotion Classification via Sentiment Classification with Dual Attention Transfer Network

Marujo

Jiang

et al. 2018

View full text Add to dashboard Cite

In this paper, we target at improving the performance of multi-label emotion classification with the help of sentiment classification. Specifically, we propose a new transfer learning architecture to divide the sentence representation into two different feature spaces, which are expected to respectively capture the general sentiment words and the other important emotion-specific words via a dual attention mechanism. Extensive experimental results demonstrate that our transfer learning approach can outperform several strong baselines and achieve the state-of-the-art performance on two benchmark datasets.

show abstract

Automatic Keyword Extraction on Twitter

Marujo

Wang

Trancoso

et al. 2015

View full text Add to dashboard Cite

In this paper, we build a corpus of tweets from Twitter annotated with keywords using crowdsourcing methods. We identify key differences between this domain and the work performed on other domains, such as news, which makes existing approaches for automatic keyword extraction not generalize well on Twitter datasets. These datasets include the small amount of content in each tweet, the frequent usage of lexical variants and the high variance of the cardinality of keywords present in each tweet. We propose methods for addressing these issues, which leads to solid improvements on this dataset for this task.

show abstract

Crowdsourcing High-Quality Parallel Data Extraction from Twitter

Marujo¹,

Dyer²,

Black³

et al. 2014

View full text Add to dashboard Cite

High-quality parallel data is crucial for a range of multilingual applications, from tuning and evaluating machine translation systems to cross-lingual annotation projection. Unfortunately, automatically obtained parallel data (which is available in relative abundance) tends to be quite noisy. To obtain high-quality parallel data, we introduce a crowdsourcing paradigm in which workers with only basic bilingual proficiency identify translations from an automatically extracted corpus of parallel microblog messages. For less than $350, we obtained over 5000 parallel segments in five language pairs. Evaluated against expert annotations, the quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains an performance comparable to expert annotations when used in MERT tuning of a microblog MT system; and training a parallel sentence classifier with it leads also to improved results. The crowdsourced corpora will be made available in

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Luís Marujo

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

Improving Multi-label Emotion Classification via Sentiment Classification with Dual Attention Transfer Network

Automatic Keyword Extraction on Twitter

Crowdsourcing High-Quality Parallel Data Extraction from Twitter

Contact Info

Product

Resources

About