Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009
DOI: 10.3115/1687878.1687948
Distributional representations for handling sparsity in supervised sequence-labeling

Abstract: Supervised sequence-labeling systems in natural language processing often suffer from data sparsity because they use word types as features in their prediction tasks. Consequently, they have difficulty estimating parameters for types which appear in the test set, but seldom (or never) appear in the training set. We demonstrate that distributional representations of word types, trained on unannotated text, can be used to improve performance on rare words. We incorporate aspects of these representations into the…
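To ground the abstract's idea, the following is a minimal sketch in plain Python (the corpus, vocabulary, and context-window size are illustrative assumptions, not details from the paper) of building a distributional representation of each word type from unannotated text by counting the words that co-occur with it in a small window. Such vectors, or clusters and low-dimensional codes derived from them, can then be attached to every token as extra features in a supervised sequence labeler, so rare words inherit information from distributionally similar frequent words.

```python
from collections import Counter, defaultdict

# Tiny unannotated corpus standing in for a large body of raw text.
corpus = [
    "the quick brown fox jumps over the lazy dog".split(),
    "a lazy cat sleeps under the warm sun".split(),
]
WINDOW = 2  # assumed context-window size

# Distributional representation: each word type -> counts of nearby word types.
cooc = defaultdict(Counter)
for sent in corpus:
    for i, word in enumerate(sent):
        lo, hi = max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[word][sent[j]] += 1

# A token-level feature function could expose (a compressed form of) this
# vector, so rare or unseen words that share contexts with frequent words
# end up with similar features.
print(cooc["lazy"].most_common(3))
```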

Cited by 47 publications (54 citation statements)
References 22 publications
“…In addition, the thresholding of these combinatorial features by simple counts effectively suppresses the combinatorial increase of the parameters. At the same time, although global information has also been used in several reports (Nakagawa and Matsumoto, 2006; Huang and Yates, 2009; Turian et al., 2010; Schnabel and Schütze, 2014), the non-linear interactions of these features have not been well investigated, since these features are often dense continuous features and explicit non-linear expansions are counterintuitive and drastically increase the number of model parameters. In our work, we investigate neural networks to represent the non-linearity of global information for POS tagging in a compact way.…”
Section: Introduction (mentioning)
confidence: 99%
“…All of them are continuous dense features, and we use a feed-forward neural network to exploit the non-linearity of these features. Although all of them except (3) have been used for POS tagging in previous work (Nakamura et al., 1990; Schmid, 1994; Schnabel and Schütze, 2014; Huang and Yates, 2009), we propose a neural network approach to capture the non-linear interactions of these features. By feeding these features into neural networks as an input vector, we can expect our tagger to handle not only the non-linearity of N-grams of the same kind of features but also the non-linear interactions among different kinds of features.…”
Section: Introduction (mentioning)
confidence: 99%
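To make the idea in the excerpt above concrete, here is a minimal sketch in PyTorch (the layer sizes, feature dimensions, and tag count are illustrative assumptions, not taken from the cited work) of a feed-forward network that takes a concatenation of dense, continuous per-token features and lets a hidden layer model their non-linear interactions before scoring POS tags.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a token is described by dense, continuous features,
# e.g. a word embedding plus distributional/global-context features.
WORD_DIM, GLOBAL_DIM, HIDDEN_DIM, NUM_TAGS = 100, 50, 64, 45

class FeedForwardTagger(nn.Module):
    """Scores POS tags for one token from a concatenated dense feature vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(WORD_DIM + GLOBAL_DIM, HIDDEN_DIM),
            nn.Tanh(),                       # non-linearity lets features interact
            nn.Linear(HIDDEN_DIM, NUM_TAGS)  # tag scores (softmax applied in the loss)
        )

    def forward(self, word_feats, global_feats):
        # Concatenating the two dense feature blocks is the key step: the hidden
        # layer can then capture interactions *across* the different feature kinds.
        return self.net(torch.cat([word_feats, global_feats], dim=-1))

# Tiny usage example with random vectors standing in for real features.
tagger = FeedForwardTagger()
scores = tagger(torch.randn(8, WORD_DIM), torch.randn(8, GLOBAL_DIM))
print(scores.shape)  # torch.Size([8, 45])
```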
“…Some of them focused on how to use a small amount of labeled data from a target domain in conjunction with a large amount of labeled data from a source domain [8]–[12]. Other works on domain adaptation (DA) focused on adapting their models from the perspective of learning, based on the labeled data sets of the source and target domains [13], [14].…”
Section: Related Research (mentioning)
confidence: 99%
“…At tagging time, a sentence is tagged by the model that is most similar to that sentence. Huang and Yates (2009) train a Conditional Random Field (CRF) tagger with features retrieved from a smoothing model trained using both source and target domain unlabeled data. Adding latent states to the smoothing model further improves the POS tagging accuracy (Huang and Yates, 2012).…”
Section: Related Work (mentioning)
confidence: 99%
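As a rough illustration of the kind of pipeline the excerpt above describes, the sketch below uses the sklearn-crfsuite package to train a CRF tagger whose per-token features include a distributional cluster ID learned from unannotated text; the feature names, cluster lookup, and toy data are assumptions for illustration, not the authors' actual feature set or smoothing model.

```python
import sklearn_crfsuite

# Hypothetical distributional representations learned from unannotated text,
# e.g. a cluster ID per word type produced by a separate smoothing model.
dist_cluster = {"the": "C12", "dog": "C7", "barks": "C3"}

def token_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        # Distributional feature: shared by rare and frequent words alike,
        # which is what helps when a word type is unseen in training.
        "dist": dist_cluster.get(word.lower(), "C_UNK"),
    }

def sent_features(sent):
    return [token_features(sent, i) for i in range(len(sent))]

# Toy training data standing in for real annotated sentences.
X_train = [sent_features(["the", "dog", "barks"])]
y_train = [["DT", "NN", "VBZ"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict([sent_features(["the", "dog", "barks"])]))
```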
“…In such work, a word is represented by the distribution of other words that co-occur with it. Distributional representations of words have been successfully used in many language processing tasks such as entity set expansion (Pantel et al., 2009), part-of-speech (POS) tagging and chunking (Huang and Yates, 2009), ontology learning (Curran, 2005), computing semantic textual similarity (Besançon et al., 1999), and lexical inference (Kotlerman et al., 2012).…”
Section: Introduction (mentioning)
confidence: 99%