2011
DOI: 10.1007/978-3-642-23538-2_31

Question Classification by Weighted Combination of Lexical, Syntactic and Semantic Features

Abstract: We developed a learning-based question classifier for question answering systems. A question classifier tries to predict the entity type of the possible answers to a given question written in natural language. We extracted several lexical, syntactic and semantic features and examined their usefulness for question classification. Furthermore, we developed a weighting approach to combine features based on their importance. Our result on the well-known TREC questions dataset is competitive with the state…
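As a rough illustration of the weighted-combination idea mentioned in the abstract, the sketch below scales separate lexical, syntactic and semantic feature groups before a linear classifier. The vectorizers, group weights and scikit-learn pipeline are illustrative assumptions and one plausible reading of "weighted combination", not the authors' implementation.

```python
# Minimal sketch (not the authors' method): per-group weighting of lexical,
# syntactic and semantic feature groups before a linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Each group gets its own vectorizer; the analyzers below are crude stand-ins
# for real lexical/syntactic/semantic extractors.
lexical = CountVectorizer(analyzer="word", ngram_range=(1, 1))
syntactic = CountVectorizer(analyzer="word", ngram_range=(2, 3))
semantic = CountVectorizer(analyzer="char_wb", ngram_range=(3, 5))

# transformer_weights scales each group's contribution in the joint space;
# the values here are assumed, not taken from the paper.
features = FeatureUnion(
    transformer_list=[("lex", lexical), ("syn", syntactic), ("sem", semantic)],
    transformer_weights={"lex": 1.0, "syn": 0.5, "sem": 0.5},
)

clf = Pipeline([("features", features), ("svm", LinearSVC())])

questions = ["Who wrote Hamlet ?", "How far is the Moon ?"]
labels = ["HUMAN", "NUMERIC"]  # coarse classes in the style of Li & Roth
clf.fit(questions, labels)
print(clf.predict(["Who discovered penicillin ?"]))
```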

Cited by 21 publications (29 citation statements)
References 15 publications
“…A benchmark dataset built by Li and Roth [3] is widely used in the literature, and NLP techniques able to analyze the question are more or less well established [4]. On the other hand, different authors chose to extract a broad variety of features from questions [5], usually divided into lexical, syntactic and semantic [6,7]. Some works focused on the extraction of particular words, like the wh-word and the head-word [8], but good results were obtained only as the number of employed features was increased to a very high number [6][7][8][9][10].…”
Section: Introduction (mentioning)
confidence: 99%
“…This feature aims at identifying the most informative word of the question for classification purposes. Introduced in [8], it is widely used and recognized to be useful [3,4,6,8,20]. It is extracted here from the parse tree using redefined rules, similar to those proposed by Collins [21] and already modified in other works [6,8].…”
Section: Features Extraction and Representation (mentioning)
confidence: 98%
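A minimal sketch of head-word extraction from a constituency parse, loosely in the spirit of Collins-style head rules, is given below. The rule table is a tiny illustrative subset invented for this example, not the redefined rules used in the cited works.

```python
# Simplified head-word extraction: descend the parse tree following a small
# table of preferred child labels (toy subset of Collins-style head rules).
from nltk import Tree

HEAD_RULES = {
    "S":     ["VP", "S", "NP"],
    "SBARQ": ["SQ", "S", "VP"],
    "SQ":    ["VP", "NP", "SQ"],
    "VP":    ["VB", "VBD", "VBZ", "VBP", "VP", "NP"],
    "NP":    ["NN", "NNS", "NNP", "NNPS", "NP"],
    "PP":    ["NP", "PP"],
    "WHNP":  ["NN", "NNS", "NNP", "WHNP", "NP"],
}

def head_word(tree: Tree) -> str:
    """Follow the head rules down the tree and return the head token."""
    if tree.height() == 2:                 # preterminal node: (POS word)
        return tree[0]
    for label in HEAD_RULES.get(tree.label(), []):
        for child in tree:
            if child.label() == label:
                return head_word(child)
    return head_word(tree[-1])             # fallback: rightmost child

parse = Tree.fromstring(
    "(SBARQ (WHNP (WP What)) (SQ (VBZ is)"
    " (NP (NP (DT the) (NN capital)) (PP (IN of) (NP (NNP France)))))"
    " (. ?))"
)
print(head_word(parse))  # -> 'capital' with this toy rule table
```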
“…This is the most widely used set of features, since it usually yields the best results [3,4,6,8]. Unigrams are obtained from the set of tagged tokens of the question by eliminating tokens with certain tags, such as DT, IN, and punctuation.…”
Section: Features Extraction and Representation (mentioning)
confidence: 99%
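The filtering step described in this statement can be sketched as follows; the dropped-tag list and the hard-coded tagged question are assumptions for illustration, and in practice the tagged tokens would come from a POS tagger.

```python
# Unigram features from POS-tagged tokens, dropping determiners (DT),
# prepositions (IN) and punctuation, as described in the citing paper.
import string

DROP_TAGS = {"DT", "IN"}  # tags named in the quote; extendable

def unigram_features(tagged_tokens):
    """Return lowercased unigrams, excluding dropped tags and punctuation."""
    return [
        token.lower()
        for token, tag in tagged_tokens
        if tag not in DROP_TAGS and token not in string.punctuation
    ]

tagged = [("What", "WP"), ("is", "VBZ"), ("the", "DT"),
          ("capital", "NN"), ("of", "IN"), ("France", "NNP"), ("?", ".")]
print(unigram_features(tagged))  # ['what', 'is', 'capital', 'france']
```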
“…A request is classified into the six actionable types mentioned earlier, for which we developed a mul- (Loni et al., 2011) due to the general question-based construct of the requests. Apache OpenNLP (Apache Software Foundation, 2011) was used to generate unigrams, bigrams, trigrams, chunks, and tagged unigrams, while the Stanford Parser's implementation of the Collins rules (Collins, 2003) was used to obtain the headword.…”
Section: Request Type Classification (mentioning)
confidence: 99%
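For illustration, here is a library-agnostic sketch of the n-gram and tagged-unigram features named in the statement above. The citing work used Apache OpenNLP to produce them; this sketch simply assumes the tokens and POS tags are already available from whatever tagger is used.

```python
# N-gram and tagged-unigram feature strings from pre-tokenized,
# pre-tagged question text.
def ngrams(tokens, n):
    """Contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tagged_unigrams(tagged_tokens):
    """Unigrams paired with their POS tag, e.g. 'wrote/VBD'."""
    return [f"{tok}/{tag}" for tok, tag in tagged_tokens]

tokens = ["Who", "wrote", "Hamlet", "?"]
tagged = [("Who", "WP"), ("wrote", "VBD"), ("Hamlet", "NNP"), ("?", ".")]

print(ngrams(tokens, 2))        # ['Who wrote', 'wrote Hamlet', 'Hamlet ?']
print(ngrams(tokens, 3))        # ['Who wrote Hamlet', 'wrote Hamlet ?']
print(tagged_unigrams(tagged))  # ['Who/WP', 'wrote/VBD', 'Hamlet/NNP', '?/.']
```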