Extracting Compact Sets of Features for Question Classification in Cognitive Systems: A Comparative Study

Pota, Marco; Fuggi, Angela; Esposito, Massimo; Pietro, Giuseppe De

doi:10.1109/3pgcic.2015.118

Cited by 12 publications (12 citation statements)

References 17 publications


“…In particular, while Collins' rules prefer verbs, here nouns are favoured, and rules are defined in different manners, depending on whether the Principal-Wh-Word exists. For more information, see [19].…”

Section: Features Extraction and Representation (mentioning; confidence: 98%)

“…In particular, while Collins' rules prefer auxiliary verbs, here non-auxiliary ones are chosen if present. For more information, see [19].…”

Section: Features Extraction and Representation (mentioning; confidence: 98%)
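The two statements above describe departures from Collins' head rules: nouns are favoured over verbs, and non-auxiliary verbs are favoured over auxiliaries. The Python sketch below illustrates only these two preferences over a flat list of (word, POS-tag) pairs. The tag sets, the auxiliary list, the left-to-right scan order and the name `pick_head` are assumptions for illustration. The actual rule tables, including how they differ when a principal wh-word exists, are those referenced as [19].

```python
from typing import List, Optional, Tuple

# Penn Treebank tags; assumed tag sets for this sketch.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
# Hypothetical, non-exhaustive list of auxiliary verb forms.
AUXILIARIES = {"be", "am", "is", "are", "was", "were", "been", "being",
               "do", "does", "did", "have", "has", "had"}


def pick_head(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first noun, else the first non-auxiliary verb, else the first auxiliary."""
    nouns = [w for w, t in tagged if t in NOUN_TAGS]
    verbs = [w for w, t in tagged if t in VERB_TAGS]
    main_verbs = [w for w in verbs if w.lower() not in AUXILIARIES]
    aux_verbs = [w for w in verbs if w.lower() in AUXILIARIES]
    # Nouns are favoured over verbs; non-auxiliary verbs over auxiliaries.
    for candidates in (nouns, main_verbs, aux_verbs):
        if candidates:
            return candidates[0]
    return None


print(pick_head([("What", "WP"), ("is", "VBZ"), ("the", "DT"),
                 ("capital", "NN"), ("of", "IN"), ("France", "NNP")]))  # capital
print(pick_head([("Where", "WRB"), ("did", "VBD"), ("they", "PRP"), ("go", "VB")]))  # go
```

In the second question there is no noun, so the non-auxiliary verb "go" is chosen over the auxiliary "did".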

“…The principal wh-word is obtained here by considering the parse tree and applying a set of rules specifically tailored for searching, if any, the wh-word of the principal phrase of the question. For more information, see [19]. Moreover, here, wh-words with similar meanings are grouped, so that different wh-words of each group correspond to the mostly used one.…”

Section: Features Extraction and Representation (mentioning; confidence: 99%)
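The statement above describes finding the principal wh-word and mapping wh-words with similar meanings onto the most used member of their group. The minimal sketch below makes two assumptions. First, it replaces the parse-tree search with "take the first wh-word token". Second, the `WH_GROUPS` mapping is hypothetical, not the grouping used in the cited work.

```python
from typing import Iterable, Optional

WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
# Hypothetical grouping: each variant maps to the (assumed) most used
# member of its group.
WH_GROUPS = {"whom": "who", "whose": "who", "which": "what"}


def principal_wh_word(tokens: Iterable[str]) -> Optional[str]:
    """Return the normalized first wh-word of the question, or None.

    Simplification: the cited approach searches the parse tree for the
    wh-word of the principal phrase; here the first wh-word token is used.
    """
    for tok in tokens:
        word = tok.lower()
        if word in WH_WORDS:
            return WH_GROUPS.get(word, word)
    return None


print(principal_wh_word("Whom did Hamlet love ?".split()))    # who
print(principal_wh_word("Name a river in Africa .".split()))  # None
```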


A Forward-Selection Algorithm for SVM-Based Question Classification in Cognitive Systems

Pota, Esposito, Pietro

2016

Smart Innovation, Systems and Technologies

Self Cite


Cognitive Systems have attracted attention in recent years, especially because of the high interactivity of Question Answering systems. In this context, Question Classification plays an important role in identifying the expected answer type. It involves Natural Language Processing of the question, the extraction of a broad variety of features, and the use of machine learning algorithms to map those features onto a given taxonomy of question classes. In this work, a novel learning approach based on Support Vector Machines is proposed for building a set of classifiers. Each classifier is used for different questions and comprises its own features, chosen through a dedicated forward-selection procedure. This approach aims to decrease the total number of features by discarding those that carry little information or add noise. A Question Classification framework is implemented, comprising new, smaller sets of features. Applied to a benchmark dataset, it achieves classification accuracy competitive with the state of the art while using fewer features.
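The abstract above relies on a forward-selection procedure to keep each classifier's feature set small. The generic greedy version below is only a sketch, not the authors' specific procedure. It takes a scoring callback, which in the described setting would presumably be something like the validation accuracy of an SVM trained on the candidate subset. The `toy_score` function and its feature names are hypothetical stand-ins so that the example runs with the standard library alone.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward feature selection.

    Starting from the empty set, repeatedly add the single remaining feature
    that gives the best score; stop when no feature improves the current
    score by more than min_gain.
    """
    selected: List[str] = []
    best = evaluate(frozenset())
    remaining = list(features)
    while remaining:
        # Score every one-feature extension of the current subset.
        score, feat = max((evaluate(frozenset(selected + [f])), f) for f in remaining)
        if score - best <= min_gain:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Hypothetical scorer standing in for classifier accuracy:
# informative features add points, unknown ("noisy") ones cost 5 points each.
UTILITY = {"wh_word": 40, "head_word": 30, "unigram": 10}


def toy_score(subset: FrozenSet[str]) -> float:
    return 20 + sum(UTILITY.get(f, -5) for f in subset)


print(forward_select(["unigram", "noise", "head_word", "wh_word"], toy_score))
# (['wh_word', 'head_word', 'unigram'], 100)
```

The noisy feature is never added because it lowers the score. This mirrors the stated aim of avoiding features that give scarce information or noise.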


“…This includes removing all irrelevant words, such as pronouns and prepositions, as well as punctuation, in addition to unifying similar words. Among the main possible preprocessing operations, the following stand out (POTA et al., 2015; ALAHMADI, 2016) … 4. Stemming: replacing each token with its root word, for example, "escritor" (writer), "escrita" (writing) and "escreveram" (they wrote) with "escrever" (to write).…”

Section: Preprocessing (unclassified)
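The preprocessing steps quoted above (removing irrelevant words and punctuation, then reducing tokens to a root form) can be sketched as follows. The stopword list and the `ROOTS` table are tiny hypothetical examples built from the quoted words. Because a lookup table maps each form to a dictionary word, it behaves more like lemmatization. A real stemmer for Portuguese, such as NLTK's RSLP stemmer, would instead strip suffixes and may produce stems that are not dictionary words.

```python
import string
from typing import List

# Tiny illustrative lists (hypothetical); a real pipeline would use a full
# Portuguese stopword list and a stemmer.
STOPWORDS = {"o", "a", "os", "as", "de", "do", "da", "em", "e", "que",
             "ele", "ela", "eles", "elas"}
ROOTS = {"escritor": "escrever", "escrita": "escrever", "escreveram": "escrever"}


def preprocess(text: str) -> List[str]:
    # Lowercase and delete ASCII punctuation characters.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and drop stopwords (articles, pronouns, prepositions...).
    tokens = [t for t in cleaned.split() if t not in STOPWORDS]
    # Replace each token with its root word when the table knows it.
    return [ROOTS.get(t, t) for t in tokens]


print(preprocess("Eles escreveram a carta?"))  # ['escrever', 'carta']
```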

“…As a result of the preprocessing phase, several types of terms can be extracted, and these directly influence the performance of the classification process, depending on the strategy adopted. They are usually divided into lexical, syntactic and semantic (POTA et al., 2015; JAYALAKSHMI; SHESHASAAYEE, 2015…”

Section: Term Extraction (unclassified)
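To make the lexical/syntactic/semantic split quoted above concrete, the sketch below turns a POS-tagged question into a sparse set of string features. It uses unigrams and bigrams as lexical features and POS tags as syntactic features. Semantic features are coarse word classes looked up in a lexicon, in the spirit of resources such as WordNet hypernyms or named-entity types. The feature naming scheme and the one-entry `lexicon` are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple


def extract_features(tagged: List[Tuple[str, str]],
                     semantic_classes: Dict[str, str]) -> Set[str]:
    """Build a sparse binary feature set from (word, POS) pairs."""
    words = [w.lower() for w, _ in tagged]
    feats: Set[str] = set()
    # Lexical: unigrams and bigrams of lowercased words.
    feats.update(f"uni={w}" for w in words)
    feats.update(f"bi={a}_{b}" for a, b in zip(words, words[1:]))
    # Syntactic: part-of-speech tags.
    feats.update(f"pos={t}" for _, t in tagged)
    # Semantic: coarse word classes looked up in a lexicon.
    feats.update(f"sem={semantic_classes[w]}" for w in words if w in semantic_classes)
    return feats


lexicon = {"hamlet": "work_of_art"}  # hypothetical semantic lexicon
print(sorted(extract_features([("Who", "WP"), ("wrote", "VBD"), ("Hamlet", "NNP")], lexicon)))
```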

Classificação automática de questões baseada em competências: ENEM - Estudo de caso (Competency-based automatic question classification: ENEM, a case study)

Silva


Introduction: The large amount of digital textual information available on the Internet makes organizing, analyzing and extracting knowledge from it essential in both academia and the job market, so automatic text classification is increasingly important. Question classification is a subfield of text classification. It consists of associating one or more labels with each question according to a predetermined criterion, but with less text available than in general documents. The main applications of automatic question classification systems are QA (Question Answering), IR (Information Retrieval), educational environments, and the processing of specific languages. QA and IR systems start from a question written in natural language and then search a collection of documents on the web for those compatible with the subject described. In the educational environment specifically, the automatic generation of assessment tests has an immediate practical application in e-learning systems. It enables personalized teaching by searching for questions appropriate to a particular learning profile, as in so-called adaptive learning systems. To enable personalization, questions must be classified within a representative range of appropriate competencies and skills. Large-scale evaluations (ENEM, SAEB, Prova Brasil) could be a source of information for this generation, since they use evaluation reference matrices to classify questions by area of knowledge, discipline, competency and expected student skill. One way to perform this classification is with Machine Learning algorithms, which extract patterns or generalize classes by building mathematical models from the available data; examples include neural networks, decision trees, support vector machines (SVM) and naive Bayes.
Text representations and Machine Learning algorithms have been studied extensively for classifying documents with large amounts of text. For short excerpts such as questions, the task is more complex because much less text is available for analysis than in other types of textual documents. In addition, most current research addresses QA or IR, and little research considers the educational environment. Objectives: (i) identify the architecture of a classifier or set of classifiers that maximizes the performance of question classification in the educational context; (ii) perform an empirical evaluation comparing the performance of the different combinations used; (iii) make the representations, algorithms, source code and tools developed available to the scientific community to evaluate and replicate the results; and (iv) make available tools for integrating and applying the developed content on other platforms and by other institutions (schools, companies) interes...


Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Chotirat, Meesad

2021

Heliyon


Question classification is a crucial task for answer selection. It can help define the structure of question sentences through features extracted from a sentence, such as who, when, where, and how. In this paper, we propose a methodology to improve question classification from texts by using feature selection and word embedding techniques. We conducted several experiments to evaluate the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset). As features, we used term frequency and combined term frequency-inverse document frequency (TF-IDF) over Unigram, Unigram+Bigram, and Unigram+Trigram. Both traditional and deep learning classifiers were used. The traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine. The deep learning techniques were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging gave the largest improvement in question classification accuracy. On the TREC-6 dataset, question categories were classified with an average micro F1-score of 0.98 when the SVM model was applied with all POS tags added. On the Thai sentence dataset, the highest average micro F1-score, 0.8, was achieved with GloVe embeddings and the CNN model when focusing tags were added.
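The abstract above combines TF-IDF over unigram-plus-n-gram features with micro-averaged F1 as the evaluation metric. The standard-library sketch below illustrates both. It uses the plain weighting tf × ln(N/df). This is one common TF-IDF variant among several (libraries often smooth the idf term), and it is not necessarily the one used in the paper. The example documents and TREC-style labels are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def ngrams(tokens: Sequence[str], n_max: int) -> List[str]:
    """All n-grams with 1 <= n <= n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs: Sequence[str], n_max: int = 2) -> List[Dict[str, float]]:
    """One {term: weight} dict per document, weight = raw tf * ln(N / df)."""
    counts = [Counter(ngrams(d.lower().split(), n_max)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n_docs = len(docs)
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()} for c in counts]


def micro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Micro-averaged F1 for single-label multiclass predictions."""
    tp = sum(t == p for t, p in zip(y_true, y_pred))
    errors = len(y_true) - tp
    # Each error is one false positive (predicted class) and one false
    # negative (true class), so micro precision equals micro recall here.
    precision = recall = tp / (tp + errors) if y_true else 0.0
    if precision == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


weights = tfidf(["who wrote hamlet", "who is the president"])
print(weights[0]["who"], round(weights[0]["wrote hamlet"], 3))  # 0.0 0.693
print(micro_f1(["HUM", "LOC", "NUM", "LOC"], ["HUM", "LOC", "LOC", "LOC"]))  # 0.75
```

A term such as "who" that appears in every document gets zero weight. For single-label classification, micro-F1 coincides with accuracy.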
