To build an effective community question answering (cQA) service, determining ways to obtain questions similar to an input query question is a significant research issue. The major challenges for question retrieval in cQA are solving the lexical gap problem and estimating the relevance between questions. In this study, we first address the lexical gap problem using a translation-based language model (TRLM). Thereafter, we determine features and methods that are suited to estimating the relevance between two questions. For this purpose, we explore ways to use the results of a dependency parser and question classification for category information. Head-dependent pairs are first extracted as bigram features, called dependency bigrams, from the analysis results of the dependency parser. The probability of each category is estimated using the softmax approach based on the scores of the classification results. Subsequently, we propose two retrieval models, the dependency-based model (DM) and the category-based model (CM), and apply them to the previous model, TRLM. The experimental results demonstrate that the proposed methods significantly improve the performance of question retrieval in cQA services.
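The two feature steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a standard softmax over raw classifier scores and assumes the parse is already available as a list of tokens with 1-based head indices (0 for the root). The function names, the example sentence, its head indices, and the category scores are all hypothetical.

```python
import math

def dependency_bigrams(tokens, heads):
    """Return (head word, dependent word) pairs from a dependency parse.

    tokens: list of words.
    heads:  1-based index of each word's head, 0 for the root (assumed format).
    """
    pairs = []
    for word, head in zip(tokens, heads):
        if head == 0:  # the root has no head word, so it yields no pair
            continue
        pairs.append((tokens[head - 1], word))
    return pairs

def softmax(scores):
    """Turn raw classifier scores into a probability distribution."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(s - peak) for cat, s in scores.items()}
    total = sum(exps.values())
    return {cat: e / total for cat, e in exps.items()}

# Hypothetical parse of a short question: every word except the root
# "reset" contributes one head-dependent bigram.
tokens = ["how", "do", "I", "reset", "my", "password"]
heads = [4, 4, 4, 0, 6, 4]
bigrams = dependency_bigrams(tokens, heads)

# Hypothetical classifier scores for three categories.
category_probs = softmax({"Computers": 2.0, "Travel": 0.5, "Health": -1.0})
```

In a retrieval model, such bigrams and category probabilities would be combined with the TRLM score; how that combination is weighted is not specified in the abstract and is left out here.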
Classifying a user's question into topics helps respondents answer the question in a cQA service. A word weighting method must estimate an appropriate weight for each word to improve category (or topic) classification. In this paper, we propose a novel and effective word weighting method based on a language model for automatic category classification in cQA services. We first calculate the occurrence probability of a word in each category using a language model; the final weight of each word is then estimated as the ratio of its occurrence probability in a category to its occurrence probability in the other categories. As a result, the proposed method significantly improves the performance of category classification.
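The ratio-based weight can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the language model, so the sketch assumes an add-alpha smoothed unigram model and pools all other categories into a single model. The function name `word_weights`, the parameter `alpha`, and the toy documents are hypothetical.

```python
from collections import Counter

def word_weights(docs_by_category, alpha=1.0):
    """Weight each word by P(word | category) / P(word | other categories).

    Probabilities come from add-alpha smoothed unigram counts (an assumption).
    """
    counts = {
        cat: Counter(w for doc in docs for w in doc.split())
        for cat, docs in docs_by_category.items()
    }
    vocab = set().union(*counts.values())
    vocab_size = len(vocab)
    totals = {cat: sum(cnt.values()) for cat, cnt in counts.items()}
    grand_total = sum(totals.values())

    weights = {}
    for cat, cnt in counts.items():
        other_total = grand_total - totals[cat]
        weights[cat] = {}
        for word in vocab:
            # Count of the word in every category except the current one.
            other_count = sum(counts[o][word] for o in counts if o != cat)
            p_in = (cnt[word] + alpha) / (totals[cat] + alpha * vocab_size)
            p_out = (other_count + alpha) / (other_total + alpha * vocab_size)
            weights[cat][word] = p_in / p_out
    return weights

# Toy training data (hypothetical).
docs = {
    "health": ["flu symptoms fever", "fever medicine dose"],
    "travel": ["cheap flight ticket", "flight visa ticket"],
}
weights = word_weights(docs)
```

A weight above 1 marks a word as more typical of the category than of the rest; here "fever" is weighted above 1 for "health" and below 1 for "travel".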
This paper introduces a new question expansion method for question classification in cQA services. Input questions in cQA services are mostly short texts, and a test input consists of only a question, whereas the training data consist of question-answer pairs. Thus, the input questions cannot provide enough information for good classification in many cases. To solve this problem, we propose a question expansion method based on pseudo relevant feedback and automatic answer generation. For pseudo relevant feedback, we first find question-answer pairs relevant to an input question using the Indri search engine, and then the top relevant words are chosen as expansion words. The automatic answer generation creates pseudo answers by adding question-related words using translation probabilities from questions to answers estimated by Giza++. As a result, we obtain significantly improved performance when the two approaches are effectively combined.
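The pseudo relevant feedback step can be sketched in a few lines. The sketch does not call Indri: a toy term-overlap ranker over an in-memory list stands in for the search engine, and words are selected by raw frequency in the top-ranked pairs, which is an assumption rather than the paper's selection criterion. The names `expand_with_prf`, `STOPWORDS`, `top_docs`, and `top_words`, and the example data, are hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (hypothetical).
STOPWORDS = {"a", "and", "the", "to", "in", "how", "of"}

def tokenize(text):
    return text.lower().split()

def expand_with_prf(question, qa_pairs, top_docs=2, top_words=3):
    """Append the most frequent words from the top-ranked QA pairs."""
    q_terms = set(tokenize(question))

    # Stand-in retrieval: rank stored pairs by term overlap with the question.
    scored = []
    for stored_q, stored_a in qa_pairs:
        terms = tokenize(stored_q + " " + stored_a)
        overlap = len(q_terms & set(terms))
        if overlap > 0:
            scored.append((overlap, terms))
    scored.sort(key=lambda item: item[0], reverse=True)

    # Count candidate expansion words in the top-ranked pairs only.
    feedback = Counter()
    for _, terms in scored[:top_docs]:
        feedback.update(
            t for t in terms if t not in q_terms and t not in STOPWORDS
        )
    expansion = [word for word, _ in feedback.most_common(top_words)]
    return tokenize(question) + expansion

qa_pairs = [
    ("how to treat fever", "rest and drink water"),
    ("fever in children", "drink water and see a doctor"),
    ("cheap flight to paris", "book early"),
]
expanded = expand_with_prf("fever remedy", qa_pairs)
```

The expanded token list, rather than the original short question, would then be passed to the classifier.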
This paper proposes a new question expansion method for question classification in cQA services. An input question consists of only a question, whereas the training data consist of question-answer pairs. Thus, input questions cannot provide enough information for good classification in many cases. Since the answer is strongly associated with the input question, we create a pseudo answer to expand each input question. Translation probabilities between questions and answers and a pseudo relevant feedback technique are used to generate the pseudo answer. As a result, we obtain significantly improved performance when the two approaches are effectively combined.
key words: question classification, cQA service, pseudo relevant feedback (PRF), question expansion, translation probability
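Generating a pseudo answer from translation probabilities might look like the sketch below. It assumes a precomputed question-to-answer word translation table (in practice estimated by a word alignment tool from question-answer pairs) and scores each candidate answer word by summing its translation probability over the question words; this scoring rule is an assumption, not the paper's stated formula. The function name `pseudo_answer`, the parameter `top_n`, and all table values are invented for illustration.

```python
from collections import defaultdict

def pseudo_answer(question_terms, translation_table, top_n=3):
    """Pick the answer words most strongly translated from the question.

    translation_table[q][a] holds P(answer word a | question word q).
    """
    scores = defaultdict(float)
    for q in question_terms:
        # Question words missing from the table contribute nothing.
        for a, prob in translation_table.get(q, {}).items():
            scores[a] += prob
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical translation table with made-up probabilities.
table = {
    "fever": {"temperature": 0.4, "doctor": 0.3, "rest": 0.2},
    "child": {"doctor": 0.3, "pediatrician": 0.5},
}
answer = pseudo_answer(["fever", "child", "unknown"], table, top_n=2)
```

The selected words would be appended to the input question so that it resembles the question-answer pairs seen in training.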