For complex non-factoid questions in Chinese question answering systems, such as 'why' and 'how' questions, we propose an answer extraction method that uses discourse structure features and a ranking algorithm. The method treats the judgment of answer relevance as a learning-to-rank problem. First, it analyzes the question to generate a query string. Second, it applies rhetorical structure theory together with lexical, syntactic, and semantic analysis to the retrieved documents, in order to determine the inherent relationships between paragraphs or sentences and to generate candidate answer paragraphs or sentences. Third, it constructs an answer ranking model, extracting five groups of features (similarity features, density and frequency features, translation features, discourse structure features, and external knowledge features) to train the model. Finally, it re-ranks the answers with the trained model and selects the best answers. Experiments show that the proposed method effectively improves the accuracy and quality of non-factoid answers.
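The re-ranking step described above can be illustrated with a minimal sketch. The abstract does not name the ranking algorithm, so this example assumes a simple pairwise linear ranker trained with perceptron-style updates as a stand-in. The function names and the two-dimensional feature vectors are hypothetical; in the paper each vector would hold the five feature groups.

```python
from typing import List, Sequence, Tuple

Features = Sequence[float]


def score(weights: Sequence[float], feats: Features) -> float:
    """Linear relevance score of one candidate answer."""
    return sum(w * f for w, f in zip(weights, feats))


def train_pairwise(pairs: List[Tuple[Features, Features]],
                   n_features: int,
                   epochs: int = 20,
                   lr: float = 0.1) -> List[float]:
    """Learn weights from (better, worse) candidate pairs.

    Assumption: a perceptron-style pairwise update, used here only as a
    stand-in for the unspecified ranking algorithm in the abstract.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the pair is not yet ordered correctly.
            if score(weights, better) - score(weights, worse) <= 0:
                for k in range(n_features):
                    weights[k] += lr * (better[k] - worse[k])
    return weights


def rerank(weights: Sequence[float], candidates: List[Features]) -> List[int]:
    """Return candidate indices sorted from most to least relevant."""
    return sorted(range(len(candidates)),
                  key=lambda i: score(weights, candidates[i]),
                  reverse=True)


# Hypothetical training pairs: (features of better answer, features of worse answer).
training_pairs = [([0.9, 0.1], [0.2, 0.3]), ([0.8, 0.5], [0.3, 0.4])]
learned = train_pairwise(training_pairs, n_features=2)
order = rerank(learned, [[0.2, 0.3], [0.9, 0.1]])
```

A pairwise formulation is one common way to cast relevance judgment as ranking; a listwise or pointwise model would fit the same description.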
Entity extraction involves multiple factors, each of which affects the answer to a different degree. This paper presents a machine learning approach to parameter learning for entity answers. First, in view of the characteristics of question answering (QA) systems, we define three elements that influence answer extraction, namely the text score, the passage score, and the entity score, and give methods for computing them. We then collect 400 entity answers of the product, person, and organization types according to the TREC 2009 entity task requirements. With the help of search engines, we retrieve related pages and calculate the score of each factor related to the answer. The score of each entity answer is then computed as a linear combination of these factors. An initial score is defined to extract the entity answers and obtain a ranked list of answers. Finally, we annotate these entity answers to obtain a corpus of correctly labeled answers, and build a parameter learning model in which the EM algorithm iterates to find the optimal weights of the factors that influence answer extraction. Experiments on the TREC 2009 entity task show very good results for this method: the accuracy of entity answers reaches 88.93%.
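The linear combination of the three factor scores can be sketched as follows. This is only an illustration: the weight values, the candidate names, and the function names are assumptions, and the EM-based weight learning described in the abstract is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical weights for illustration; the paper learns these with EM.
WEIGHTS: Dict[str, float] = {"text": 0.2, "passage": 0.3, "entity": 0.5}


def combined_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Linear combination of the text, passage, and entity scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_entities(candidates: Dict[str, Dict[str, float]],
                  weights: Dict[str, float] = WEIGHTS) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs sorted from highest to lowest score."""
    scored = [(name, combined_score(s, weights)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate entities with made-up factor scores.
candidates = {
    "A": {"text": 0.9, "passage": 0.1, "entity": 0.1},
    "B": {"text": 0.1, "passage": 0.5, "entity": 0.9},
}
ranking = rank_entities(candidates)
```

With these assumed weights the entity score dominates, so candidate "B" ranks first; different learned weights would change the order.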
A collection of frequently asked questions (FAQ) is the basis of question answering (QA) systems that are built on a frequently-asked-questions database. Because FAQs are difficult to collect and organize, this paper proposes a method for automatically acquiring domain FAQs based on an improved Bayes classifier. The method parses HTML pages into DOM trees and, in combination with a restricted-domain knowledge base, extracts the node information and structural characteristics of the DOM tree as classification features. It then uses the improved Bayesian classification learning algorithm to construct a classification model, acquires FAQs from HTML pages automatically, and selects the domain FAQs from them. Experimental results show that the method has a remarkable effect.
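A rough sketch of the pipeline described above: parse an HTML page, derive features from each text-bearing node, and classify the nodes with a Bayes classifier. The abstract does not specify the improvement to Bayes or the exact feature set, so this example assumes a plain naive Bayes classifier with Laplace smoothing and three hypothetical features (tag name, depth, trailing question mark). It also omits the restricted-domain knowledge base.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class NodeCollector(HTMLParser):
    """Collects (tag, depth, text) for each text-bearing node of simple, well-nested HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.nodes: List[Tuple[str, int, str]] = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.nodes.append((self.stack[-1], len(self.stack), text))


def node_features(tag: str, depth: int, text: str) -> List[str]:
    """Hypothetical node and structure features, encoded as tokens."""
    return [f"tag={tag}", f"depth={depth}", f"question_mark={text.endswith('?')}"]


class NaiveBayes:
    """Plain naive Bayes with Laplace smoothing (not the paper's improved variant)."""

    def fit(self, samples: List[List[str]], labels: List[str]) -> None:
        self.class_counts = Counter(labels)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(samples, labels):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)

    def predict(self, feats: List[str]) -> str:
        total = sum(self.class_counts.values())
        best_label, best_logp = "", -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            logp = math.log(count / total)
            for f in feats:
                logp += math.log((self.feature_counts[label][f] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


page = ("<html><body><dl><dt>How do I reset my password?</dt>"
        "<dd>Use the reset link.</dd></dl></body></html>")
collector = NodeCollector()
collector.feed(page)
model = NaiveBayes()
model.fit([node_features(*n) for n in collector.nodes], ["question", "answer"])
prediction = model.predict(node_features("dt", 4, "Is it free?"))
```

The toy training set has only two nodes, so the prediction shows the mechanics rather than realistic accuracy.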
The least squares support vector machine (LSSVM) lacks sparsity, and standard sparsification algorithms require all of the training data to be labeled. We propose an active learning algorithm based on LSSVM to address the sparsity problem. The method first constructs a minimal LSSVM classifier, then computes the uncertainty of each sample and selects the samples closest to the classification surface for labeling, and finally adds the newly labeled samples to the training set and builds a new classifier. This process is repeated until the model accuracy meets the requirement. Experimental results on six UCI data sets show that the proposed method effectively improves the sparsity of LSSVM and reduces the cost of labeling samples.
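The loop described above (train a small LSSVM, query the sample nearest the decision surface, retrain) can be sketched in pure Python. This is a minimal illustration under several assumptions: an RBF kernel, the standard LSSVM classifier linear system, and a fixed label budget in place of the accuracy-based stopping rule. The function names, `gamma`, `sigma`, and the toy data are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_lssvm(xs: List[Vector], ys: List[int],
                gamma: float = 10.0) -> Callable[[Vector], float]:
    """Solve [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1], labels in {-1, +1}."""
    n = len(xs)
    system = [[0.0] + [float(y) for y in ys]]
    for i in range(n):
        row = [float(ys[i])]
        for j in range(n):
            value = ys[i] * ys[j] * rbf_kernel(xs[i], xs[j])
            row.append(value + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve_linear(system, [0.0] + [1.0] * n)
    bias, alpha = solution[0], solution[1:]

    def decision(x: Vector) -> float:
        return sum(a * y * rbf_kernel(x, xi) for a, y, xi in zip(alpha, ys, xs)) + bias

    return decision


def active_learn(pool: List[Vector], oracle: Callable[[Vector], int],
                 seed_indices: List[int],
                 budget: int) -> Tuple[Callable[[Vector], float], List[int]]:
    """Uncertainty sampling: repeatedly label the pool sample with the smallest |f(x)|."""
    labels = {i: oracle(pool[i]) for i in seed_indices}
    queried: List[int] = []

    def fit() -> Callable[[Vector], float]:
        idx = list(labels)
        return train_lssvm([pool[i] for i in idx], [labels[i] for i in idx])

    for _ in range(budget):  # assumption: fixed budget instead of an accuracy target
        decision = fit()
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if not unlabeled:
            break
        pick = min(unlabeled, key=lambda i: abs(decision(pool[i])))
        labels[pick] = oracle(pool[pick])
        queried.append(pick)
    return fit(), queried
```

A usage example on a toy one-dimensional pool, where the oracle labels points by sign:

```python
pool = [(-2.0,), (-1.5,), (-1.0,), (-0.2,), (0.2,), (1.0,), (1.5,), (2.0,)]
model, queried = active_learn(pool, lambda x: 1 if x[0] > 0 else -1,
                              seed_indices=[0, 7], budget=1)
```

Because only queried samples enter the training set, the resulting model uses fewer support values than an LSSVM trained on the full pool, which is the sparsity effect the abstract refers to.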