Huanyun Zong scite author profile

Guo

et al. 2011

For complex non-factoid questions in Chinese question answering systems, such as 'why' and 'how' questions, we propose an answer extraction method that uses discourse structure features and a ranking algorithm. The method treats judging answer relevance as a learning-to-rank problem. First, it analyzes the question to generate a query string. Second, it applies rhetorical structure theory together with lexical, syntactic, and semantic analysis to the retrieved documents, determining the inherent relationships between paragraphs or sentences and generating candidate answer paragraphs or sentences. Third, it constructs the answer ranking model, extracting five groups of features (similarity, density and frequency, translation, discourse structure, and external knowledge) to train it. Finally, it re-ranks the answers with the trained model to find the best ones. Experiments show that the proposed method effectively improves the accuracy and quality of non-factoid answers.
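The abstract does not name the ranking algorithm, so the following is only a minimal sketch of the learning-to-rank idea, using a pairwise perceptron as a stand-in. The feature order, the candidate ids, the relevance grades, and all function names are assumptions made for illustration; in practice the preference pairs would be formed per question.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one value per feature group named in the abstract, in the order
# [similarity, density/frequency, translation, discourse structure, external knowledge].
Candidate = Tuple[str, Sequence[float], int]  # (answer id, features, relevance grade)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(candidates: List[Candidate], epochs: int = 10) -> List[float]:
    """Learn weights so that higher-graded answers score above lower-graded ones."""
    w = [0.0] * len(candidates[0][1])
    # Every (better, worse) pair of candidates is one training preference.
    pairs = [(a[1], b[1]) for a in candidates for b in candidates if a[2] > b[2]]
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            if dot(w, diff) <= 0:  # pair is mis-ordered (or tied): move w toward diff
                w = [wi + di for wi, di in zip(w, diff)]
    return w


def rerank(candidates: List[Candidate], w: Sequence[float]) -> List[str]:
    """Return answer ids sorted by descending model score."""
    return [c[0] for c in sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)]


# Made-up candidates for a single question (assumption, not data from the paper).
training = [
    ("c1", [0.9, 0.8, 0.1, 1.0, 0.0], 2),
    ("c2", [0.5, 0.4, 0.3, 0.0, 1.0], 1),
    ("c3", [0.2, 0.9, 0.6, 0.0, 0.0], 0),
]
weights = train_pairwise_perceptron(training)
ranking = rerank(training, weights)
```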

Grey Relational Analysis for Query Expansion

Zou

et al. 2013

Parameter learning for multi-factors of entity answer extracting

Mao

et al. 2010

Entity answer extraction involves multiple factors, each of which affects the answer to a different degree. This paper presents a machine learning approach to learning the parameters for entity answer extraction. First, in view of the characteristics of question answering (QA) systems, we define three factors that influence answer extraction (text score, passage score, and entity score) and give methods for computing them. We then collect 400 entity answers of the product, person, and organization types according to the TREC 2009 entity task requirements. Using search engines, we retrieve related pages and calculate the score of each factor for each answer. The score of an entity answer is then computed as a linear combination of the factors; an initial score is defined to extract the entity answers and produce a sorted list of answers. Finally, we annotate these entity answers to obtain a corpus of correctly labeled answers, and build a parameter learning model in which the EM algorithm iterates to find the optimal weights of the factors that influence answer extraction. Experiments on the TREC 2009 entity task show good results: the accuracy of entity answers reaches 88.93%.
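The linear combination step can be sketched as follows. This is an illustration only: the weights, the candidate names, and the factor values are invented, and the EM-based weight learning described in the abstract is not implemented here.

```python
from typing import Dict, List, Tuple

FACTORS = ("text", "passage", "entity")  # the three factors named in the abstract


def combined_score(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Score of one entity answer as a weighted sum of its factor scores."""
    return sum(weights[name] * factors[name] for name in FACTORS)


def rank_entities(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (entity, score) pairs sorted from highest to lowest score."""
    scored = [(name, combined_score(f, weights)) for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical initial weights; in the paper these would be refined by EM.
initial_weights = {"text": 0.2, "passage": 0.3, "entity": 0.5}

# Made-up factor scores for three candidate entities.
candidates = {
    "entity_a": {"text": 0.9, "passage": 0.2, "entity": 0.1},
    "entity_b": {"text": 0.4, "passage": 0.6, "entity": 0.8},
    "entity_c": {"text": 0.5, "passage": 0.5, "entity": 0.5},
}
ranked = rank_entities(candidates, initial_weights)
```

With these made-up numbers the entity score dominates, so `entity_b` ranks first even though `entity_a` has the highest text score.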

FAQ Extracting and Domain Filtering Based on Improved Bayes

et al. 2009

An FAQ (Frequently Asked Questions) collection is the basis of a question answering (QA) system built on a database of frequently asked questions. Because FAQs are difficult to collect and organize, this paper proposes a method for automatically acquiring domain FAQs based on an improved Bayes classifier. HTML pages are parsed into DOM trees. Combined with a restricted-domain knowledge base, the node information and structural characteristics of the DOM tree are extracted as classification features, and a classification model is built with the improved Bayesian learning algorithm. The model acquires FAQs from HTML pages automatically and filters them by domain. Experimental results show that the method is effective.
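The abstract does not describe what the improvement to the Bayes classifier is, so the sketch below uses a plain multinomial naive Bayes with Laplace smoothing as a stand-in. The DOM feature names (such as `tag=dt` or `question_mark`) and the two class labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Sample = Tuple[List[str], str]  # (features of one DOM node, class label)
Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(samples: List[Sample]) -> Model:
    """Count class frequencies and per-class feature frequencies."""
    class_counts: Counter = Counter()
    feat_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for feats, label in samples:
        class_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab


def predict_nb(model: Model, feats: List[str]) -> str:
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_lp = "", -math.inf
    for label, n in class_counts.items():
        lp = math.log(n / total)
        denom = sum(feat_counts[label].values()) + len(vocab)
        for f in feats:
            if f in vocab:  # features never seen in training are ignored
                lp += math.log((feat_counts[label][f] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label


# Made-up training nodes: node tag, text cue, and a structural cue from the DOM tree.
samples = [
    (["tag=dt", "question_mark", "next=dd"], "faq"),
    (["tag=h3", "question_mark", "next=p"], "faq"),
    (["tag=a", "nav_link"], "other"),
    (["tag=p", "long_text"], "other"),
]
model = train_nb(samples)
```

A node described by `["tag=h3", "question_mark"]` would be classified with `predict_nb(model, ["tag=h3", "question_mark"])`.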

Active Learning for Sparse Least Squares Support Vector Machines

Zou

et al. 2011

The least squares support vector machine (LSSVM) lacks sparsity, and standard sparsification algorithms require all training data to be labeled. We propose an active learning algorithm based on LSSVM to address the sparsity problem. The method first constructs a minimal LSSVM classifier, then computes the uncertainty of the unlabeled samples and selects the sample closest to the classification surface for labeling. The newly labeled sample is added to the training set and a new classifier is built; the process repeats until the model accuracy meets the requirement. Experimental results on six UCI data sets show that the proposed method effectively improves the sparsity of LSSVM and reduces the cost of labeling samples.
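The selection loop described here can be sketched as pool-based uncertainty sampling. To keep the example short and dependency-free, the classifier is a one-dimensional max-margin threshold, which is a stand-in and not an LSSVM; the data, the accuracy target, the use of the pool itself as the validation set, and all function names are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Decision = Callable[[float], float]


def train_threshold(xs: Sequence[float], ys: Sequence[int]) -> Decision:
    """Stand-in trainer (NOT an LSSVM): 1-D threshold midway between the classes."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    if sum(pos) / len(pos) >= sum(neg) / len(neg):
        boundary, direction = (max(neg) + min(pos)) / 2.0, 1.0
    else:
        boundary, direction = (min(neg) + max(pos)) / 2.0, -1.0
    return lambda x: direction * (x - boundary)


def accuracy(f: Decision, xs: Sequence[float], ys: Sequence[int]) -> float:
    correct = sum(1 for x, y in zip(xs, ys) if (1 if f(x) >= 0 else -1) == y)
    return correct / len(xs)


def active_learn(
    pool: Sequence[float],
    oracle: Callable[[int], int],
    seed: Sequence[int],
    val_x: Sequence[float],
    val_y: Sequence[int],
    target: float = 0.95,
) -> Tuple[Decision, List[int]]:
    """Label only the most uncertain samples until the accuracy target is met."""
    labels: Dict[int, int] = {i: oracle(i) for i in seed}  # seed needs both classes
    while True:
        idx = sorted(labels)
        f = train_threshold([pool[i] for i in idx], [labels[i] for i in idx])
        unlabeled = [i for i in range(len(pool)) if i not in labels]
        if accuracy(f, val_x, val_y) >= target or not unlabeled:
            return f, idx
        # Uncertainty = closeness to the decision surface, i.e. smallest |f(x)|.
        query = min(unlabeled, key=lambda i: abs(f(pool[i])))
        labels[query] = oracle(query)


# Made-up 1-D data; the true boundary lies between 2.8 and 3.2.
pool = [0.0, 0.2, 0.4, 2.6, 2.8, 3.2, 9.0, 10.0]
truth = [-1, -1, -1, -1, -1, 1, 1, 1]
model, labeled = active_learn(pool, lambda i: truth[i], seed=[0, 7], val_x=pool, val_y=truth)
```

On this toy pool the loop stops after labeling four of the eight samples, which mirrors the abstract's point that labeling cost drops when only samples near the decision surface are queried.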