Biologists rely on keyword-based search engines to retrieve superficially relevant papers, from which they must filter out the irrelevant information manually. Question answering (QA) systems can offer more efficient and user-friendly ways of retrieving such information. Two contributions are provided in this paper. First, a factoid QA system is developed to employ a named entity recognition module to extract answer candidates and a linear model to rank them. The linear model uses various semantic features, such as named entity types and semantic roles. To tune the weights of features used by the model, a novel supervised learning algorithm, which only needs small amounts of training data, is provided. Second, a QA system may assign several answers with the same score, making evaluation unfair. To solve this problem, an efficient formula for a mean average reciprocal rank (MARR) is proposed to reduce the complexity of its computation. After employing all effective semantic features, our system achieves a top-1 MARR of 74.11% and top-5 MARR of 76.68%. In comparison of the baseline system, the top-1 and top-5 MARR increase by 9.5% and 7.1%. In addition, the experiment result on test set shows our ranking method, which achieves 55.58% top-1 MARR and 66.99% top-5 MARR, significantly surpasses traditional BM25 and simple voting in performance by averagely 35.23% and 36.64%, respectively.
Statistical language modeling (SLM) has been used in many different domains for decades and has also been applied to information retrieval (IR) recently. Documents retrieved using this approach are ranked according their probability of generating the given query. In this paper, we present a novel approach that employs the generalized Expectation Maximization (EM) algorithm to improve language models by representing their parameters as observation probabilities of Hidden Markov Models (HMM). In the experiments, we demonstrate that our method outperforms standard SLM-based and tf.idfbased methods on TREC 2005 HARD Track data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.