The proximity of query terms in a document is important information that enables ranking models to go beyond the "bag-of-words" assumption in information retrieval. This paper studies the integration of term proximity information into unigram language modeling. A new proximity language model (PLM) is proposed, which views a query term's proximity centrality as the Dirichlet hyper-parameter that weights the parameters of the unigram document language model. Several proximity measures are developed for use in PLM; each computes a query term's proximity centrality in a specific document. In experiments, the proximity language model is compared with the basic language model and with previous work that combines proximity information with the language model through linear score combination. The experimental results show that the proposed model performs better in both top precision and average precision.
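The abstract does not give PLM's formulas, so the following is only a minimal sketch of the idea under stated assumptions. It assumes one possible proximity measure: for each query term, sum `base**(-d)` over the other query terms, where `d` is their minimum positional distance. It also assumes that this centrality enters the document model as extra pseudo-counts alongside standard Dirichlet smoothing with a collection model. The function names, the decay `base`, and the weights `lam` and `mu` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def positions(doc, term):
    """Return every index at which `term` occurs in the tokenized document."""
    return [i for i, tok in enumerate(doc) if tok == term]


def proximity_centrality(doc, query, base=1.5):
    """Hypothetical proximity measure (an assumption, not the paper's):
    for each distinct query term, sum base**(-d) over the other query terms,
    where d is the minimum positional distance between the two terms.
    Terms absent from the document get centrality 0."""
    terms = list(dict.fromkeys(query))  # distinct query terms, order kept
    pos = {t: positions(doc, t) for t in terms}
    centrality = {}
    for t in terms:
        total = 0.0
        for other in terms:
            if other == t or not pos[t] or not pos[other]:
                continue
            d = min(abs(i - j) for i in pos[t] for j in pos[other])
            total += base ** (-d)
        centrality[t] = total
    return centrality


def plm_score(doc, query, collection_prob, lam=1.0, mu=2000.0, base=1.5):
    """Log query likelihood under an assumed PLM-style document model:
    p(w|D) = (c(w,D) + lam*Prox(w) + mu*p(w|C)) / (|D| + lam*sum(Prox) + mu).
    Prox is zero for non-query words, so p(w|D) still sums to 1 over the vocabulary."""
    counts = Counter(doc)
    prox = proximity_centrality(doc, query, base)
    denom = len(doc) + lam * sum(prox.values()) + mu
    score = 0.0
    for q in query:
        p_c = collection_prob.get(q, 1e-12)  # floor avoids log(0) for unseen terms
        score += math.log((counts[q] + lam * prox.get(q, 0.0) + mu * p_c) / denom)
    return score


if __name__ == "__main__":
    close = ["a", "b", "x", "x", "x", "x"]  # query terms adjacent
    far = ["a", "x", "x", "x", "x", "b"]    # same counts, terms 5 positions apart
    cp = {"a": 0.01, "b": 0.01, "x": 0.05}
    print(plm_score(close, ["a", "b"], cp) > plm_score(far, ["a", "b"], cp))
```

The two toy documents have identical term counts, so a plain Dirichlet-smoothed unigram model would score them equally. In this sketch, the document where `a` and `b` are adjacent scores higher, which is the behaviour the abstract motivates.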
This paper develops and evaluates an approach for combining semantic information with proximity information for text summarization. The approach is based on the proximity language model, which incorporates proximity information into the unigram language model. This paper extends the proximity language model to also incorporate semantic information using latent semantic analysis (LSA). We argue that this approach achieves a good balance between syntactic and semantic information. We evaluate the approach using ROUGE scores on the Text Analysis Conference (TAC) 2009 Summarization task and find that incorporating LSA into PLM improves on the baseline models.
3 PROXIMITY LANGUAGE MODEL
The proximity language model (PLM) forms the heart of our ranking function and is based on the unigram language model (Zhao and Yun, 2009).
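The snippet does not say how LSA is combined with PLM, so the sketch below illustrates only the LSA ingredient and should not be read as the authors' method. It builds a term-by-sentence count matrix, takes a rank-`k` truncated SVD with NumPy, and compares terms by cosine similarity in the latent space. The function names, the choice of sentences as the columns of the matrix, and the value of `k` are assumptions made for illustration.

```python
import numpy as np


def lsa_term_vectors(sentences, k=2):
    """Hypothetical helper: build a term-by-sentence count matrix and return
    rank-k latent term vectors (rows of U_k scaled by the singular values)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, sent in enumerate(sentences):
        for w in sent:
            A[index[w], j] += 1.0
    U, S, _ = np.linalg.svd(A, full_matrices=False)  # singular values in descending order
    k = min(k, len(S))
    term_vecs = U[:, :k] * S[:k]  # truncate to the top-k latent dimensions
    return vocab, index, term_vecs


def cosine(u, v):
    """Cosine similarity, returning 0.0 for zero vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return float(u @ v / (nu * nv))


if __name__ == "__main__":
    sents = [["car", "engine"], ["automobile", "engine"],
             ["banana", "fruit"], ["apple", "fruit"]]
    vocab, index, vecs = lsa_term_vectors(sents, k=2)
    print(cosine(vecs[index["car"]], vecs[index["automobile"]]))
    print(cosine(vecs[index["car"]], vecs[index["banana"]]))
```

In this toy input, "car" and "automobile" never appear in the same sentence, yet they share the context word "engine", so their latent vectors point in the same direction. "car" and "banana" come out orthogonal. This kind of co-occurrence-based similarity is what LSA can add on top of purely positional proximity evidence.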