In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
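The abstract above contrasts static caching, where the cache contents are fixed ahead of time from a past query log, with dynamic caching, where contents change as queries arrive. As a rough illustration of how such hit rates can be compared on a query stream (not the paper's actual algorithm), here is a minimal sketch: a static cache filled with the most frequent items of a training log versus a dynamic LRU cache. Function names and the toy logs are illustrative.

```python
from collections import Counter, OrderedDict

def static_cache_hit_rate(train_log, test_log, capacity):
    """Fill a static cache with the `capacity` most frequent items of a
    training log, then measure the hit rate on a held-out test log."""
    cache = {item for item, _ in Counter(train_log).most_common(capacity)}
    hits = sum(1 for item in test_log if item in cache)
    return hits / len(test_log)

def lru_hit_rate(test_log, capacity):
    """Dynamic LRU cache: on a miss, evict the least recently used entry."""
    cache = OrderedDict()
    hits = 0
    for item in test_log:
        if item in cache:
            hits += 1
            cache.move_to_end(item)   # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict LRU entry
            cache[item] = True
    return hits / len(test_log)

# Toy example: static cache trained on an earlier log, evaluated later.
train = ["q1", "q1", "q2", "q3"]
test = ["q1", "q2", "q1"]
print(static_cache_hit_rate(train, test, capacity=1))  # 2/3: only "q1" fits
print(lru_hit_rate(test, capacity=1))
```

The same skeleton applies whether the cached items are query answers or posting lists; for posting lists one would additionally weight each item by its size, since lists of frequent terms consume far more cache space than a single query answer.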
Automatically generated tags and geotags hold great promise for improving access to video collections and online communities. We give an overview of the three tasks offered in the MediaEval 2010 benchmarking initiative, describing for each its use scenario, its definition, and the data set released. For each task, the reference algorithm used within MediaEval 2010 is presented, together with comments on lessons learned. The Tagging Task (Professional) involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task (Wild Wild Web) involves automatically predicting the tags that users assign to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information, including user-generated metadata, speech recognition transcripts, and audio and visual features.
I dedicate this thesis to my great-grandmother Mildred Graham, who completed the sixth grade, taught her husband to read, and put her son through law school, and to my great-grandmother Olive Peterson, who completed a Masters in Mathematics and put her daughter through the Sorbonne. My generation owes a debt to those who came before, because they prepared the way for us so that everything was easy, and we only had to work hard to succeed. We hope to return the favor by teaching our children to be strong and courageous, thereby making us worthy of the grace bestowed upon us.

ACKNOWLEDGMENTS

Nothing worthwhile was ever achieved in isolation. I cannot claim to have written this thesis without the significant influence of others. I would like to thank my advisor, W. Bruce Croft. It was invaluable to have the benefit of his more than 30 years of experience in the field of Information Retrieval standing behind his advising of me. I am most grateful that he allowed me to work with him for five years, and to be a part of the best information retrieval research group in the world. I would like to thank James Allan for serving on my committee.
James sets a tone of friendly cooperation in the lab, which makes it possible to get more work done and makes the work itself more enjoyable. I would like to thank David Jensen for the counsel he has given me during my time at the University of Massachusetts, which was especially kind of him considering that I am not a member of his lab. I would like to thank John Staudenmayer, who served as the outside member on my committee to ensure the process was fair even though he was already very busy, and whom I admire for his courage, perseverance, and intelligence. I would like to thank my mother, Chris Jorgensen, who taught us to be strong and independent, and treated us as if we already were the intelligent, competent people she expected us to be.

...summarization, novelty detection, and information provenance make use of a sentence-retrieval module as a preprocessing step. The performance of these systems depends on the quality of the sentence-retrieval module. Other tasks, such as information extraction and machine translation, operate on sentences, either using them as trainin...
In this work we propose a translation model for monolingual sentence retrieval. We propose four methods for constructing a parallel corpus. Of the four methods proposed, a lexicon learned from a bilingual Arabic-English corpus aligned at the sentence level performs best, significantly improving results over the query likelihood baseline. Further, we demonstrate that smoothing from the local context of the sentence improves retrieval over the query likelihood baseline.
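The abstract refers to a query likelihood baseline with smoothing from the sentence's local context. As a hedged illustration (not the authors' exact model), the sketch below scores a sentence by query likelihood under a sentence language model smoothed, Jelinek-Mercer style, with its containing document; the function name and the `lam` parameter are illustrative assumptions.

```python
import math
from collections import Counter

def query_likelihood(query_terms, sentence, document, lam=0.5):
    """Score a sentence by query log-likelihood, smoothing the sentence
    language model with its local context (the containing document).
    `sentence` and `document` are lists of tokens."""
    s_counts, d_counts = Counter(sentence), Counter(document)
    s_len, d_len = len(sentence), len(document)
    score = 0.0
    for t in query_terms:
        p_sentence = s_counts[t] / s_len    # maximum-likelihood estimate
        p_context = d_counts[t] / d_len     # smoothing distribution
        p = lam * p_sentence + (1 - lam) * p_context
        if p == 0.0:
            p = 1e-10  # floor for terms unseen even in the document
        score += math.log(p)
    return score

# A sentence containing the query term outranks one relying on context alone.
doc = ["the", "model", "translates", "sentences"]
s1, s2 = ["model", "translates"], ["the", "sentences"]
assert query_likelihood(["model"], s1, doc) > query_likelihood(["model"], s2, doc)
```

Smoothing from the document rather than the whole collection is what the abstract calls "local context" smoothing; mixing in a collection-wide model instead would recover the standard query likelihood baseline.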