Real-time news (also known as live news, streaming news or breaking news), especially news focused on business and financial matters, is widely read and builds up quickly. A typical newspaper/newswire aggregation system will process three stories per second around the clock, adding approximately 250,000 stories every business day to the collection. Large commercial systems substantially surpass this total.

The needs and behavior of end-users searching collections of this type differ from general searching norms in several key aspects:

- The computational burden is closer to the classic alert/routing problem than it is to the ad-hoc search problem.
- Business work is about the opportunities of today and tomorrow.
- All news is structured.
- User queries do not typically consist of a few terms.
- Users have a high degree of topic familiarity and topic focus.
The Computational Burden

The computational burden is closer to the classic alert/routing problem than it is to the ad-hoc search problem (these are standard terms of art in IR, as Robertson [1], for example, explains). In fact it closely matches the batch routing problem, because (1) users save searches and re-execute them frequently (rather than entering searches in an ad-hoc manner) and (2) very little latency is tolerated by users. A news article arriving in the last few moments should be returned as a hit if the article is a search match to a just-executed query.
Search 2009
Bulletin of the American Society for Information Science and Technology, October/November 2009, Volume 36, Number 1
Combine this low-latency requirement with a saved-search re-execution cycle that is often a minute or less (programmed as web page auto-refresh), and the similarity with alerting becomes obvious. Thus the old folk theorem, "In search you have all the time you need to study the archive, and no time to study the query; in alerting you have all the time you need to study the query, and no time to study the archive," doesn't hold here.

However, real-time news is still search. A static set of matches (in this case, news headlines and summaries) is returned, and the user demands some ranking of the results by meaningfulness and pertinence. In the alert problem it is sufficient to present the user with a temporal stream of matches; not so here.

Thus real-time news search demands the computational agility of alerting and the semantic processing of search. The job is made somewhat easier by the batch nature of the query collection, that is, the saved searches. In the commercial system NewsEdge™, of the last million searches (looking backward from August 15, 2009), approximately 96.5% of executed searches were repetitive submissions of stored queries. This statistic excludes users who specifically posted alerts and received a content push from NewsEdge; it refers only to users who requested (perhaps automatically) and received a web page of results.
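To make the combination of alerting-style matching and search-style ranking concrete, here is a minimal Python sketch. It is not a description of how NewsEdge or any other commercial system works. The class and method names (`SavedSearchIndex`, `save_query`, `ingest`, `execute`), the conjunctive all-terms match and the term-count score are all assumptions chosen for illustration. Each arriving article is routed to the saved queries it matches, as in alerting, and a re-executed query returns a ranked static list, as in search.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Article:
    doc_id: int
    timestamp: float  # arrival time, in seconds
    text: str


@dataclass
class SavedSearchIndex:
    """Hypothetical index that routes arriving articles to saved queries."""
    queries: dict = field(default_factory=dict)  # query_id -> set of terms
    term_to_queries: dict = field(default_factory=lambda: defaultdict(set))
    results: dict = field(default_factory=lambda: defaultdict(list))

    def save_query(self, query_id, terms):
        """Store a query once; it can then be re-executed many times."""
        normalized = {t.lower() for t in terms}
        self.queries[query_id] = normalized
        for term in normalized:
            self.term_to_queries[term].add(query_id)

    def ingest(self, article):
        """Alerting-style step: match one new article against stored queries."""
        counts = Counter(article.text.lower().split())
        # Only queries sharing at least one term with the article are checked.
        candidates = set()
        for word in counts:
            candidates |= self.term_to_queries.get(word, set())
        for query_id in candidates:
            terms = self.queries[query_id]
            if all(t in counts for t in terms):  # assumed: all terms required
                score = sum(counts[t] for t in terms)  # assumed toy score
                self.results[query_id].append(
                    (score, article.timestamp, article.doc_id)
                )

    def execute(self, query_id, top_k=10):
        """Search-style step: return ranked hits, best score then newest."""
        hits = sorted(self.results[query_id], key=lambda h: (-h[0], -h[1]))
        return [doc_id for _, _, doc_id in hits[:top_k]]


index = SavedSearchIndex()
index.save_query("q1", ["oil", "prices"])
index.ingest(Article(1, 100.0, "Oil prices rise as oil demand grows"))
index.ingest(Article(2, 101.0, "Gold prices fall"))
index.ingest(Article(3, 102.0, "Oil prices steady"))
print(index.execute("q1"))
```

Because matching happens at ingestion time, an article that arrived moments ago is already in the result list when a saved search is re-executed, which is the low-latency behavior described above. A real system would need far richer matching and ranking than this toy term count.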