In sponsored search, advertisers bid on phrases representative of offered products or services. For large advertisers, these phrases often come from quasi-algorithmically generated lists of thousands of terms prone to poor linguistic construction. A bidded term by itself is usually unsuitable for direct insertion into an ad copy template; it must be rephrased and capitalized properly to fit the template, possibly with additional language to avoid semantic ambiguity. We develop a natural language generation system to automate these steps, preparing a list of terms for insertion into an ad template. For each input term, our system first finds a proper word ordering by mining a corpus of Web search query logs. Next it determines whether the term is ambiguous and-if semantics dictate-attaches a clarifying modifier culled from query logs. Finally, it applies proper capitalization by analyzing pages from Web search engine results. Each step yields a plausible set of displayable forms from which a machine-learned model selects the best. The models are trained and tested on a large set of human-labeled data. The overall system significantly outperforms baseline systems that use simple heuristics.
Web searchers reformulate their queries, as they adapt to search engine behavior, learn more about a topic, or simply correct typing errors. Automatic query rewriting can help user web search, by augmenting a user's query, or replacing the query with one likely to retrieve better results. One example of query-rewriting is spell-correction. We may also be interested in changing words to synonyms or other related terms. For Japanese, the opportunities for improving results are greater than for languages with a single character set, since documents may be written in multiple character sets, and a user may express the same meaning using different character sets. We give a description of the characteristics of Japanese search query logs and manual query reformulations carried out by Japanese web searchers. We use characteristics of Japanese query reformulations to extend previous work on automatic query rewriting in English, taking into account the Japanese writing system. We introduce several new features for building models resulting from this difference and discuss their impact on automatic query rewriting. We also examine This work was done while Kevin Bartz and Pero Subasic were employees at
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.