In sponsored search, advertisers bid on phrases representative of offered products or services. For large advertisers, these phrases often come from quasi-algorithmically generated lists of thousands of terms prone to poor linguistic construction. A bidded term by itself is usually unsuitable for direct insertion into an ad copy template; it must be rephrased and capitalized properly to fit the template, possibly with additional language to avoid semantic ambiguity. We develop a natural language generation system to automate these steps, preparing a list of terms for insertion into an ad template. For each input term, our system first finds a proper word ordering by mining a corpus of Web search query logs. Next it determines whether the term is ambiguous and-if semantics dictate-attaches a clarifying modifier culled from query logs. Finally, it applies proper capitalization by analyzing pages from Web search engine results. Each step yields a plausible set of displayable forms from which a machine-learned model selects the best. The models are trained and tested on a large set of human-labeled data. The overall system significantly outperforms baseline systems that use simple heuristics.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.