This paper presents a detailed discussion of problem formulation and data representation issues in the design, deployment, and operation of a massive-scale machine learning system for targeted display advertising. Notably, the machine learning system itself, rather than only the models it produces, is deployed and has been in continual use for years across thousands of advertising campaigns. In this application, acquiring sufficient training data from the ideal sampling distribution is prohibitively expensive. Instead, data are drawn from surrogate domains and learning tasks, and then transferred to the target task. We present the design of this multistage transfer learning system, highlighting the problem formulation aspects. We then present a detailed experimental evaluation, showing that each of the transfer stages adds value. We next present production results for advertising clients from a variety of industries, illustrating the performance of the system in use. We close the paper with a collection of lessons learned from over half a decade of work on this complex, deployed, and broadly used machine learning system.
ProtoMol is a high-performance framework in C++ for rapid prototyping of novel algorithms for molecular dynamics and related applications. Its flexibility is achieved primarily through the use of inheritance and design patterns (object-oriented programming). Performance is obtained by using templates that enable generation of efficient code for performance-critical sections (generic programming). The framework encapsulates important optimizations that can be used by developers, such as parallelism in the force computation. Its design is based on domain analysis of numerical integrators for molecular dynamics (MD) and of fast solvers for the force computation, particularly for forces due to electrostatic interactions. Several new and efficient algorithms are implemented in ProtoMol. Finally, it is shown that ProtoMol's sequential performance is excellent when compared to a leading MD program, and that it scales well for a moderate number of processors. Binaries and source code for Windows, Linux, Solaris, IRIX, HP-UX, and AIX platforms are available under an open source license at http://protomol.sourceforge.net.
The field of market basket analysis, the search for meaningful associations in customer purchase data, is one of the oldest areas of data mining. The typical solution involves the mining and analysis of association rules, which take the form of statements such as "people who buy diapers are likely to buy beer". It is well known, however, that typical transaction datasets can support hundreds or thousands of obvious association rules for each interesting rule, and filtering through the rules is a non-trivial task (Klemettinen et al. In: Proceedings of CIKM, pp 401-407, 1994). One may use an interestingness measure to quantify the usefulness of various rules, but there is no single agreed-upon measure, and different measures can result in very different rankings of association rules. In this work, we take a different approach to mining transaction data. By modeling the data as a product network, we discover expressive communities (clusters) in the data, which can then be targeted for further analysis. We demonstrate that our network-based approach can concisely isolate influence among products, mitigating the need to search through massive lists of association rules. We develop an interestingness measure for communities of products and show that it isolates useful, actionable communities. Finally, we build upon our experience with product networks to propose a comprehensive analysis strategy that combines traditional and network-based techniques. This framework is capable of generating insights that are difficult to achieve with traditional analysis methods.
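To make the contrast between association rules and a product network concrete, the following is a minimal sketch, not the method of the paper. It assumes a toy set of baskets (hypothetical data), defines an edge weight as the number of baskets containing both products, and uses plain connected components as a crude stand-in for the community detection and community interestingness measure mentioned above, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; not data from the paper.
transactions = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def rule_stats(baskets, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence


def build_product_network(baskets):
    """Map each product pair (a, b), a < b, to the number of baskets containing both."""
    edges = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            edges[(a, b)] += 1
    return edges


def connected_components(edges):
    """Group products that are linked by any chain of co-purchases.

    Assumption: this is only a placeholder for a real community
    detection algorithm, which would also use the edge weights.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        components.append(group)
    return components


support, confidence = rule_stats(transactions, "diapers", "beer")
network = build_product_network(transactions)
components = connected_components(network)
```

On this toy data, the rule "diapers -> beer" is one of many candidate rules that would each need to be ranked, whereas the network view summarizes the same baskets as two groups of related products. In practice, weak edges would likely be pruned before clustering; the threshold for doing so is a design choice this sketch leaves out.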