No abstract
Abstract. Learning multiple descriptions for each class in the data has been shown to reduce generalization error but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to understand this variation. Our hypothesis is that the amount of error reduction is linked to the "degree to which the descriptions for a class make errors in a correlated manner." We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We empirically show that it is possible to learn descriptions that make less correlated errors in domains in which many ties in the search evaluation measure (e.g. information gain) are experienced during learning. The paper also presents results that help to understand when and why multiple descriptions are a help (irrelevant attributes) and when they are not as much help (large amounts of class noise).
The Netflix competition of 2006 [2] has spurred significant activity in the recommendations field, particularly in approaches using latent factor models [3,5,8,12]. However, the near ubiquity of the Netflix and the similar MovieLens datasets 1 may be narrowing the generality of lessons learned in this field. At GetJar, our goal is to make appealing recommendations of mobile applications (apps). For app usage, we observe a distribution that has higher kurtosis (heavier head and longer tail) than that for the aforementioned movie datasets. This happens primarily because of the large disparity in resources available to app developers and the low cost of app publication relative to movies.In this paper we compare a latent factor (PureSVD) and a memory-based model with our novel PCA-based model, which we call Eigenapp. We use both accuracy and variety as evaluation metrics. PureSVD did not perform well due to its reliance on explicit feedback such as ratings, which we do not have. Memory-based approaches that perform vector operations in the original high dimensional space overpredict popular apps because they fail to capture the neighborhood of less popular apps. They have high accuracy due to the concentration of mass in the head, but did poorly in terms of variety of apps exposed. Eigenapp, which exploits neighborhood information in low dimensional spaces, did well both on precision and variety, underscoring the importance of dimensionality reduction to form quality neighborhoods in high kurtosis distributions.
Automatic transfer of learned knowledge from one task or domain to another offers great potential to simplify and expedite the construction and deployment of intelligent systems. In practice however, there are many barriers to achieving this goal. In this article, we present a prototype system for the real-world context of transferring knowledge of American football from video observation to control in a game simulator. We trace an example play from the raw video through execution and adaptation in the simulator, highlighting the system's component algorithms along with issues of complexity, generality, and scale. We then conclude with a discussion of the implications of this work for other applications, along with several possible improvements.
We describe the TiVo television show collaborative recommendation system which has been fielded in over one million TiVo clients for four years. Over this install base, TiVo currently has approximately 100 million ratings by users over approximately 30,000 distinct TV shows and movies. TiVo uses an item-item (show to show) form of collaborative filtering which obviates the need to keep any persistent memory of each user s viewing preferences at the TiVo server. Taking advantage of TiVo s client-server architecture has produced a novel collaborative filtering system in which the server does a minimum of work and most work is delegated to the numerous clients. Nevertheless, the server-side processing is also highly scalable and parallelizable. Although we have not performed formal empirical evaluations of its accuracy, internal studies have shown its recommendations to be useful even for multiple user households. TiVo s architecture also allows for throttling of the server so if more server-side resources become available, more correlations can be computed on the server allowing TiVo to make recommendations for niche audiences.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.