Learning probabilistic graphical models from high-dimensional datasets is a computationally challenging task. In many interesting applications, the dimensionality of the domain prevents state-of-the-art statistical learning techniques from delivering accurate models in reasonable time. This paper presents a hybrid random field model for pseudo-likelihood estimation in high-dimensional domains. A theoretical analysis proves that the class of pseudo-likelihood distributions representable by hybrid random fields strictly includes the class of joint probability distributions representable by Bayesian networks. In order to learn hybrid random fields from data, we develop the Markov Blanket Merging algorithm. Theoretical and experimental evidence shows that Markov Blanket Merging scales up very well to high-dimensional datasets. Compared to other widely used statistical learning techniques, Markov Blanket Merging delivers accurate results in a number of link prediction tasks, while also achieving significant improvements in computational efficiency. Our software implementation of the models investigated in this paper is publicly available at http://www.dii.unisi.it/~freno/. The same website also hosts those datasets used in this work that are not available elsewhere in the preprocessed form used for our experiments.
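For readers unfamiliar with the term, the pseudo-likelihood mentioned above is usually defined as a product of local conditional distributions, one per variable. The following is a sketch of that standard definition, not a formula quoted from the paper; the symbols $P^{*}$, $n$, and $\mathrm{mb}_i$ are notation assumed here for illustration.

```latex
% Pseudo-likelihood of a joint state x = (x_1, ..., x_n):
% each variable is conditioned on the state of its Markov blanket only.
\[
  P^{*}(\mathbf{x}) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{mb}_i(\mathbf{x})\bigr)
\]
% mb_i(x) denotes the values that x assigns to the Markov blanket of X_i.
```

Because each factor involves only one variable and its Markov blanket, evaluating the product does not require computing a global normalization constant, which is one reason estimators of this kind are attractive in high-dimensional domains.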
Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naïve applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply impossible. Second, the drift of customers' preferences over time requires retraining the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires minimizing prediction time (or equivalently maximizing the data throughput), which in turn may prevent us from achieving the accuracy necessary in web-scale industrial applications. In this paper, we investigate how the practical challenges faced in this setting can be tackled via an online learning-to-rank approach. Sparse models will be the key to reducing prediction latency, whereas one-pass stochastic optimization will minimize the training time and restrict the memory footprint. Perhaps surprisingly, extensive experiments show that one-pass learning preserves most of the predictive performance. Additionally, we study a variety of online learning algorithms that enforce sparsity and provide insights to help the practitioner make an informed decision about which approach to pick. We report results on a massive purchase log dataset from the Amazon retail website, as well as on several benchmarks from the LETOR corpus.
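To make the combination of one-pass stochastic optimization and sparsity concrete, here is a minimal Python sketch. It is not the method evaluated in the paper: it assumes a pairwise logistic ranking loss and a simple proximal (soft-thresholding) L1 step, and the names `soft_threshold`, `one_pass_sparse_ranker`, `score`, `lr`, and `l1` are all hypothetical. Each training pair is seen exactly once, and only nonzero weights are kept in memory.

```python
import math


def soft_threshold(w, t):
    """Shrink w toward zero by t; values within [-t, t] become exactly 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def score(weights, features):
    """Linear score of a sparse feature dict under sparse weights."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def one_pass_sparse_ranker(pairs, lr=0.1, l1=0.01):
    """Single pass of proximal SGD on a pairwise logistic ranking loss.

    pairs: iterable of (preferred, other), each a dict {feature: value}.
    Returns a dict holding only the nonzero weights.
    Simplification (assumed here): the L1 shrinkage is applied only to
    features that appear in the current pair.
    """
    weights = {}
    for preferred, other in pairs:
        # Feature difference between the preferred item and the other one.
        diff = dict(preferred)
        for f, v in other.items():
            diff[f] = diff.get(f, 0.0) - v
        margin = score(weights, diff)
        margin = max(-30.0, min(30.0, margin))  # avoid overflow in exp
        # Derivative of log(1 + exp(-margin)) with respect to the margin.
        grad = -1.0 / (1.0 + math.exp(margin))
        for f, v in diff.items():
            updated = soft_threshold(weights.get(f, 0.0) - lr * grad * v, lr * l1)
            if updated == 0.0:
                weights.pop(f, None)  # keep the model sparse
            else:
                weights[f] = updated
    return weights


# Usage: items with feature "a" were preferred over items with feature "b";
# feature "c" is shared by both and carries no ranking signal.
pairs = [({"a": 1.0, "c": 1.0}, {"b": 1.0, "c": 1.0})] * 50
model = one_pass_sparse_ranker(pairs)
```

Since prediction cost is proportional to the number of stored weights that match an item's features, a larger `l1` value trades some accuracy for lower latency in this sketch.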
To Mario Freno, for teaching me the difference between being honest and being a fool; and to Maria Scaramozzino, for teaching me to appreciate it. (Nino)
To my family. (Edmondo)
Foreword
The field of graphical models has been growing significantly since the pioneering studies on probabilistic reasoning in intelligent systems. Beginning from Bayesian networks and Markov random fields, this book offers a new perspective deriving from their integration, which results in the new framework of hybrid random fields. While reading the book, one soon realizes that there is a unifying approach, which seems to be motivated by the question of how existing types of probabilistic graphical models can be combined so as to obtain model classes that are rich enough to express a wide variety of conditional independence structures. The authors provide evidence that this combination exhibits scalable behavior in parameter and structure learning. Extensive experimental results support the claims on the performance of hybrid random fields in different areas, ranging from bioinformatics to information retrieval. The clean integration idea behind hybrid random fields gives rise to the distinguishing feature of the model, namely the dramatic reduction of complexity that opens the door to large-scale real-world problems. The book not only marks an effective direction of investigation with significant experimental advances, but it is also, and perhaps primarily, a guide for the reader through an original trip in the space of probabilistic modeling. Interestingly, even though the main subject of investigation is quite specific, while digesting the book one is enriched with a very open view of the field, full of stimulating connections. The reader finds a vivid presentation, well rooted in the literature, with inspiring historical references.
I very much like the philosophical framework, given at the end of the book, in which the scientific issues are embedded. The authors clear the ground for a view of AI as a science that investigates cognitive technologies. While their investigation does not necessarily provide evidence on natural cognition, they do offer a number of intriguing insights. Statistical learning methods are labeled as cognitive technologies, rather than cognitive models. Machines equipped with these methods are viewed as tools for extending human cognition to novel domains, thus offering a nice perspective on the somewhat ill-posed question of whether or not machines are intelligent.