The aim of multi-output learning is to simultaneously predict multiple outputs given an input. It is an important learning problem for decision-making, since real-world decisions often involve multiple complex factors and criteria. Recently, a growing number of studies have focused on ways to predict multiple outputs at once, and these efforts have taken different forms according to the particular multi-output learning problem under study. Classic instances of multi-output learning include multi-label learning, multi-dimensional learning, and multi-target regression, among others. From our survey of the topic, we were struck by the lack of studies that generalize the different forms of multi-output learning into a common framework. This paper fills that gap with a comprehensive review and analysis of the multi-output learning paradigm. In particular, taking inspiration from big data, we characterize the 4 Vs of multi-output learning, i.e., volume, velocity, variety, and veracity, and the ways in which the 4 Vs both benefit and challenge multi-output learning. We analyze the life cycle of output labeling, present the main mathematical definitions of multi-output learning, and examine the field's key challenges and corresponding solutions as found in the literature. Several model evaluation metrics and popular data repositories are also discussed. Finally, we highlight some emerging challenges in multi-output learning from the perspective of the 4 Vs as potential research directions worthy of further study.
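To make the paradigm concrete, the following is a minimal sketch of multi-label learning, one of the classic forms of multi-output learning named above, using the simple binary-relevance strategy (one independent rule per label). The toy data and the nearest-centroid rule are hypothetical illustrations, not the survey's methods.

```python
# Multi-label learning sketch: each target is a binary vector, and we
# train one independent nearest-centroid rule per label (binary relevance).

def fit_binary_relevance(X, Y):
    """For each label, store the mean feature vector of the positive
    and negative examples."""
    n_labels = len(Y[0])
    models = []
    for j in range(n_labels):
        pos = [x for x, y in zip(X, Y) if y[j] == 1]
        neg = [x for x, y in zip(X, Y) if y[j] == 0]
        mean = lambda rows: [sum(col) / len(rows) for col in zip(*rows)]
        models.append((mean(pos), mean(neg)))
    return models

def predict(models, x):
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    # Predict label j as present iff x is closer to its positive centroid.
    return [1 if dist2(x, p) < dist2(x, n) else 0 for p, n in models]

X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]   # toy inputs
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]                    # toy label vectors
models = fit_binary_relevance(X, Y)
print(predict(models, [0.05, 0.1]))  # → [1, 0]
```

Binary relevance ignores dependencies between outputs; much of the multi-output learning literature surveyed here is about modeling those dependencies explicitly.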
Multi-output learning, the task of simultaneously predicting multiple outputs for a single input, has attracted increasing interest from researchers due to its wide applicability. The k-nearest-neighbor (kNN) algorithm is one of the most popular frameworks for handling multi-output problems, and its performance depends crucially on the metric used to compute the distance between instances. However, our experimental results show that existing advanced metric learning techniques cannot provide an appropriate distance metric for multi-output tasks. This paper systematically studies how to learn an appropriate distance metric for multi-output problems. In particular, we present a novel large margin metric learning paradigm for multi-output tasks, which projects both the input and the output into the same embedding space and then learns a distance metric that captures output dependencies, so that instances with very different outputs are pushed far apart. Several strategies are then proposed to speed up training and testing. Moreover, we study the generalization error bound of our method and show that it tightens the excess risk bounds. Experiments on three multi-output learning tasks (multi-label classification, multi-target regression, and multi-concept retrieval) validate the effectiveness and scalability of the proposed method.
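The core object such methods learn is a linear (Mahalanobis-style) metric d_L(a, b) = ||L(a - b)||_2, which reduces to the Euclidean distance when L is the identity. The sketch below only illustrates how a projection matrix reshapes distances; the matrix L is hand-picked for illustration, not learned by the paper's large margin procedure.

```python
# Distance under a linear metric: d_L(a, b) = || L (a - b) ||_2.
# Learning L so that instances with different outputs end up far
# apart is the goal of large margin metric learning for kNN.

def d_L(a, b, L):
    diff = [ai - bi for ai, bi in zip(a, b)]
    proj = [sum(row[k] * diff[k] for k in range(len(diff))) for row in L]
    return sum(v * v for v in proj) ** 0.5

a, b = [1.0, 0.0], [0.0, 1.0]
I = [[1.0, 0.0], [0.0, 1.0]]   # identity: plain Euclidean distance
L = [[2.0, 0.0], [0.0, 0.5]]   # stretches dimension 0, shrinks dimension 1
print(d_L(a, b, I))  # sqrt(2)    ≈ 1.414
print(d_L(a, b, L))  # sqrt(4.25) ≈ 2.062
```

Under the learned metric, neighbor rankings change: dimensions that discriminate between different output vectors get stretched, so kNN retrieves neighbors with more similar outputs.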
Approximate nearest neighbor (ANN) search has achieved great success in many tasks. However, existing popular methods for ANN search, such as hashing and quantization methods, are designed for static databases only. They cannot handle databases whose data distribution evolves dynamically, due to the high computational cost of retraining the model on the new database. In this paper, we address this problem by developing an online product quantization (online PQ) model that incrementally updates the quantization codebook to accommodate incoming streaming data. Moreover, to further alleviate the large-scale computation required for the online PQ update, we design two budget constraints that allow the model to update only part of the PQ codebook instead of all of it. We derive a loss bound that guarantees the performance of our online PQ model. Furthermore, we develop an online PQ model over a sliding window, with both data insertion and deletion supported, to reflect the real-time behaviour of the data. The experiments demonstrate that our online PQ model is both time-efficient and effective for ANN search in dynamic large-scale databases compared with baseline methods, and that partial PQ codebook updates further reduce the update cost.
Index Terms: Online indexing model, product quantization, nearest neighbour search.
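A toy sketch of the underlying product quantization idea: split each vector into M subvectors, quantize each against its own small codebook, and store only the codeword indices. The online update shown here (nudging the assigned centroid toward an incoming point) is a generic streaming k-means step used purely to illustrate incremental codebook maintenance; the paper's actual update rule and budget constraints differ.

```python
# Product quantization with M = 2 subspaces over 4-dimensional vectors.

def encode(x, codebooks):
    """Return the list of nearest-centroid indices, one per subspace."""
    M = len(codebooks)
    d = len(x) // M
    code = []
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        dists = [sum((s - c) ** 2 for s, c in zip(sub, cent))
                 for cent in codebooks[m]]
        code.append(dists.index(min(dists)))
    return code

def online_update(x, codebooks, lr=0.1):
    """Nudge each assigned centroid toward the incoming point, so the
    codebook tracks a drifting data distribution without retraining."""
    d = len(x) // len(codebooks)
    for m, k in enumerate(encode(x, codebooks)):
        sub = x[m * d:(m + 1) * d]
        codebooks[m][k] = [c + lr * (s - c)
                           for c, s in zip(codebooks[m][k], sub)]

codebooks = [[[0.0, 0.0], [1.0, 1.0]],   # codebook for subvector 0
             [[0.0, 0.0], [1.0, 1.0]]]   # codebook for subvector 1
print(encode([0.1, 0.0, 0.9, 1.0], codebooks))  # → [0, 1]
online_update([0.1, 0.0, 0.9, 1.0], codebooks)
```

Storing 2 one-byte indices instead of 4 floats is the source of PQ's memory savings; the budget-constrained variant in the paper would update only a subset of the M codebooks per step.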
Conducting (big) data analytics in an organization is not just about using a processing framework (e.g., Hadoop/Spark) to learn a model from data currently in a single file system (e.g., HDFS). We frequently need to pipeline real-time data from other systems into the processing framework and continually update the learned model. The processing frameworks need to be easily invokable for different purposes to produce different models. The model and subsequent model updates need to be integrated with a product that may require real-time prediction using the latest trained model. All of these need to be shared among different teams in the organization for different data analytics purposes. In this paper, we propose a real-time data-analytics-as-service architecture that uses RESTful web services to wrap and integrate data services, dynamic model training services (supported by a big data processing framework), prediction services, and the product that uses the models. We discuss the challenges of wrapping big data processing frameworks as services, along with other architecturally significant factors that affect system reliability, real-time performance, and prediction accuracy. We evaluate our architecture using a log-driven system operation anomaly detection system in which the staleness of the data used for model training and the speed of model update and prediction are critical requirements.
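A minimal sketch of the kind of prediction service such an architecture wraps, using only Python's standard library: a hypothetical /predict endpoint that applies the latest trained model (a stub here) to a JSON payload. The endpoint name, the feature format, and the threshold rule are all illustrative assumptions; a real deployment would add model hot-swapping, authentication, and monitoring.

```python
# RESTful prediction service sketch using only the standard library.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def latest_model(features):
    """Stub standing in for the most recently trained model;
    here, a trivial threshold-based anomaly rule."""
    return {"anomaly": sum(features) > 1.0}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Parse the JSON body, run the model, return a JSON response.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        features = json.loads(body)["features"]
        out = json.dumps(latest_model(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

# To serve:
#   HTTPServer(("localhost", 8080), PredictHandler).serve_forever()
```

Because the model is reached only through `latest_model`, a retraining service can replace it behind the same HTTP interface, which is the decoupling the service-wrapping architecture relies on.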