Common high-dimensional methods for prediction rely on having either a sparse signal model, a model in which most parameters are zero and there is a small number of non-zero parameters that are large in magnitude, or a dense signal model, a model with no large parameters and very many small non-zero parameters. We consider a generalization of these two basic models, termed here a "sparse+dense" model, in which the signal is given by the sum of a sparse signal and a dense signal. Such a structure poses problems for traditional sparse estimators, such as the lasso, and for traditional dense estimation methods, such as ridge estimation. We propose a new penalization-based method, called lava, which is computationally efficient. With suitable choices of penalty parameters, the proposed method strictly dominates both lasso and ridge. We derive analytic expressions for the finite-sample risk function of the lava estimator in the Gaussian sequence model. We also provide a deviation bound for the prediction risk in the Gaussian regression model with fixed design. In both cases, we provide Stein's unbiased estimator for lava's prediction risk. A simulation example compares the performance of lava to lasso, ridge, and elastic net in a regression example using data-dependent penalty parameters and illustrates lava's improved performance relative to these benchmarks.

1. Introduction. Many recently proposed high-dimensional modeling techniques build upon the fundamental assumption of sparsity. Under sparsity, we can approximate a high-dimensional signal or parameter by a sparse vector that has a relatively small number of non-zero components. Various ℓ1-based penalization methods, such as the lasso and soft-thresholding, have been proposed for signal recovery, prediction, and parameter estimation within a sparse signal framework; see, e.g., [29], among others.
By virtue of being based on ℓ1-penalized optimization problems, these methods produce sparse solutions in which many estimated model parameters are set exactly to zero.

Another commonly used shrinkage method is ridge estimation. Ridge estimation differs from the aforementioned ℓ1-penalized approaches in that it does not produce a sparse solution but instead provides a solution in which all model parameters are estimated to be non-zero. Ridge estimation is thus suitable when the model's parameters or unknown signals contain many very small components, i.e., when the model is dense. See, e.g., [25].
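To make the contrast between the ℓ1 and ℓ2 penalties concrete, the following sketch computes the lava estimate in the Gaussian sequence model, where the sparse part can be profiled out in closed form. This is a minimal illustration, not the paper's implementation: the function names are our own, and the penalty scaling follows the convention stated in the comments, which may differ from the convention used in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the lasso solution in one dimension."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lava_sequence(z, lam1, lam2):
    """Lava estimate in the Gaussian sequence model, coordinate by coordinate.

    For each observation z_i, minimizes over (b, d) the criterion
        (z_i - b - d)^2 + lam2 * b^2 + lam1 * |d|,
    where b is the dense (ridge-penalized) part and d the sparse
    (lasso-penalized) part of the signal.

    Profiling out b at b = (z - d) / (1 + lam2) reduces the problem to
        (lam2 / (1 + lam2)) * (z - d)^2 + lam1 * |d|,
    a one-dimensional lasso problem solved by soft-thresholding.
    """
    k = lam2 / (1.0 + lam2)
    d = soft_threshold(z, lam1 / (2.0 * k))   # sparse component
    b = (z - d) / (1.0 + lam2)                # dense component
    return b + d, b, d
```

Note the two limiting cases, which mirror the discussion above: as lam1 grows, the sparse part vanishes and lava reduces to ridge shrinkage, z / (1 + lam2); as lam2 grows, the dense part vanishes and lava reduces to soft-thresholding, i.e., the lasso.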