This paper considers the problem of estimating a high-dimensional vector of parameters θ ∈ ℝⁿ from a noisy observation. The noise vector has i.i.d. Gaussian components with known variance. Under squared-error loss, the James-Stein (JS) estimator is known to dominate the maximum-likelihood (ML) estimator when the dimension n exceeds two. The JS-estimator shrinks the observed vector towards the origin, and the risk reduction over the ML-estimator is greatest for θ that lie close to the origin. JS-estimators can be generalized to shrink the data towards any target subspace. Such estimators also dominate the ML-estimator, but the risk reduction is significant only when θ lies close to the subspace. This leads to the following question: in the absence of prior information about θ, how do we design estimators that give significant risk reduction over the ML-estimator for a wide range of θ?

In this paper, we propose shrinkage estimators that attempt to infer the structure of θ from the observed data in order to construct a good attracting subspace. In particular, the components of the observed vector are separated into clusters, and the elements in each cluster are shrunk towards a common attractor. The number of clusters and the attractor for each cluster are determined from the observed vector. We provide concentration results for the squared-error loss and convergence results for the risk of the proposed estimators. The results show that the estimators give significant risk reduction over the ML-estimator for a wide range of θ, particularly for large n. Simulation results are provided to support the theoretical claims.

An estimator θ̂₁ is said to dominate another estimator θ̂₂ if R(θ, θ̂₁) ≤ R(θ, θ̂₂), ∀θ ∈ ℝⁿ, with the inequality being strict for at least one θ. Thus (4) implies that the James-Stein estimator (JS-estimator) dominates the ML-estimator. Unlike the ML-estimator, the JS-estimator is non-linear and biased. However, the risk reduction over the ML-estimator can be significant, making it an attractive option in many situations; see, for example, [3].

By evaluating the expression in (3), it can be shown that the risk of the JS-estimator depends on θ only via ‖θ‖ [1]. Further, the risk decreases as ‖θ‖ decreases. (For intuition about this, note in (3) that for large n, ‖y‖² ≈ nσ² + ‖θ‖².) The dependence of the risk on ‖θ‖ is illustrated in Fig. 1, where the average loss of the JS-estimator is plotted versus ‖θ‖ for two different choices of θ.

The JS-estimator in (2) shrinks each element of y towards the origin. Extending this idea, JS-like estimators can be defined by shrinking y towards any vector, or more generally, towards a target subspace V ⊂ ℝⁿ. Let P_V(y) denote the projection of y onto V, so that ‖y − P_V(y)‖² = min_{v∈V} ‖y − v‖². Then the JS-estimator that shrinks y towards the subspace V
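To make the dominance and risk behaviour discussed above concrete, the following sketch simulates the average squared-error loss of the ML-estimator and of a JS-estimator, for a θ close to the origin and one far from it. Since equation (2) is not reproduced in this excerpt, the code assumes the classical form θ̂ = (1 − (n−2)σ²/‖y‖²) y; all function and variable names are illustrative only.

```python
import numpy as np

def js_estimate(y, sigma2):
    """James-Stein shrinkage of y towards the origin.

    Assumes the classical form (1 - (n-2)*sigma2/||y||^2) * y; a
    'positive-part' variant would additionally clip the factor at zero.
    """
    n = y.size
    factor = 1.0 - (n - 2) * sigma2 / np.sum(y ** 2)
    return factor * y

# Monte Carlo estimate of the risk (expected squared-error loss) of the
# ML-estimator (y itself) and the JS-estimator.  The ML risk is
# n*sigma^2 = 100 in both cases; the JS risk is much smaller when
# ||theta|| is small and approaches the ML risk as ||theta|| grows.
rng = np.random.default_rng(0)
n, sigma2, trials = 100, 1.0, 10_000
for scale in (0.5, 5.0):
    theta = scale * np.ones(n)
    ml_loss, js_loss = 0.0, 0.0
    for _ in range(trials):
        y = theta + rng.normal(0.0, np.sqrt(sigma2), n)
        ml_loss += np.sum((y - theta) ** 2)
        js_loss += np.sum((js_estimate(y, sigma2) - theta) ** 2)
    print(f"||theta|| = {np.linalg.norm(theta):5.1f}: "
          f"ML risk ~ {ml_loss / trials:6.1f}, "
          f"JS risk ~ {js_loss / trials:6.1f}")
```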
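The subspace-shrinkage idea can likewise be sketched in code. The estimator below projects y onto the column span of a matrix V and shrinks the residual towards the projection. The shrinkage constant (n − d − 2), with d = dim(V), is the classical choice for a d-dimensional target subspace and is an assumption here, since the paper's exact expression is not shown in this excerpt.

```python
import numpy as np

def js_towards_subspace(y, V, sigma2):
    """JS-like shrinkage of y towards the subspace spanned by the columns
    of V (an n x d matrix with linearly independent columns, d < n - 2).

    The residual y - P_V(y) is shrunk exactly as in the origin case,
    with the assumed constant (n - d - 2) replacing (n - 2).
    """
    n, d = V.shape
    # Orthogonal projection P_V(y) computed via least squares.
    coef, *_ = np.linalg.lstsq(V, y, rcond=None)
    p = V @ coef
    resid = y - p
    factor = 1.0 - (n - d - 2) * sigma2 / np.sum(resid ** 2)
    return p + factor * resid

# Example: V = the all-ones column, i.e. shrinkage towards the line of
# vectors with all components equal (a Lindley-style estimator, d = 1,
# giving the constant n - 3).
n, sigma2 = 100, 1.0
rng = np.random.default_rng(1)
y = 3.0 + rng.normal(0.0, 1.0, n)
theta_hat = js_towards_subspace(y, np.ones((n, 1)), sigma2)
```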
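Finally, the cluster-then-shrink idea described in the abstract can be illustrated with a minimal sketch: partition the components of y into clusters and shrink each cluster towards its own attractor, taken here to be the cluster mean. This is not the paper's estimator; the clustering step (k-means below) and the fixed number of clusters k are purely illustrative choices, whereas the paper determines the number of clusters and the attractors from the data.

```python
import numpy as np
from sklearn.cluster import KMeans  # illustrative choice of clustering step

def cluster_shrink(y, k, sigma2):
    """Sketch of the cluster-then-shrink idea: split the components of y
    into k clusters and shrink each cluster towards its mean (the
    cluster's 'attractor').  Small clusters are left unshrunk."""
    labels = KMeans(n_clusters=k, n_init=10).fit(y.reshape(-1, 1)).labels_
    out = np.empty_like(y)
    for c in range(k):
        idx = labels == c
        yc = y[idx]
        m = yc.mean()
        resid = yc - m
        nc = yc.size
        # Lindley-style constant (nc - 3) for shrinkage towards a
        # one-dimensional target within each cluster (an assumption).
        if nc > 3:
            factor = 1.0 - (nc - 3) * sigma2 / np.sum(resid ** 2)
        else:
            factor = 1.0  # too few components: keep the ML values
        out[idx] = m + factor * resid
    return out
```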