Xiaohan Yan scite author profile

Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations

Demanding sparsity in estimated models has become a routine practice in statistics. In many situations, we wish to require that the sparsity patterns attained honor certain problem-specific constraints. Hierarchical sparse modeling (HSM) refers to situations in which these constraints specify that one set of parameters be set to zero whenever another is set to zero. In recent years, numerous papers have developed convex regularizers for this form of sparsity structure, which arises in many areas of statistics including interaction modeling, time series analysis, and covariance estimation. In this paper, we observe that these methods fall into two frameworks, the group lasso (GL) and latent overlapping group lasso (LOG), which have not been systematically compared in the context of HSM. The purpose of this paper is to provide a side-by-side comparison of these two frameworks for HSM in terms of their statistical properties and computational efficiency. We call special attention to GL's more aggressive shrinkage of parameters deep in the hierarchy, a property not shared by LOG. In terms of computation, we introduce a finite-step algorithm that exactly solves the proximal operator of LOG for a certain simple HSM structure; we later exploit this to develop a novel path-based block coordinate descent scheme for general HSM structures. Both algorithms greatly improve the computational performance of LOG. Finally, we compare the two methods in the context of covariance estimation, where we introduce a new sparsely-banded estimator using LOG, which we show achieves the statistical advantages of an existing GL-based method but is simpler to express and more efficient to compute.

Comment: 30 pages, 13 figures

Proximity labeling proteomics reveals critical regulators for inner nuclear membrane protein degradation in plants

Huang, Tang, Shi, et al. 2020, Nat Commun

The inner nuclear membrane (INM) selectively accumulates proteins that are essential for nuclear functions; however, overaccumulation of INM proteins results in a range of rare genetic disorders. So far, little is known about how defective, mislocalized, or abnormally accumulated membrane proteins are actively removed from the INM, especially in plants and animals. Here, via analysis of a proximity-labeling proteomic profile of INM-associated proteins in Arabidopsis, we identify critical components for an INM protein degradation pathway. We show that this pathway relies on the CDC48 complex for INM protein extraction and 26S proteasome for subsequent protein degradation. Moreover, we show that CDC48 at the INM may be regulated by a subgroup of PUX proteins, which determine the substrate specificity or affect the ATPase activity of CDC48. These PUX proteins specifically associate with the nucleoskeleton underneath the INM and physically interact with CDC48 proteins to negatively regulate INM protein degradation in plants.

Tree-aggregated predictive modeling of microbiome data

Bien, Yan, Simpson, et al. 2021, Sci Rep

Modern high-throughput sequencing technologies provide low-cost microbiome survey data across all habitats of life at unprecedented scale. At the most granular level, the primary data consist of sparse counts of amplicon sequence variants or operational taxonomic units that are associated with taxonomic and phylogenetic group information. In this contribution, we leverage the hierarchical structure of amplicon data and propose a data-driven and scalable tree-guided aggregation framework to associate microbial subcompositions with response variables of interest. The excess number of zero or low count measurements at the read level forces traditional microbiome data analysis workflows to remove rare sequencing variants or group them by a fixed taxonomic rank, such as genus or phylum, or by phylogenetic similarity. By contrast, our framework, which we call trac (tree-aggregation of compositional data), learns data-adaptive taxon aggregation levels for predictive modeling, greatly reducing the need for user-defined aggregation in preprocessing while simultaneously integrating seamlessly into the compositional data analysis framework. We illustrate the versatility of our framework in the context of large-scale regression problems in human gut, soil, and marine microbial ecosystems. We posit that the inferred aggregation levels provide highly interpretable taxon groupings that can help microbiome researchers gain insights into the structure and functioning of the underlying ecosystem of interest.

Xiaohan Yan

Rare Feature Selection in High Dimensions

Hierarchical Sparse Modeling: A Choice of Two Group Lasso Formulations

Proximity labeling proteomics reveals critical regulators for inner nuclear membrane protein degradation in plants

Tree-aggregated predictive modeling of microbiome data
