Nora Yujia Payne scite author profile

Nora Yujia Payne

2 Publications

0 Citation Statements Received

56 Citation Statements Given

Affiliations

University of Michigan–Ann Arbor

Publications

Separating and reintegrating latent variables to improve classification of genomic data

Payne, Gagnon-Bartsch

2020

Preprint

Genomic datasets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes) and thus give rise to dense latent variation, which presents both challenges and opportunities for classification. Some of these latent variables may be partially correlated with the phenotype of interest and therefore helpful, while others may be uncorrelated and thus merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. We propose the cross-residualization classifier to better account for the latent variables in genomic data. Through an adjustment and ensemble procedure, the cross-residualization classifier essentially estimates the latent variables and residualizes out their effects, trains a classifier on the residuals, and then reintegrates the latent variables in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information that they may contribute. We apply the method to simulated data as well as a variety of genomic datasets from multiple platforms. In general, we find that the cross-residualization classifier performs well relative to existing classifiers and sometimes offers substantial gains.
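The separation step described above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the latent variables are estimated as the top `k` principal components of the centered training data and that their effect is removed by orthogonal projection. The helper names `fit_latent` and `split_latent` and the choice of `k` are hypothetical, and this is not the authors' cross-residualization procedure, whose exact adjustment step may differ.

```python
import numpy as np


def fit_latent(X_train, k):
    """Estimate k latent directions from a samples-by-features training matrix.

    Assumption (illustrative only): latent variation is approximated by the
    top-k principal components (right singular vectors) of the centered data.
    """
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:k]  # feature means and a (k x features) loading matrix


def split_latent(X, mean, V):
    """Split samples into latent scores W and residuals R, with X - mean = W @ V + R."""
    Xc = X - mean
    W = Xc @ V.T     # estimated latent-variable scores, one row per sample
    R = Xc - W @ V   # what remains after projecting out the latent directions
    return W, R
```

New samples are split with the training `mean` and `V` rather than re-estimated, so training and test residuals live in the same coordinate system.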

Separating and reintegrating latent variables to improve classification of genomic data

Payne, Gagnon-Bartsch

2022

Summary: Genomic data sets contain the effects of various unobserved biological variables in addition to the variable of primary interest. These latent variables often affect a large number of features (e.g., genes), giving rise to dense latent variation. This latent variation presents both challenges and opportunities for classification. While some of these latent variables may be partially correlated with the phenotype of interest and thus helpful, others may be uncorrelated and merely contribute additional noise. Moreover, whether potentially helpful or not, these latent variables may obscure weaker effects that impact only a small number of features but more directly capture the signal of primary interest. To address these challenges, we propose the cross-residualization classifier (CRC). Through an adjustment and ensemble procedure, the CRC estimates and residualizes out the latent variation, trains a classifier on the residuals, and then reintegrates the latent variation in a final ensemble classifier. Thus, the latent variables are accounted for without discarding any potentially predictive information. We apply the method to simulated data and a variety of genomic data sets from multiple platforms. In general, we find that the CRC performs well relative to existing classifiers and sometimes offers substantial gains.
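The reintegration idea can likewise be sketched: one classifier is trained on the residuals, a second on the latent-variable scores, and their class probabilities are averaged into a final prediction. This is a simplified stand-in, not the CRC itself. It assumes a nearest-centroid base classifier, probabilities from a softmax over negative squared distances, and a fixed, hypothetical mixing parameter `weight`; the residual matrices `R_*` and latent scores `W_*` are taken as given inputs.

```python
import numpy as np


def fit_centroids(Z, y):
    """Nearest-centroid fit (illustrative base classifier): one mean per class."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def class_probs(Z, centroids):
    """Softmax of negative squared Euclidean distance to each class centroid."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def ensemble_predict(R_train, W_train, y_train, R_test, W_test, weight=0.5):
    """Average residual-based and latent-based class probabilities.

    `weight` is a hypothetical fixed mixing weight on the residual classifier.
    """
    classes, c_res = fit_centroids(R_train, y_train)
    _, c_lat = fit_centroids(W_train, y_train)  # same class order via np.unique
    p = weight * class_probs(R_test, c_res) + (1 - weight) * class_probs(W_test, c_lat)
    return classes[p.argmax(axis=1)], p
```

Averaging probabilities rather than hard labels means that when the latent scores carry no class information, their classifier outputs near-uniform probabilities and the residual classifier roughly decides; when they are informative, they can shift the final call.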
