2020
DOI: 10.1371/journal.pgen.1009141

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank

Abstract: The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, …

Cited by 88 publications (175 citation statements)
References 45 publications
“…are accelerated through multi-threading on CPUs with multiple cores. The first solver implements the iteratively reweighted least squares algorithm in glmnet, and its goal is to provide boosted computational and memory performance to the large-scale Lasso solver described in (Qian et al 2020, Li et al 2020). The second solver implements an accelerated proximal gradient method that's able to solve more general regularized regression problems.…”
Section: Discussion (mentioning)
confidence: 99%
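
The accelerated proximal gradient method mentioned in the statement above can be illustrated with a minimal sketch. This is a generic FISTA-style iteration for the lasso objective (1/(2n))·||y − Xb||² + lam·||b||₁, written in plain Python for small dense inputs. It is not the implementation of the citing paper's solver, glmnet, or snpnet; the function names `soft_threshold` and `lasso_fista`, the step-size bound, and the iteration count are assumptions made for illustration only.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_fista(X, y, lam, n_iter=500):
    """Hypothetical helper: minimise (1/(2n))||y - Xb||^2 + lam * ||b||_1.

    X is a list of n rows with p entries each, y is a list of n responses.
    Assumes X has at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    # The squared Frobenius norm of X divided by n upper-bounds the largest
    # eigenvalue of X^T X / n, so 1 / lipschitz is a safe (conservative) step.
    lipschitz = sum(v * v for row in X for v in row) / n
    step = 1.0 / lipschitz

    beta = [0.0] * p   # current iterate
    z = beta[:]        # extrapolated point where the gradient is evaluated
    t = 1.0            # momentum sequence
    for _ in range(n_iter):
        resid = [y[i] - sum(X[i][j] * z[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth part, then soft-thresholding (prox step).
        beta_new = [soft_threshold(z[j] - step * grad[j], step * lam)
                    for j in range(p)]
        # Nesterov-style extrapolation: this is the "accelerated" part.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [beta_new[j] + ((t - 1.0) / t_new) * (beta_new[j] - beta[j])
             for j in range(p)]
        beta, t = beta_new, t_new
    return beta


# Tiny example with an identity design, where the solution can be checked by hand.
coef = lasso_fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=0.5)
```

Swapping `soft_threshold` for a different proximal operator is what lets this kind of scheme handle penalties other than the L1 norm, which is the sense in which such solvers cover "more general regularized regression problems".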
“…Third, weights for PRS were calculated using GWAS summary results (thresholding and pruning method) whereas PTRS weights were calculated using individual level data due to computational considerations. Future analysis will be performed using individual level data for PRS by using biobank-scale ready elastic net approaches such as (Qian et al, 2020). Fourth, higher quality prediction models of the transcriptome in non-European ancestries are limited.…”
Section: Discussion (mentioning)
confidence: 99%
“…One strategy used to run penalized regressions on large datasets such as the UK Biobank (Bycroft et al 2018) has been to apply a variable pre-selection step before fitting the lasso (Lello et al 2018). Recently, authors of the glmnet package have developed a new R package, snpnet, to fit penalized regressions on the UK Biobank without having to perform any pre-filtering (Qian et al 2020). Earlier, we developed two R packages for efficiently analyzing large-scale (genetic) data, namely bigstatsr and bigsnpr (Privé et al 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…We then specifically derived a highly efficient implementation of penalized linear and logistic regressions in R package bigstatsr, and showed how these functions were useful for genetic prediction with some applications to the UK Biobank (Privé et al 2019). Here we would like to come back to some statements made in (Qian et al 2020) and benchmark bigstatsr against snpnet for fitting penalized regressions on large genetic data. We re-investigate the similarities and differences between the penalized regression implementations of packages snpnet and bigstatsr.…”
Section: Introduction (mentioning)
confidence: 99%