Robust variable selection and distributed inference using τ-based estimators for large-scale data

Mozafari-Majd, Emadaldin; Koivunen, Visa

doi:10.23919/eusipco47968.2020.9287773

Cited by 2 publications

References 15 publications

Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Healthcare

Lemyre, Lévesque, Domingue et al. 2023

Preprint

Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for healthcare research frameworks. This paper aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data; (2) describing the methods applicable to generalized linear models (GLM) and assessing their underlying distributional assumptions; (3) adapting existing methods to make them fully usable in healthcare research. A scoping review methodology was employed for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in healthcare research. From the review, 41 articles were selected, and six approaches were extracted for conducting standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information-sharing requirements and operational complexity.

Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics (Preprint)

Camirand Lemyre, Lévesque, Domingue et al. 2023

JMIR Medical Informatics

Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for healthcare research frameworks.

Objective: This paper aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data; (2) describing the methods applicable to generalized linear models (GLM) and assessing their underlying distributional assumptions; (3) adapting existing methods to make them fully usable in healthcare research.

Methods: A scoping review methodology was employed for the literature mapping using interdisciplinary databases: MEDLINE, Scopus, MathSciNet and zbMATH. Included articles had to provide a methodological contribution and address inferential statistics on horizontally partitioned data. Model type, methodological setting and number of communications required between data nodes and the coordinating centre were among the data extracted. The methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in healthcare research. Statistical theory was used to adapt methods and derive the properties of the resulting estimators.

Results: To complete objective (1), 41 articles were selected from the review process. Most of the included articles provide a solution related to parametric regression, and four communication schemes were identified among the presented methods. Most articles present a methodology that does not require any communication from the coordinating centre to data nodes. For objective (2), six approaches were extracted for conducting standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. For objective (3), statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Detailed algorithms were developed to highlight information-sharing requirements and operational complexity. Two approaches did not require any exchanged quantities from the coordinating centre to data nodes, while others needed sharing aggregated estimates and/or average of local gradients and Hessians.

Conclusions: This paper contributes to the field of healthcare research by providing an overview of the methods that can be used with horizontally partitioned data, by adapting these methods to the context of heterogeneous health data and by clarifying the workflows and quantities exchanged by the methods discussed. Documenting these adapted methods contributes to making the field of distributed data analytics more accessible to data custodians and health researchers, especially in the context of inference, which is not the focus of most of the existing HPSA literature...
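As a rough illustration of what exchanging averaged local gradients and Hessians can look like, the following sketch fits a logistic regression (one GLM) across nodes of uneven size. It is not taken from the paper: the function names, the synthetic data, and the choice of a plain Newton iteration with sample-size weighting are all assumptions made for the example.

```python
import numpy as np


def local_summaries(X, y, beta):
    """Run at a data node: sample size plus the average gradient and
    Hessian of the logistic log-loss at the current beta (hypothetical helper)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (p - y) / len(y)
    hess = (X.T * (p * (1.0 - p))) @ X / len(y)
    return len(y), grad, hess


def coordinator_newton_step(beta, summaries):
    """Run at the coordinating centre: weight node averages by node sample
    size (so uneven nodes are handled), then take one Newton step."""
    n_total = sum(n for n, _, _ in summaries)
    grad = sum(n * g for n, g, _ in summaries) / n_total
    hess = sum(n * h for n, _, h in summaries) / n_total
    return beta - np.linalg.solve(hess, grad)


def fit_distributed(nodes, n_features, n_iter=25):
    """Alternate between node-side summaries and coordinator-side updates.
    Only aggregated quantities leave a node, never individual rows."""
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        summaries = [local_summaries(X, y, beta) for X, y in nodes]
        beta = coordinator_newton_step(beta, summaries)
    return beta


# Synthetic example with three nodes of deliberately uneven size (assumed data).
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -1.0, 0.8])
nodes = []
for n in (50, 200, 1000):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
    y = (rng.random(n) < prob).astype(float)
    nodes.append((X, y))

beta_dist = fit_distributed(nodes, n_features=3)

# Reference fit on pooled data, treated as a single node.
X_all = np.vstack([X for X, _ in nodes])
y_all = np.concatenate([y for _, y in nodes])
beta_pooled = fit_distributed([(X_all, y_all)], n_features=3)
```

Because the weighted averages reproduce the pooled gradient and Hessian exactly, this particular scheme should match the pooled fit up to floating-point error, at the cost of one round of communication per iteration. Schemes with fewer communication rounds trade some of that exactness away.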
