2018
DOI: 10.5334/egems.209

A Query Workflow Design to Perform Automatable Distributed Regression Analysis in Large Distributed Data Networks

Abstract: Introduction: Patient privacy and data security concerns often limit the feasibility of pooling patient-level data from multiple sources for analysis. Distributed data networks (DDNs) that employ privacy-protecting analytical methods, such as distributed regression analysis (DRA), can mitigate these concerns. However, DRA is not routinely implemented in large DDNs. Objective: We describe the design and implementation of a process framework and query workflow that allow automatable DRA in real-world DDNs that use …

Citations: cited by 15 publications (15 citation statements)
References: 30 publications (51 reference statements)

“…To make distributed regression a more practical analytic option in practice, some researchers have developed statistical packages and stand-alone software to partially or fully automate the file transfer process. 44,45,[59][60][61] The amount of data processing and statistical analysis done at the data-contributing site vs. the analysis center also varies by analytic and data sharing option (Figure 4). In principle, standardizing the format and definition of the data elements across databases improves the ease of implementation of the analytic and data sharing options discussed in this article.…”
Section: Other Considerations Ease Of Implementation (mentioning)
confidence: 99%
“…Compared to the pooled individual-level analysis, however, our method requires multiple file transfers between the data partners and the analysis center. Although this need for information exchange at each iteration means that our proposed method is more labor-intensive to implement in practice, recent advancements in bioinformatics now allow semi-automated or fully-automated file transfers between data partners and the analysis center [17, 21–27]. For general users who may not have access to such technical infrastructure, we have developed R code that allows manual implementation of our proposed method.…”
Section: Discussion (mentioning)
confidence: 99%
“…Manual exchanges of this summary-level information can be too tedious and labor-intensive to be practical in actual multicenter studies. However, there are a number of statistical packages and standalone software that enable researchers to perform distributed regression and partially or fully automate the file transfer process [19,20,31–33]. We chose not to assess the operational performance (eg, runtime) of our distributed regression analysis because it is highly network-dependent.…”
Section: Discussion (mentioning)
confidence: 99%