2016
DOI: 10.4310/sii.2016.v9.n4.a1
Statistical methods and computing for big data

Abstract: Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: s…

Citing publications: 2017–2023

Cited by 98 publications (73 citation statements)
References 54 publications (90 reference statements)

Citation statements, ordered by relevance:

“…Online updating is a useful strategy for analyzing large‐scale data and streaming data, and recently, stochastic gradient descent has become a popular method for doing online updating (Wang, Chen, Schifano, Wu, & Yan, 2016). Furthermore, it has been shown that implicit SGD is more stable than explicit SGD.…”
Section: Discussion (mentioning; confidence: 99%)
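To make the contrast in this quote concrete, here is a minimal sketch of explicit versus implicit SGD for online least squares, assuming a linear model and a decaying learning rate; all function and variable names are illustrative, not from the cited paper. The implicit update evaluates the gradient at the next iterate, which for least squares reduces to shrinking the step by 1 / (1 + lr * ||x||^2), the source of its stability.

    import numpy as np

    def explicit_sgd_step(theta, x, y, lr):
        # Explicit SGD: gradient of the squared-error loss evaluated at the
        # current iterate; can overshoot when lr is too large.
        return theta + lr * (y - x @ theta) * x

    def implicit_sgd_step(theta, x, y, lr):
        # Implicit SGD: gradient evaluated at the *next* iterate. For least
        # squares this has a closed form that shrinks the effective step by
        # 1 / (1 + lr * ||x||^2).
        return theta + (lr / (1.0 + lr * (x @ x))) * (y - x @ theta) * x

    # Toy stream: y = x . beta + noise, processed one observation at a time.
    rng = np.random.default_rng(0)
    beta = np.array([1.0, -2.0, 0.5])
    theta_ex = np.zeros(3)
    theta_im = np.zeros(3)
    for n in range(1, 10_001):
        x = rng.normal(size=3)
        y = x @ beta + rng.normal(scale=0.1)
        lr = 1.0 / n                      # decaying learning-rate schedule
        theta_ex = explicit_sgd_step(theta_ex, x, y, lr)
        theta_im = implicit_sgd_step(theta_im, x, y, lr)

    print("explicit:", np.round(theta_ex, 3))
    print("implicit:", np.round(theta_im, 3))

With lr = 1/n both updates converge here; the stability difference the quote refers to shows up with larger or misspecified learning rates, where the implicit shrinkage keeps the iterates bounded.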
“…In applications where the data are too large to analyze all at once, it has been proposed to use divide and recombine approaches, in which we divide the data into a number of smaller samples, analyze these subsamples individually, and then combine the results of these individual analyses …”
Section: Methods (mentioning; confidence: 99%)
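The divide-and-recombine scheme quoted above can be sketched in a few lines, assuming ordinary least squares on each block and simple averaging as the recombination rule; weighted or likelihood-based combinations are also used. All names here are illustrative.

    import numpy as np

    def ols_fit(X, y):
        # Ordinary least squares on one subsample.
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef

    def divide_and_recombine(X, y, n_blocks):
        # Divide the rows into blocks, analyze each block individually,
        # then recombine by averaging the per-block estimates.
        estimates = [ols_fit(Xb, yb)
                     for Xb, yb in zip(np.array_split(X, n_blocks),
                                       np.array_split(y, n_blocks))]
        return np.mean(estimates, axis=0)

    # Toy data: 100,000 rows, 4 predictors.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100_000, 4))
    y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(size=100_000)

    print(divide_and_recombine(X, y, n_blocks=10))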
“…Recently, new statistical and/or computational methodologies have been proposed that scale problems to a reasonable size. These include the “divide and conquer” approach, where the data are divided into subsamples, the subsamples are analyzed in parallel, and the results are then combined across subsamples.…”
Section: Introduction (mentioning; confidence: 99%)
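Because this quote stresses analyzing the subsamples in parallel, the sketch below runs the per-block fits in worker processes via Python's standard multiprocessing.Pool; it mirrors the block-OLS example above and is again an illustration of the general scheme, not the cited implementation.

    import numpy as np
    from multiprocessing import Pool

    def fit_block(block):
        # Fit OLS on one subsample; runs inside a worker process.
        Xb, yb = block
        coef, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        return coef

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        X = rng.normal(size=(100_000, 4))
        y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(size=100_000)

        # Divide: pair up row blocks of X and y.
        blocks = list(zip(np.array_split(X, 8), np.array_split(y, 8)))

        # Analyze in parallel, then combine across subsamples by averaging.
        with Pool(processes=4) as pool:
            estimates = pool.map(fit_block, blocks)
        print(np.mean(estimates, axis=0))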
“…To work around this computational barrier, there are 3 immediate options: (1) increasing computing power, i.e., a high-performance computing cluster; (2) modifying the fitting algorithm to improve computational performance; and (3) subsampling large data to a manageable and workable size [1]. Each option has both advantages and disadvantages, and depending on the modelling application, a combination of all 3 options might be the best approach. In this work we focus on option 3, namely, subsampling a large dataset into a manageable size.…”
Section: Introduction (mentioning; confidence: 99%)
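Option 3, subsampling, can be illustrated with a minimal sketch that draws a uniform random subsample of rows without replacement before fitting. Uniform sampling is only one design (leverage-based or otherwise weighted subsampling is also common for regression), and the names below are illustrative.

    import numpy as np

    def uniform_subsample(X, y, n_sub, seed=0):
        # Draw a uniform random subsample of rows without replacement,
        # reducing the data to a manageable and workable size.
        rng = np.random.default_rng(seed)
        idx = rng.choice(X.shape[0], size=n_sub, replace=False)
        return X[idx], y[idx]

    # Reduce 1,000,000 rows to 50,000 before model fitting.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(1_000_000, 3))
    y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=1_000_000)

    X_sub, y_sub = uniform_subsample(X, y, n_sub=50_000)
    coef, *_ = np.linalg.lstsq(X_sub, y_sub, rcond=None)
    print(coef)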