Statistical strategies for the analysis of massive data sets

Hwang, Hon; Ryan, Louise

doi:10.1002/bimj.201900034

Biometrical J

2019

Abstract: The advent of the big data age has changed the landscape for statisticians. Public and private organizations alike these days are interested in capturing and analyzing complex customer data in order to improve their service and drive efficiency gains. However, the large volume of data involved often means that standard statistical methods fail and new ways of thinking are needed. Although great gains can be obtained through the use of more advanced computing environments or through developing sophisticated new…

Cited by 6 publications

References 25 publications

(33 reference statements)

Markov regression model for analyzing big data to predict trajectories of repeated categorical outcomes: an application to $\mathrm{PM}_{2.5}$ air pollution data

Chowdhury

Hasan

2021

Environ Ecol Stat

Fine particulate matter (PM2.5), tiny particles in the air, is a form of air contamination that harms the environment and human health when its concentration is high. Elevated PM2.5 levels also reduce visibility and make the air appear hazy. Because of this impact on the environment and health, almost every country tracks PM2.5 air quality and records the data repeatedly over time at many sites. Because the data are collected repeatedly, the repeated measures of PM2.5 level at a given site are likely to be dependent. Modeling and analyzing these repeated data will help policymakers recommend new policies and/or update existing ones, so adequate modeling of such data is of great interest to researchers and policymakers. Because the data are collected repeatedly and in immense volume, big data modeling techniques are required. This paper proposes a new modeling framework for analyzing categorical responses from big data collected repeatedly and for predicting their trajectory risk. We developed a divide and recombine approach to analyzing big data gathered continually. We used the Markov model for data division, and the Markov chain is used to recombine the marginal and conditional probabilities and to estimate joint probabilities for trajectories. We illustrated the proposed model using PM2.5 outdoor air pollution data from the United States from 2000 to 2020. The performance of the proposed methodology is also checked through bootstrap simulation studies. The…
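
The divide and recombine idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' regression-based method: it assumes a plain first-order Markov chain with no covariates, and the function names (`transition_counts`, `divide_and_recombine`, `trajectory_probability`) and the toy `sites` data are hypothetical. Each block of site sequences is summarized by its transition counts (divide), the counts are summed across blocks (recombine), and a trajectory's joint probability is the marginal probability of its first state times the chained conditional probabilities.

```python
from collections import Counter
from itertools import islice


def transition_counts(sequences):
    """Count first-order transitions (prev -> next) within one block."""
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    return counts


def chunked(items, size):
    """Yield successive blocks of at most `size` items."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def divide_and_recombine(sequences, block_size):
    """Estimate transition probabilities block by block.

    Divide: each block is reduced to its transition counts.
    Recombine: counts are summed, then normalized per starting state.
    """
    total = Counter()
    for block in chunked(sequences, block_size):
        total.update(transition_counts(block))
    row_totals = Counter()
    for (prev, _), n in total.items():
        row_totals[prev] += n
    return {pair: n / row_totals[pair[0]] for pair, n in total.items()}


def trajectory_probability(initial, probs, path):
    """Joint probability of a path: marginal of the first state
    times the product of the conditional transition probabilities."""
    p = initial[path[0]]
    for prev, nxt in zip(path, path[1:]):
        p *= probs.get((prev, nxt), 0.0)
    return p


# Hypothetical toy data: one categorical PM2.5 sequence per site.
sites = [
    ["low", "low", "high", "high"],
    ["low", "high", "high", "low"],
    ["high", "high", "low", "low"],
]
probs = divide_and_recombine(sites, block_size=2)
initial = {"low": 2 / 3, "high": 1 / 3}  # share of sites starting in each state
print(trajectory_probability(initial, probs, ["low", "high", "high"]))
```

Because the per-block summaries here are raw counts, the recombined estimate equals the one computed on the full data in a single pass. With model-based summaries such as regression coefficients, recombination is generally only approximate.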

Predictive Models for Trajectory Risks Prediction from Repeated Ordinal Outcomes

Chowdhury

Islam

2022

Bull. Malays. Math. Sci. Soc.

Divide and recombine approach for warranty database: estimating the reliability of an automobile component

Karim

2024

Data Science and Management
