A practical guide to big data research in psychology.

Chen, Eric Evan; Wojcik, Sean P.

doi:10.1037/met0000111

Cited by 127 publications

(159 citation statements)

References 61 publications

(103 reference statements)



“…Classification involves identifying the category where an observation belongs, given known category labels [20]. Logistic regression is an example of a classifier from statistics.…”

Section: Big Data Analysis Techniques (mentioning)

confidence: 99%

A glossary for big data in population and public health: discussion and commentary on terminology and research methods

Fuller

Buote

Stanley

2017

J Epidemiol Community Health


The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods.


“…Given that concealing negative emotions may be a particular concern among men (Nadeau et al, 2016), and given the relatively low participation of men in most psychology convenience samples, the possibility of oversampling male users may be a benefit rather than a limitation of Reddit analyses. Furthermore, the site's use of upvotes and downvotes (or "karma") tends to discourage most everyday users (that is, people not using dedicated "trolling" accounts) from behaving more antisocially than they would in real life (Barthel et al, 2016; Chen and Wojcik, 2016).…”

Section: Introduction (mentioning)

confidence: 99%

Within and Between-Person Differences in Language Used Across Anxiety Support and Neutral Reddit Communities

Ireland

Iserman

2018

Proceedings of the Fifth Workshop on Computational Linguistics And Clinical Psychology: From Keyboard to Clinic


Although many studies have distinguished between the social media language use of people who do and do not have a mental health condition, within-person context-sensitive comparisons (for example, analyzing individuals' language use when seeking support or discussing neutral topics) are less common. Two dictionary-based analyses of Reddit communities compared (1) anxious individuals' comments in anxiety support communities (e.g., /r/PanicParty) with the same users' comments in neutral communities (e.g., /r/todayilearned), and, (2) within popular neutral communities, comments by members of anxiety subreddits with comments by other users. Each comparison yielded theory-consistent effects as well as unexpected results that suggest novel hypotheses to be tested in the future. Results have relevance for improving researchers' and practitioners' ability to unobtrusively assess anxiety symptoms in conversations that are not explicitly about mental health.


“…Within the field of IB in particular, researchers routinely deal with "Big Data," that is, large amounts of information stored in archival datasets (Harlow & Oswald, 2016). Because these datasets were not collected directly in response to a particular research question, they contain many variables that can be restructured to produce "favorable" results (i.e., better fit estimates, larger effect-size estimates) (Chen & Wojcik, 2016). For example, consider the case of firm performance, which is one of the most frequently measured constructs in IB.…”

Section: Selection Of Variables To Include In a Model (mentioning)

confidence: 99%

Science’s reproducibility and replicability crisis: International business is not immune

2017


International business is not immune to science's reproducibility and replicability crisis. We argue that this crisis is not entirely surprising given the methodological practices that enhance systematic capitalization on chance. This occurs when researchers search for a maximally predictive statistical model based on a particular dataset and engage in several trial-and-error steps that are rarely disclosed in published articles. We describe systematic capitalization on chance, distinguish it from unsystematic capitalization on chance, address five common practices that capitalize on chance, and offer actionable strategies to minimize the capitalization on chance and improve the reproducibility and replicability of future IB research.
