2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
DOI: 10.1109/dsaa.2016.33

Continuous Monitoring of A/B Tests without Pain: Optional Stopping in Bayesian Testing

Abstract: A/B testing is one of the most successful applications of statistical theory in the modern Internet age. One problem of Null Hypothesis Statistical Testing (NHST), the backbone of A/B testing methodology, is that experimenters are not allowed to continuously monitor the results and make decisions in real time. Many people see this restriction as a setback against the trend in technology toward real-time data analytics. Recently, Bayesian Hypothesis Testing, which intuitively is more suitable for real-time decisi…
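The abstract breaks off before the technical details, but the workflow it alludes to, continuously monitoring a Bayesian A/B test and stopping as soon as the evidence is strong enough, can be sketched. The following is a minimal illustration, not the authors' exact procedure: a Beta-Bernoulli model in Python (NumPy assumed) in which the posterior probability that variant B beats variant A is recomputed after every batch of traffic, with the conversion rates, batch size, and 95% decision threshold all chosen hypothetically.

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_b_beats_a(successes_a, trials_a, successes_b, trials_b,
                       prior_alpha=1.0, prior_beta=1.0, draws=100_000):
        """Monte Carlo estimate of P(p_B > p_A | data) under independent
        Beta(prior_alpha, prior_beta) priors on the two conversion rates."""
        post_a = rng.beta(prior_alpha + successes_a,
                          prior_beta + trials_a - successes_a, draws)
        post_b = rng.beta(prior_alpha + successes_b,
                          prior_beta + trials_b - successes_b, draws)
        return float(np.mean(post_b > post_a))

    # Continuous monitoring: check the posterior after every batch of traffic
    # and stop as soon as the (hypothetical) decision threshold is crossed.
    true_a, true_b = 0.10, 0.12   # assumed "true" conversion rates for the simulation
    threshold = 0.95              # illustrative decision threshold on the posterior
    sa = ta = sb = tb = 0
    for look in range(1, 51):     # up to 50 interim looks
        batch_a = rng.binomial(1, true_a, 200)
        batch_b = rng.binomial(1, true_b, 200)
        sa, ta = sa + batch_a.sum(), ta + batch_a.size
        sb, tb = sb + batch_b.sum(), tb + batch_b.size
        p = prob_b_beats_a(sa, ta, sb, tb)
        if p > threshold or p < 1 - threshold:
            print(f"look {look}: P(B > A) = {p:.3f} -> stop")
            break
    else:
        print(f"no decision after {look} looks; P(B > A) = {p:.3f}")

Because the Beta posterior is conjugate to the Bernoulli likelihood, each interim look costs only a handful of random draws, so continuous monitoring is computationally trivial; whether stopping on such a rule is statistically safe is precisely the question the paper examines.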

Cited by 59 publications (36 citation statements)
References 24 publications
“…Accordingly, this is one of the first pieces of advice that testing experts give in their papers and online blogs (Kohavi et al, 2007, 2014; Dahl and Mumford, 2015; Dmitriev et al, 2017; Emily Robinson, 2018). Reaching a specific minimum sample size before being able to obtain any result is one of the requirements of Null Hypothesis Statistical Testing (NHST); nonetheless, this pitfall can become irrelevant by switching to another statistical interpretation of the results, such as Bayesian Hypothesis Testing or Sequential Hypothesis Testing (Deng et al, 2016; Johari et al, 2017; Su and Yohai, 2019), which have been attracting research interest as alternatives to NHST and are already used by some commercial testing tools [e.g., VWO and AB Tasty are based on Bayesian calculations (Stucchio, 2015; Wassner and Brebion, 2018) and Optimizely uses sequential hypothesis testing (Rusonis and Ren, 2018)]. However, both the performed observations and interviews and previous authors report that frequentist approaches are still the most commonly used for A/B testing (e.g., Kohavi et al, 2007; Emily Robinson, 2018).…”
Section: Determination of the Experiments Length (mentioning)
confidence: 99%
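For context on the excerpt above: the fixed minimum sample size that NHST demands before any result may be read comes from a standard power calculation. A small sketch using the usual two-proportion z-test approximation (SciPy assumed; the baseline rate, lift, significance level, and power are hypothetical):

    from scipy.stats import norm

    def nhst_sample_size_per_arm(p_a, p_b, alpha=0.05, power=0.8):
        """Approximate per-arm sample size for a two-sided two-proportion
        z-test: the number of users to collect before reading the result."""
        z_alpha = norm.ppf(1 - alpha / 2)
        z_beta = norm.ppf(power)
        var = p_a * (1 - p_a) + p_b * (1 - p_b)
        return (z_alpha + z_beta) ** 2 * var / (p_a - p_b) ** 2

    # Detecting a lift from a 10% to an 11% conversion rate needs roughly
    # 14,700-14,800 users per arm before the test may be evaluated.
    print(round(nhst_sample_size_per_arm(0.10, 0.11)))

Checking the data before this sample size is reached and stopping as soon as significance appears inflates the false positive rate under NHST, which is the pitfall the excerpt refers to; Bayesian and sequential procedures are designed to lift that restriction.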
“…With the rise of software and internet connectivity, A/B testing presents an unprecedented opportunity to draw causal conclusions between the changes made and the customers' reactions to them in near real time (Fabijan et al, 2016). Big players [e.g., Amazon (Dmitriev et al, 2016), Facebook (Bakshy et al, 2014), Google (Hohnhold et al, 2015), Netflix (Amatriain and Basilico, 2012), or Uber (Deb et al, 2018)] as well as smaller companies have been using A/B testing as a scientifically grounded way to evaluate changes and compare different alternatives (Deng et al, 2016). And, in recent years, the rapid rise of A/B testing has led to the emergence of multiple commercial testing platforms able to handle the implementation of these experiments (Dmitriev et al, 2017; Johari et al, 2017) that, according to the survey results presented in Fabijan et al (2018b), are used by ∼25% of web experimenters.…”
Section: Introduction (mentioning)
confidence: 99%
“…Third, Bayes factors allow one to accumulate data until enough evidence has been acquired for one of the competing hypotheses, compared to the other one (Cornfield, 1966; Deng, Lu, & Chen, 2016; Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, in press). Indeed, in Bayesian analysis, more evidence usually increases support for one of the competing hypotheses, and does not necessarily lead to a change in the direction of the results (see Schönbrodt et al, in press).…”
Section: Bayesian Hypothesis Testing for Threat Conditioning Data (mentioning)
confidence: 99%
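The excerpt above describes the property that makes Bayes factors attractive for optional stopping: evidence is re-evaluated as data accumulate, and collection stops once one hypothesis is sufficiently supported relative to the other. A minimal sketch of such a stopping rule, assuming a Beta-Binomial model for two conversion rates and illustrative evidence thresholds of 10 and 1/10 (none of these modelling choices are taken from the cited papers):

    import numpy as np
    from scipy.special import betaln

    def log_bayes_factor_10(sa, na, sb, nb, a=1.0, b=1.0):
        """log BF_10 for H1 (independent Beta(a, b) conversion rates) versus
        H0 (one shared Beta(a, b) rate); binomial coefficients cancel."""
        log_m1 = (betaln(a + sa, b + na - sa) - betaln(a, b)
                  + betaln(a + sb, b + nb - sb) - betaln(a, b))
        log_m0 = betaln(a + sa + sb, b + na + nb - sa - sb) - betaln(a, b)
        return log_m1 - log_m0

    rng = np.random.default_rng(1)
    true_a, true_b = 0.10, 0.13               # assumed rates for the simulation
    upper, lower = np.log(10.0), np.log(0.1)  # illustrative evidence thresholds
    sa = na = sb = nb = 0
    for look in range(1, 101):                # evidence re-evaluated after each batch
        xa = rng.binomial(1, true_a, 100)
        xb = rng.binomial(1, true_b, 100)
        sa, na = sa + xa.sum(), na + xa.size
        sb, nb = sb + xb.sum(), nb + xb.size
        log_bf = log_bayes_factor_10(sa, na, sb, nb)
        if log_bf > upper:
            print(f"look {look}: BF_10 = {np.exp(log_bf):.1f} -> evidence for H1, stop")
            break
        if log_bf < lower:
            print(f"look {look}: BF_10 = {np.exp(log_bf):.2f} -> evidence for H0, stop")
            break
    else:
        print(f"no threshold crossed; final BF_10 = {np.exp(log_bf):.2f}")

Working in log space keeps the computation numerically stable even for very large counts, since the Beta function enters only through betaln.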
“…Bayarri, Benjamin, Berger, and Sellke (2016) pointed out that NHST has been 'overly relied on' by the scientific community, and Cumming (2014) stressed the 'need to shift from reliance on NHST to estimation and other preferred techniques'. Moreover, to replace the NHST framework, several researchers and practitioners (e.g., Berger, Boukai, & Wang, 1997; Deng, 2015; Deng, Lu, & Chen, 2016; Johnson, 2013b; Kass & Raftery, 1995; Kruschke, 2013; Rouder, Speckman, Sun, Morey, & Iverson, 2009) have proposed several alternative frameworks, most of which are Bayesian in nature. Among the advocates, Gigerenzer and Swijtink (1990) praised NHST as the 'essential backbone of scientific reasoning'.…”
Section: Introduction (mentioning)
confidence: 99%