Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2012
DOI: 10.1145/2339530.2339653
Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained

Cited by 184 publications (71 citation statements). References 5 publications.
“…Data that is excellent but incomplete may provide insights under one marketing channel but would fail to inform the retailer of the total effect of a marketing action. Finally, data that is big data but does not contain exogenous sources of variation can be misleading to the retailer and suggests why experimental methods (A/B tests, e.g., Kohavi et al 2012) and/or instrumental variables methods (Conley et al 2008) have become popular tools to "learn from data". Next, we describe more relevant data.…”
Section: Big Data Versus Better Data and "Better" Models
confidence: 99%
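The "A/B tests" referenced above compare a metric between randomly assigned control and treatment groups. A minimal sketch of such a comparison, assuming per-user binary outcomes (all data and names here are hypothetical illustrations, not from the cited works):

```python
# Minimal A/B test sketch: two-sample comparison of a per-user metric.
# Synthetic data; in practice each user is randomly assigned to a variant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.binomial(1, 0.10, size=10_000)    # e.g., conversion flag per user
treatment = rng.binomial(1, 0.11, size=10_000)

# Difference in means plus a Welch two-sample t-test on the per-user outcomes.
lift = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"lift = {lift:.4f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

Because assignment is randomized, the observed difference carries the exogenous variation the excerpt contrasts with purely observational "big data".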
“…For example, "Profit" is not a good OEC, as short-term theatrics (e.g., raising prices) can increase short-term profit, but hurt it in the long run. As we showed in Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained [25], market share can be a long-term goal, but it is a terrible short-term criterion: making a search engine worse forces people to issue more queries to find an answer, but, like hiking prices, users will find better alternatives long-term. Sessions per user, or repeat visits, is a much better factor in the OEC, and one that we use at Bing.…”
Section: Tenet
confidence: 96%
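A sessions-per-user OEC of the kind described above can be computed directly from a session log. A minimal sketch, assuming a log with variant, user, and session identifiers (the schema and data are hypothetical):

```python
# Sketch: sessions-per-user as an OEC, computed from a session log.
# Column names and values are illustrative, not a prescribed schema.
import pandas as pd

log = pd.DataFrame({
    "variant":    ["control", "control", "control", "treatment", "treatment"],
    "user_id":    [1, 1, 2, 3, 3],
    "session_id": ["a", "b", "c", "d", "e"],
})

# Count distinct sessions per user, then average within each variant.
# The analysis unit (user) matches the usual randomization unit.
sessions_per_user = log.groupby(["variant", "user_id"])["session_id"].nunique()
oec = sessions_per_user.groupby(level="variant").mean()
print(oec)
```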
“…To address the multiple outcomes issue, we standardized our success criteria to use a small set of metrics, such as sessions/user [25].…”
Section: False Positives
confidence: 99%
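The multiple-outcomes issue arises because testing many metrics per experiment inflates the chance that at least one appears "significant" by luck. Standardizing on a small metric set, as the excerpt describes, is one remedy; adjusting p-values is a complementary one. A sketch of the latter using a Holm-Bonferroni correction (the p-values are illustrative, not from the cited work):

```python
# Holm-Bonferroni adjustment for p-values from several metrics tested in
# one experiment, controlling the family-wise false-positive rate.
from statsmodels.stats.multitest import multipletests

p_values = [0.003, 0.02, 0.04, 0.30, 0.75]  # hypothetical per-metric p-values
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"p={p:.3f}  adjusted={p_adj:.3f}  significant={r}")
```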
“…Namely, an experimental platform usually has standardized success criteria, which use a small set of key metrics to make the final decision on the treatment variant of the service [15]. These metrics are usually selected with respect to business-related criteria of the considered service and are aligned with its long-term goals (like the number of sessions per user for a search engine [14]). Hence, finding an alternative for them is non-trivial and challenging [4], which is why a modification of an existing standardized metric is preferred.…”
Section: Introduction
confidence: 99%
“…User engagement reflects how often the user satisfies her needs (e.g., searching for something) by means of the considered service (e.g., a search engine). On the one hand, these metrics are measurable within a short-term experiment period and, on the other hand, they are predictive of the long-term success of the company [14,15,16,25]. That is why engagement metrics are often considered the most appropriate for online evaluation.…”
Section: Introduction
confidence: 99%