Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2001
DOI: 10.1145/502512.502526

Empirical Bayes screening for multi-item associations

Abstract: This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" involves estimates of the frequency of each item set divided by a baseline frequency computed as if items occurred independently. The focus is on obtaining reliable estimates of this measure of interestingness for all item sets, even item sets with relatively low frequencies. For example, in a medical dat…
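
The ratio described in the abstract (observed item-set frequency divided by a baseline computed as if items occurred independently) can be illustrated with a minimal sketch. The function name `observed_over_baseline` and the toy transactions are hypothetical, and this shows only the raw ratio, not the empirical Bayes estimate the paper develops for stabilizing it.

```python
from collections import Counter
from itertools import combinations


def observed_over_baseline(transactions, size=2):
    """Return {item set: observed count / count expected under independence}."""
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in set(t))
    # How many transactions contain each item set of the requested size.
    set_counts = Counter(
        combo
        for t in transactions
        for combo in combinations(sorted(set(t)), size)
    )
    ratios = {}
    for combo, observed in set_counts.items():
        # Independence baseline: n times the product of marginal frequencies.
        baseline = float(n)
        for item in combo:
            baseline *= item_counts[item] / n
        ratios[combo] = observed / baseline
    return ratios


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"eggs", "bread"},
]
ratios = observed_over_baseline(transactions)
for combo, ratio in sorted(ratios.items()):
    print(combo, round(ratio, 3))
```

A ratio near 1 means the item set occurs about as often as independence predicts. For rare item sets the raw ratio is very noisy, which is the problem the abstract says the paper addresses.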

Cited by 204 publications (191 citation statements)
References 14 publications (11 reference statements)

“…For example, it would be very useful to interface algorithms which use statistical measures to find "interesting" itemsets (which are not necessarily frequent itemsets as used in an association rule context). Such algorithms include implementations of the χ²-test based algorithm by Silverstein, Brin, and Motwani (1998) or the baseline frequency approach by DuMouchel and Pregibon (2001).…”
Section: Discussion (mentioning)
confidence: 99%
“…MGPS is an innovative data-mining algorithm developed by William DuMouchel which introduces the concept of refining the relative reporting ratio calculation by using Bayesian statistics. [19][20][21] The basic assumption behind MGPS is that each observed count (N; drug/vaccine-event combination) is taken from a Poisson distribution with unknown mean (μ), with an interest center on the ratio λ which equals μ divided by E (the expected count estimated by assuming that the count of all reports for the specific drug/vaccine and the count of all reports for the specific event are independent; Table 1). The essential Bayesian contribution supposes that each λ is drawn from a common prior assumed to be a mixture of two gamma distributions.…”
Section: Methods (mentioning)
confidence: 99%
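
The model in the statement above (N drawn from a Poisson distribution with mean μ = λE, and λ drawn from a two-component gamma mixture prior) has a closed-form posterior, sketched below as a rough illustration. All function names and all prior hyperparameter values are hypothetical; in an empirical Bayes analysis the hyperparameters would be estimated from the full table of counts, a step this sketch omits.

```python
import math


def expected_count(n_drug, n_event, n_total):
    """Expected count E if drug reports and event reports were independent."""
    return n_drug * n_event / n_total


def log_marginal(n, e, shape, rate):
    """Log probability of count n when lambda ~ Gamma(shape, rate) and
    N ~ Poisson(lambda * e); this marginal is a negative binomial."""
    return (
        math.lgamma(shape + n) - math.lgamma(shape) - math.lgamma(n + 1)
        + shape * math.log(rate / (rate + e))
        + n * math.log(e / (rate + e))
    )


def posterior_mixture(n, e, p, shape1, rate1, shape2, rate2):
    """Posterior of lambda: weight q on Gamma(shape1 + n, rate1 + e),
    weight 1 - q on Gamma(shape2 + n, rate2 + e)."""
    l1 = math.log(p) + log_marginal(n, e, shape1, rate1)
    l2 = math.log(1.0 - p) + log_marginal(n, e, shape2, rate2)
    m = max(l1, l2)  # subtract the max before exponentiating, for stability
    w1, w2 = math.exp(l1 - m), math.exp(l2 - m)
    q = w1 / (w1 + w2)
    return q, (shape1 + n, rate1 + e), (shape2 + n, rate2 + e)


def posterior_mean(n, e, p, shape1, rate1, shape2, rate2):
    """Posterior mean of lambda, a shrunken version of the raw ratio n / e."""
    q, (s1, r1), (s2, r2) = posterior_mixture(n, e, p, shape1, rate1, shape2, rate2)
    return q * s1 / r1 + (1.0 - q) * s2 / r2


# Hypothetical prior hyperparameters, chosen only for illustration.
PRIOR = dict(p=0.1, shape1=0.2, rate1=0.1, shape2=2.0, rate2=4.0)

# Both cells have raw ratio N / E = 20, but very different amounts of data.
small = posterior_mean(n=2, e=0.1, **PRIOR)
large = posterior_mean(n=200, e=10.0, **PRIOR)
print(f"raw ratio 20.0, small-count estimate {small:.2f}, large-count estimate {large:.2f}")
```

The two cells share the same raw ratio, but the small-count cell is pulled much further toward the prior. That shrinkage is what makes the estimate usable for low-frequency combinations.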
“…The EB05 is a conservative measure that is supposed to minimize false positive signals. [20] Drug-event combinations with an EB05 ≥2.0 are frequently considered signals for drug-adverse events pairs based on the empiric threshold described by Szarfman et al. [14] In addition, results of a simulation study [22] provided empirical evidence that using this threshold provides a degree of conservatism in exchange for the false signal rate in the data.…”
Section: Disproportionality Analyses For Vaccines' Safety (mentioning)
confidence: 99%
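
EB05 is commonly defined as the 5th percentile of the posterior distribution of λ, so the EB05 ≥ 2.0 rule quoted above can be sketched as a quantile computation on a two-component gamma mixture. The function names and the posterior parameters below are hypothetical, and the gamma CDF is evaluated with a simple power series suitable only for moderate arguments; this is a sketch under those assumptions, not a reference implementation.

```python
import math


def gamma_cdf(x, shape, rate):
    """CDF of Gamma(shape, rate) at x, via the power series for the
    regularized lower incomplete gamma function (moderate rate * x only)."""
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= z / (shape + k)
        total += term
        k += 1
    log_front = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_front))


def mixture_quantile(prob, q, shape1, rate1, shape2, rate2):
    """Quantile of q * Gamma(shape1, rate1) + (1 - q) * Gamma(shape2, rate2),
    found by bisection on the mixture CDF."""
    def cdf(x):
        return q * gamma_cdf(x, shape1, rate1) + (1.0 - q) * gamma_cdf(x, shape2, rate2)

    lo, hi = 0.0, 1.0
    while cdf(hi) < prob:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical posterior for one drug-event pair (illustrative values only).
posterior = dict(q=0.3, shape1=2.2, rate1=0.2, shape2=4.0, rate2=4.1)

eb05 = mixture_quantile(0.05, **posterior)
eb95 = mixture_quantile(0.95, **posterior)
is_signal = eb05 >= 2.0  # the empirical threshold quoted above
print(f"EB05={eb05:.3f} EB95={eb95:.3f} signal={is_signal}")
```

Thresholding the lower percentile rather than the posterior mean is what makes the rule conservative: a pair is flagged only when even the low end of the plausible range for λ is at least 2.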
“…Traditional association-rule mining algorithms (Agrawal et al. 1993; Bayardo and Agrawal 1999) are found to be yielding many spurious patterns (Brin et al. 1997; Xiong et al. 2006, 2008). As a result, in recent years, many statistical correlation measures, such as χ² statistics (Brin et al. 1997; DuMouchel and Pregibon 2001; Jermaine 2001, 2003), Pearson's correlation coefficients (Xiong et al. 2004, 2006; Zhou and Xiong 2008), rank-based correlation coefficients (Melucci 2007; Yilmaz et al. 2008), and mutual information (Ke et al. 2006, 2007) have been considered in the setting of large-scale association analysis.…”
Section: Related Work (mentioning)
confidence: 99%
“…Brin et al. 1997; Jermaine 2001; DuMouchel and Pregibon 2001; Jermaine 2003; Ilyas et al. 2004; Xiong et al. 2004, 2006, researchers and practitioners are still facing increasing challenges to measure associations among data objects produced by emerging data-intensive applications, particularly when the data are dynamic and analytical results need to be continually updated. Indeed, with such large and growing data sets, research efforts are needed to develop a dynamic solution for volatile correlation computing.…”
Section: Introduction (mentioning)
confidence: 99%