2001
DOI: 10.1007/pl00011664

Multipass Algorithms for Mining Association Rules in Text Databases

Abstract: In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., words) that need to be counted. Two well-known mining algorithms, the Apriori algorithm and the Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases, …
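
To make the counting problem in the abstract concrete, the following is a minimal sketch of Apriori-style counting over words, limited to single words and word pairs. It is not either of the algorithms proposed in the paper; the function name `frequent_word_pairs`, the `min_support` parameter, and the toy documents are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(documents, min_support):
    """Return (frequent words, frequent word pairs) using two Apriori-style passes.

    Each document is a list of words; support is the number of documents
    containing the word or pair. All names here are illustrative.
    """
    # Pass 1: count each distinct word once per document.
    word_counts = Counter()
    for doc in documents:
        word_counts.update(set(doc))
    frequent_words = {w for w, c in word_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent.
    # This is the Apriori pruning step: a pair cannot be frequent
    # unless both of its words are frequent.
    pair_counts = Counter()
    for doc in documents:
        kept = sorted(set(doc) & frequent_words)
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return frequent_words, frequent_pairs


# Hypothetical toy corpus, one word list per document.
docs = [
    ["data", "mining", "text"],
    ["text", "mining", "rules"],
    ["data", "text", "rules"],
    ["mining", "text"],
]
words, pairs = frequent_word_pairs(docs, min_support=2)
```

With n frequent words there are up to n(n-1)/2 candidate pairs, so a vocabulary of thousands of frequent words already yields millions of candidates, which is the scale problem the abstract describes for text databases.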

Cited by 38 publications (13 citation statements)
References 8 publications
“…Each processor can use the global hash table to determine if a candidate might have a sufficient support count, and also to determine which processors should be polled to obtain the support count of the candidate. This approach is unlikely to be efficient for text databases as the underlying DHP hash table was shown to be ineffective in mining text databases [20].…”
Section: Parallel Algorithms | Citation type: mentioning
confidence: 99%
“…In addition, the whole set of candidates needs to fit in the memory of each processor for efficient counting, whereas the number of candidates is usually very large for text databases. The Multipass approach proposed in [20] can be used to control the number of candidates to be processed at the same time.…”
Section: Parallel Algorithms | Citation type: mentioning
confidence: 99%
“…Data mining has been used in Web text mining, which refers to the process of searching through unstructured data on the Web and deriving meaning from it [6] [8]. One of main purposes of text mining is association discovery [2].…”
Section: Introduction | Citation type: mentioning
confidence: 99%
“…Association mining has been used in Web text mining, which refers to the process of searching through unstructured data on the Web and deriving meanings from it [8] [11]. The main purposes of text mining include association discovery, trends discovery, and event discovery [5].…”
Section: Introduction | Citation type: mentioning
confidence: 99%