2022
DOI: 10.1007/s10618-022-00846-z

The minimum description length principle for pattern mining: a survey

Abstract: Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The Minimum Description Length (MDL) principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, we review MDL-based methods for mining different kinds of patterns from various ty…

Citations: cited by 10 publications (3 citation statements)
References: 198 publications
“…By processing new features, feature vectors are generated. The machine learning model is established after the feature vector is processed by the machine learning method, and the event category recognition of the test corpus is realized [15][16].…”
Section: Identification Of Event Categories Integrating Data and Know... (mentioning)
confidence: 99%
“…MDL has traditionally been used for model selection [40,18,15,3,34], but its intuitive appeal has led to applications in other areas such as pattern mining [11,23]. In supervised learning, MDL was used in NN as early as [22], in which the authors added Gaussian noise to the weights of the network to control their description length, and thus the amount of information required to communicate the NN.…”
Section: Related Work (mentioning)
confidence: 99%
“…As reporting all frequent patterns often results in overly large and highly redundant results, modern approaches instead focus on discovering patterns that are either significant with regard to some null-hypothesis [13,21,22] or discovering sets of patterns that together generalize the data well [7,27]. For the latter, the MDL criterion has been particularly successful [2,8,27]. Out of these approaches, we compare to Skopus [21], Sqs [27], Ism [7].…”
Section: Related Work (mentioning)
confidence: 99%