The minimum description length principle for pattern mining: a survey

Galbrun, Esther

doi:10.1007/s10618-022-00846-z

Cited by 10 publications

(3 citation statements)

References 198 publications

“…By processing new features, feature vectors are generated. The machine learning model is established after the feature vector is processed by the machine learning method, and the event category recognition of the test corpus is realized [15][16].…”

Section: Identification Of Event Categories Integrating Data and Know... (mentioning)

confidence: 99%

Practice and Application of Fusion Machine Learning in Data Analysis

2023

Machine learning is a process in which a computer is used to train on and compute over input data and to output results in a complex, multi-task simulation. In data analysis, machine learning can be used to carry out experimental research and theoretical verification. To improve data analysis capability, machine learning and data mining methods are needed to process data better. This paper mainly uses experimental methods and principal component analysis to test and discuss the fusion of machine learning in data analysis. The experimental results show that CPU utilization in Scheme 4 is about 85% on average. The CPU utilization of the Scribe center server is reduced because, after receiving data, there is less data to decompress.

“…MDL has traditionally been used for model selection [40,18,15,3,34], but its intuitive appeal has led to applications in other areas such as pattern mining [11,23]. In supervised learning, MDL was used in NN as early as [22], in which the authors added Gaussian noise to the weights of the network to control their description length, and thus the amount of information required to communicate the NN.…”

Section: Related Work (mentioning)

confidence: 99%

Is My Neural Net Driven by the MDL Principle?

Brandao, Duffner, Emonet et al.

2023

Lecture Notes in Computer Science

The Minimum Description Length principle (MDL) is a formalization of Occam's razor for model selection, which states that a good model is one that can losslessly compress the data while including the cost of describing the model itself. While MDL can naturally express the behavior of certain models such as autoencoders (which inherently compress data), most representation learning techniques do not rely on such models. Instead, they learn representations by training on general or, for self-supervised learning, pretext tasks. In this paper, we propose a new formulation of the MDL principle that relies on the concept of signal and noise, which are implicitly defined by the learning task at hand. Additionally, we introduce ways to empirically measure the complexity of the learned representations by analyzing the spectra of the point Jacobians. Under certain assumptions, we show that the singular values of the point Jacobians of Neural Networks driven by the MDL principle should follow either a power law or a lognormal distribution. Finally, we conduct experiments to evaluate the behavior of the proposed measure applied to deep neural networks on different datasets, with respect to several types of noise. We observe that the experimental spectral distribution is in agreement with the spectral distribution predicted by our MDL principle, which suggests that neural networks trained with gradient descent on noisy data implicitly abide by the MDL principle.

“…As reporting all frequent patterns often results in overly large and highly redundant results, modern approaches instead focus on discovering patterns that are either significant with regard to some null-hypothesis [13,21,22] or discovering sets of patterns that together generalize the data well [7,27]. For the latter, the MDL criterion has been particularly successful [2,8,27]. Out of these approaches, we compare to Skopus [21], Sqs [27], Ism [7].…”

Section: Related Work (mentioning)

confidence: 99%

Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns

Cüppers, Vreeken

2023

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

We study the problem of succinctly summarizing a database of event sequences in terms of generalized sequential patterns. That is, we are interested in patterns that are not exclusively defined over observed surface-level events, as is usual, but rather may additionally include generalized events that can match a set of events. To avoid spurious and redundant results we define the problem in terms of the Minimum Description Length principle, by which we are after that set of patterns and generalizations that together best compress the data without loss. The resulting optimization problem does not lend itself to exact search, which is why we propose the heuristic Flock algorithm to efficiently find high-quality models in practice. Extensive experiments on synthetic and real-world data show that Flock results in compact and easily interpretable models that accurately recover the ground truth, including rare instances of generalized patterns. Additionally, Flock recovers how generalized events within patterns depend on each other, and overall provides clearer insight into the data-generating process than using state-of-the-art algorithms that only consider surface-level patterns.

CCS Concepts: Information systems → Data mining.
