Mining small, useful, and high-quality sets of patterns has recently become an important topic in data mining. The standard approach is to first mine many candidates and then select a good subset. However, the pattern explosion generates such enormous numbers of candidates that, by post-processing, it is virtually impossible to analyse dense or large databases in any detail. We introduce Slim, an any-time algorithm for mining high-quality sets of itemsets directly from data. We use MDL to identify the best set of itemsets as the set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine which itemset would provide the most gain, estimating quality with an accurate heuristic. Without requiring a pre-mined candidate collection, Slim is parameter-free in both theory and practice. Experiments show that we mine high-quality pattern sets; while evaluating orders of magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally optimal strategy. Classification experiments independently verify that we characterise data very well.
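The MDL selection criterion described above can be illustrated with a toy two-part code. The sketch below is only a crude stand-in: it is not Slim's actual gain heuristic nor Krimp's exact encoding, and the greedy cover, the Shannon code lengths by usage, and the one-bit-per-item model cost are all simplifying assumptions made here for illustration.

```python
import math
from collections import Counter

def total_length(transactions, itemsets):
    """Toy two-part MDL score: cover each transaction greedily with the
    given itemsets (longest first, singletons as fallback), then charge
    Shannon-optimal code lengths by usage plus a simple model cost."""
    usage = Counter()
    for t in transactions:
        left = set(t)
        for X in sorted(itemsets, key=len, reverse=True):
            if set(X) <= left:          # itemset fits the uncovered part
                usage[X] += 1
                left -= set(X)
        for item in left:               # fall back to singleton codes
            usage[(item,)] += 1
    total = sum(usage.values())
    data_cost = sum(-u * math.log2(u / total) for u in usage.values())
    model_cost = sum(len(X) for X in usage)   # toy cost: one bit per item
    return data_cost + model_cost

# Hypothetical database where items 'a' and 'b' co-occur frequently:
db = [('a', 'b', 'c')] * 8 + [('a', 'b')] * 8 + [('c',)] * 4
base = total_length(db, [])                  # singletons only
with_ab = total_length(db, [('a', 'b')])     # add candidate pattern {a, b}
```

In this toy setting, adding the frequently co-occurring pattern `{a, b}` to the code table shortens the total description length, which is the kind of gain the iterative search looks for.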
In many situations there exists an abundance of positive examples, but only a handful of negatives. In this paper we show how, in binary or transaction data, such rare cases can be identified and characterised. Our approach uses the Minimum Description Length principle to decide whether an instance is drawn from the training distribution or not. By using frequent itemsets to construct this compressor, we can easily and thoroughly characterise the decisions, and explain what changes in an example would lead to a different verdict. Furthermore, we give a technique through which, given only a few negative examples, the decision landscape and optimal boundary can be predicted, making the approach parameter-free. Experimentation on benchmark and real data shows that our method provides very high classification accuracy, thorough and insightful characterisation of decisions, predicts the decision landscape reliably, and can pinpoint observation errors. Moreover, a case study on real MCADD data shows we provide an interpretable approach with state-of-the-art performance for screening newborn babies for rare diseases.
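The decision rule at the heart of this approach can be sketched with a general-purpose compressor standing in for the paper's itemset-based code table. Everything here is an assumption for illustration: `zlib` is not the compressor the paper uses, the byte-string "transactions" are made up, and the threshold is left to the reader; only the idea (instances from the training distribution need few extra bits under the training compressor) comes from the abstract.

```python
import zlib

def extra_bits(train: bytes, x: bytes) -> int:
    """Crude proxy for 'compressed size of x given the model learned on
    the training data': how many extra compressed bytes are needed to
    encode x after the training data."""
    return len(zlib.compress(train + x)) - len(zlib.compress(train))

# Hypothetical transaction log of positive examples:
train = b"milk bread butter\n" * 200
familiar = b"milk bread butter\n"   # resembles the training distribution
unusual = b"qzx vrk wpl jjj\n"      # does not

# An instance is flagged as rare/negative when its extra cost exceeds
# a threshold; familiar instances cost fewer extra bits than unusual ones.
```

The paper's itemset-based compressor additionally makes the verdict interpretable, since one can inspect which patterns did or did not apply to the instance.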
Background: Nuclear magnetic resonance (NMR) spectroscopy is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline.
Results: We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data.
Conclusions: The workflow performance was evaluated using a previously published dataset.
Correlation maps, spectral and grey-scale plots show clear improvements in comparison to other methods, and the proposed quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
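Two computational pieces of the workflow above (the speaq package itself is in R) can be sketched in Python: estimating a segment's misalignment via FFT-based cross-correlation, and computing the per-data-point BW-ratio. This is a minimal sketch under stated assumptions, not speaq's implementation; the circular-shift model, the Gaussian test peak, and the small group layout are illustrative choices made here.

```python
import numpy as np

def fft_shift_estimate(reference, target):
    """Estimate, via FFT-based (circular) cross-correlation, the signed
    number of positions `target` must be rolled to line up with
    `reference` -- the kind of per-segment shift CluPA searches for."""
    n = len(reference)
    corr = np.fft.ifft(np.fft.fft(reference) *
                       np.conj(np.fft.fft(target))).real
    lag = int(np.argmax(corr))
    return lag if lag <= n // 2 else lag - n   # map to a signed shift

def bw_ratio(spectra, groups):
    """Per-data-point ratio of between-group to within-group sum of
    squares over aligned spectra (rows = spectra, cols = data points)."""
    spectra = np.asarray(spectra, dtype=float)
    groups = np.asarray(groups)
    grand = spectra.mean(axis=0)
    bss = np.zeros(spectra.shape[1])
    wss = np.zeros(spectra.shape[1])
    for g in np.unique(groups):
        block = spectra[groups == g]
        centre = block.mean(axis=0)
        bss += len(block) * (centre - grand) ** 2      # between-group SS
        wss += ((block - centre) ** 2).sum(axis=0)     # within-group SS
    return bss / wss

# Usage: a Gaussian peak shifted by 7 points is detected and undone.
x = np.arange(100)
ref = np.exp(-(x - 50) ** 2 / 10.0)
tgt = np.roll(ref, 7)
shift = fft_shift_estimate(ref, tgt)   # -7: roll target back by 7
```

In the full method this shift estimate is applied recursively to the segments produced by the hierarchical cluster tree, and the BW-ratio's null distribution is obtained by bootstrapping rather than from a parametric F-distribution.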
Platinum metal was dispersed on microporous, mesoporous, and nonporous support materials including the zeolites Na-Y, Ba-Y, Ferrierite, ZSM-22, ETS-10, and AlPO-11, alumina, and titania. The oxidation of carbon black loosely mixed with catalyst powder was monitored gravimetrically in a gas stream containing nitric oxide, oxygen, and water. The carbon oxidation activity of the catalysts was found to be uniquely related to the Pt dispersion and little influenced by support type. The optimum dispersion is around 3-4%, corresponding to relatively large Pt particle sizes of 20-40 nm. The carbon oxidation activity reflects the NO oxidation activity of the platinum catalyst, which reaches an optimum in the 20-40 nm Pt particle size range. The lowest carbon oxidation temperatures were achieved with platinum-loaded ZSM-22 and AlPO-11 zeolite crystallites bearing platinum of optimum dispersion on their external surfaces.