Feature selection reduces the complexity of high-dimensional datasets and helps reveal systematic variation in the data. These aspects are essential in domains that rely on model interpretability, such as the life sciences. We propose a (U)ser-Guided (Bay)esian Framework for (F)eature (S)election, UBayFS, an ensemble feature selection technique embedded in a Bayesian statistical framework. Our generic approach considers two sources of information: data and domain knowledge. From the data, we build an ensemble of feature selectors, described by a multinomial likelihood model. Through domain knowledge, the user guides UBayFS by weighting features and penalizing feature blocks or combinations, implemented via a Dirichlet-type prior distribution. The framework thus combines three main aspects: ensemble feature selection, expert knowledge, and side constraints. Our experiments demonstrate that UBayFS (a) allows for a balanced trade-off between user knowledge and data observations and (b) achieves accurate and robust results.
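The abstract does not spell out the model, but the combination of a multinomial likelihood over ensemble selection counts with a Dirichlet-type prior suggests a simple conjugate update. Below is a minimal sketch of that idea, not the UBayFS implementation: the elementary selector (absolute correlation), the ensemble size, the number of selected features, and the prior weights are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 8 features, only features 0 and 1 drive the response.
n, p, k = 200, 8, 3          # samples, features, features selected per run
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Likelihood side: an ensemble of elementary selectors on bootstrap samples.
# Each run picks the k features most correlated with y; the per-feature
# selection counts are treated as multinomial observations.
M = 100
counts = np.zeros(p)
for _ in range(M):
    idx = rng.integers(0, n, size=n)
    Xb, yb = X[idx], y[idx]
    corr = np.abs([np.corrcoef(Xb[:, j], yb)[0, 1] for j in range(p)])
    counts[np.argsort(corr)[-k:]] += 1

# Prior side: user-defined feature weights as Dirichlet parameters
# (here, a hypothetical mild preference for feature 2).
alpha = np.ones(p)
alpha[2] = 5.0

# Posterior expected importance via Dirichlet-multinomial conjugacy.
posterior = (counts + alpha) / (counts.sum() + alpha.sum())
selected = np.argsort(posterior)[-k:]
print(sorted(selected.tolist()))   # the two informative features rank on top
```

The prior only nudges the data-driven counts, so strongly supported features survive even when the user's weights point elsewhere, which is the balanced trade-off the abstract refers to.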
Semiconductor manufacturing is a highly innovative branch of industry in which a high degree of automation has already been achieved. For example, devices found to be outside their specifications in the electrical wafer test are automatically scrapped. In this work, we go one step further and analyse test data of devices still within the specification limits, by exploiting the information contained in the analog wafermaps. To that end, we propose two feature extraction approaches with the aim of detecting patterns in the wafer test dataset. Such patterns might indicate the onset of critical deviations in the production process. The studied approaches are: (A) classical image processing and restoration techniques in combination with sophisticated feature engineering, and (B) a data-driven deep generative model. The two approaches are evaluated on both a synthetic and a real-world dataset. The synthetic dataset was modelled on real-world patterns and characteristics. We found both approaches to yield similar overall evaluation metrics. Our in-depth analysis helps to choose one approach over the other, depending primarily on data availability, as well as on available computing power and the required interpretability of the results.
Image restoration and denoising is an essential preprocessing step for almost every subsequent task in computer vision. Markov Random Fields offer a well-founded, sophisticated approach for this purpose, but the associated computational procedures are not sufficiently fast, owing to a high-dimensional optimization problem. Since increases in computing power alone cannot resolve this runtime issue, we address it mathematically: we derive an analytical solution for the optimum of the inference problem, which has desirable mathematical properties. In practice, our new method accelerates the runtime by reducing the computational complexity of the image restoration task by orders of magnitude, independent of the smoothing intensity. As a result, Markov Random Fields become viable for modern big data problems in computer vision, especially when numerous images of equal size are processed.
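The abstract does not state the model class, so the following is a minimal sketch under the assumption of a quadratic (Gaussian) MRF, where the MAP estimate does admit a closed form: minimizing E(x) = ||x - y||^2 + lam * x^T L x, with L the graph Laplacian of the 4-neighbour pixel grid, reduces to the single linear solve (I + lam * L) x* = y. The grid size and the smoothing weight lam are illustrative; a practical implementation would use a sparse or transform-domain solver rather than a dense matrix.

```python
import numpy as np

def grid_laplacian(h, w):
    """Graph Laplacian of an h x w 4-neighbour grid (dense, for small images)."""
    n = h * w
    L = np.zeros((n, n))
    for i in range(h):
        for j in range(w):
            u = i * w + j
            for di, dj in ((0, 1), (1, 0)):       # right and down neighbours
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    v = ni * w + nj
                    L[u, u] += 1; L[v, v] += 1
                    L[u, v] -= 1; L[v, u] -= 1
    return L

rng = np.random.default_rng(1)
h, w, lam = 16, 16, 2.0
clean = np.outer(np.linspace(0, 1, h), np.linspace(0, 1, w))  # smooth ramp
noisy = clean + rng.normal(scale=0.1, size=(h, w))

# Closed-form MAP estimate of the Gaussian MRF: one linear solve,
# no iterative inference over pixel labels.
L = grid_laplacian(h, w)
restored = np.linalg.solve(np.eye(h * w) + lam * L, noisy.ravel()).reshape(h, w)

err_noisy = np.mean((noisy - clean) ** 2)
err_rest = np.mean((restored - clean) ** 2)
print(err_noisy, err_rest)   # the solve should shrink the reconstruction error
```

Because the system matrix depends only on the image size and lam, its factorization can be computed once and reused across many equally sized images, which is where the batch-processing advantage mentioned in the abstract comes from.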