Gene expression measurements represent the most important source of biological data used to unveil the interaction and functionality of genes. In this regard, several data mining and machine learning algorithms have been proposed that, in a number of cases, require some kind of data discretization to perform the inference. The choice of discretization process has a major impact on the design and outcome of the inference algorithms, and a number of relevant issues need to be considered when making it. This study presents a review of the current state-of-the-art discretization techniques, together with the key issues to consider when designing or selecting a discretization approach for gene expression data.
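As a purely illustrative aside, the simplest form of discretization is equal-width binning, sketched below. This is a minimal sketch, not a technique taken from the study itself: the function name `equal_width_discretize`, the choice of three levels, and the sample values are all assumptions made for the example.

```python
def equal_width_discretize(values, n_bins=3):
    """Map each expression value to a bin index in 0..n_bins-1.

    Hypothetical helper: splits the observed range [min, max] into
    n_bins intervals of equal width.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single level.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up expression values for one gene across six samples.
expression = [0.1, 0.4, 2.5, 3.0, 5.9, 6.0]
levels = equal_width_discretize(expression, n_bins=3)
print(levels)
```

Equal-width bins are sensitive to outliers, which is one example of the kind of issue that has to be weighed when choosing a discretization approach.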
The selection of descriptor subsets for QSAR/QSPR is a hard combinatorial problem that requires the evaluation of complex relationships in order to assess the relevance of the selected subsets. In this paper, we describe the main issues in applying descriptor selection for QSAR methods and propose a novel two-phase methodology for this task. The first phase uses a multi-objective evolutionary technique, which offers advantages over mono-objective methods. The second phase complements the first by refining the chosen subsets of descriptors and improving confidence in them. This methodology allows the selection of subsets when a large number of descriptors are involved, and it is suitable for both linear and nonlinear QSAR/QSPR models. The proposed method was tested using three data sets with experimental values for blood-brain barrier penetration, human intestinal absorption and hydrophobicity. The results show that the method can find subsets of descriptors with high predictive capacity and low cardinality. Our proposal therefore constitutes a promising new technique for the development of QSAR/QSPR models.
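The trade-off between predictive capacity and cardinality can be illustrated with a Pareto filter over candidate subsets. This is a minimal sketch under stated assumptions, not the authors' evolutionary algorithm: the subset names, the error values, and the helper names `dominates` and `pareto_front` are hypothetical, and both objectives (prediction error, number of descriptors) are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one."""
    return (a[0] <= b[0] and a[1] <= b[1]) and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates):
    """Keep the non-dominated candidates.

    candidates: dict mapping a subset name to (error, n_descriptors),
    both to be minimized (an assumption of this sketch).
    """
    return {
        name: obj
        for name, obj in candidates.items()
        if not any(dominates(other, obj) for other in candidates.values())
    }


# Made-up candidate descriptor subsets: (prediction error, cardinality).
candidates = {
    "A": (0.30, 2),
    "B": (0.25, 4),
    "C": (0.25, 6),  # same error as B but more descriptors
    "D": (0.40, 3),  # worse than A on both objectives
    "E": (0.20, 9),
}
front = pareto_front(candidates)
print(sorted(front))
```

A multi-objective search returns a set of such non-dominated subsets rather than a single optimum, which leaves room for a later phase to choose among them.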