Biclustering has become a popular technique for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. Most biclustering approaches use a measure or cost function that determines the quality of biclusters. In such cases, the development of both a suitable heuristic and a good measure for guiding the search is essential for discovering interesting biclusters in an expression matrix. Nevertheless, not all existing biclustering approaches base their search on evaluation measures for biclusters. There exists a diverse set of biclustering tools that follow different strategies and algorithmic concepts which guide the search towards meaningful results. In this paper we present an extensive survey of biclustering approaches, classifying them into two categories according to whether or not they use evaluation metrics within the search method: biclustering algorithms based on evaluation measures and non-metric-based biclustering algorithms. In both cases, the algorithms have been classified according to the type of meta-heuristic on which they are based.
Some of the most influential factors in the quality of the solutions found by an evolutionary algorithm (EA) are a correct coding of the search space and an appropriate evaluation function of the potential solutions. EAs are often used to learn decision rules from datasets, which are encoded as individuals in the genetic population. This paper addresses the coding of the search space for obtaining those decision rules, i.e., the representation of the individuals of the genetic population and the design of specific genetic operators. Our approach, called "natural coding," uses one gene per feature in the dataset (continuous or discrete). The examples from the datasets are also encoded into the search space, where the genetic population evolves, and therefore the evaluation process is improved substantially. Genetic operators for the natural coding are formally defined as algebraic expressions. Experiments with several datasets from the University of California at Irvine (UCI) machine learning repository show that, as the genetic operators are better guided through the search space, the number of rules decreases considerably while the accuracy remains similar to that of hybrid coding, which joins the well-known binary and real representations to encode discrete and continuous attributes, respectively. The computational cost associated with the natural coding is also reduced with regard to the hybrid representation. Our algorithm, HIDER*, has been statistically tested against C4.5 and C4.5 Rules, and performed well. The knowledge models obtained are simpler, with very few decision rules, and therefore easier to understand, which is an advantage in many domains. The experiments with high-dimensional datasets showed the same good behavior, maintaining the quality of the knowledge model with respect to prediction accuracy. Index Terms: Decision rules, evolutionary encoding, supervised learning.
I. INTRODUCTION. DECISION RULES are especially relevant in problems related to supervised learning. Given a dataset with continuous and discrete features or attributes, and a class label, we try to find a rule set that describes the knowledge within data or classifies new unseen data. When the feature is discrete, the rules take the form of "if , then class," where the values are not necessarily all those that the feature can take. When the feature is continuous, typically the rules take the form of "if then class," where
Background: Biclustering algorithms for microarray data aim at discovering functionally related gene sets under different subsets of experimental conditions. Due to the problem complexity and the characteristics of microarray datasets, heuristic searches are usually used instead of exhaustive algorithms. Also, the comparison among different techniques is still a challenge. The obtained results vary in relevant features such as the number of genes or conditions, which makes it difficult to carry out a fair comparison. Moreover, existing approaches do not allow the user to specify any preferences on these properties. Results: Here, we present the first biclustering algorithm in which it is possible to customize several bicluster features in terms of different objectives. This can be done by tuning the specified features in the algorithm or by incorporating new objectives into the search. Furthermore, our approach bases the bicluster evaluation on the use of expression patterns, and is able to recognize both shifting and scaling patterns, whether they occur simultaneously or not. Evolutionary computation has been chosen as the search strategy, hence the name of our proposal, Evo-Bexpa (Evolutionary Biclustering based in Expression Patterns). Conclusions: We have conducted experiments on both synthetic and real datasets demonstrating the ability of Evo-Bexpa to obtain meaningful biclusters. Synthetic experiments have been designed in order to compare Evo-Bexpa performance with other approaches when looking for perfect patterns. Experiments with four different real datasets also confirm the proper performance of our algorithm, whose results have been biologically validated through Gene Ontology.
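To make the notion of shifting and scaling patterns concrete, the following minimal sketch builds a bicluster whose rows all follow one base pattern, each multiplied by a gene-specific scaling factor and offset by a gene-specific shift. It then checks that standardizing each row collapses all rows onto the same profile. The function names, the deviation score, and the use of row standardization are illustrative assumptions rather than the evaluation measure used by Evo-Bexpa; the sketch also assumes positive scaling factors and non-constant rows.

```python
import numpy as np

def make_bicluster(pattern, alphas, betas):
    """Build rows a_ij = alpha_i * pattern_j + beta_i (scaling plus shifting)."""
    pattern = np.asarray(pattern, dtype=float)
    alphas = np.asarray(alphas, dtype=float)[:, None]
    betas = np.asarray(betas, dtype=float)[:, None]
    return alphas * pattern + betas

def standardize_rows(matrix):
    """Subtract each row's mean and divide by its standard deviation.

    Assumes no row is constant (otherwise the division is by zero).
    """
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, keepdims=True)
    return (matrix - mean) / std

def pattern_deviation(matrix):
    """Hypothetical score: mean absolute deviation of the standardized rows
    from their average profile. It is zero for a perfect combined pattern
    with positive scaling factors."""
    z = standardize_rows(matrix)
    return float(np.abs(z - z.mean(axis=0)).mean())

# Three genes, four conditions, one shared pattern.
perfect = make_bicluster([1.0, 4.0, 2.0, 7.0],
                         alphas=[1.0, 2.5, 0.5],
                         betas=[0.0, -3.0, 10.0])

# The same bicluster with Gaussian noise added.
rng = np.random.default_rng(0)
noisy = perfect + rng.normal(scale=1.0, size=perfect.shape)

print(pattern_deviation(perfect))  # numerically zero
print(pattern_deviation(noisy))    # strictly larger
```

Setting every scaling factor to 1 gives a pure shifting pattern, and setting every shift to 0 gives a pure scaling pattern, so the same check covers both cases separately as well as together.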
Abstract: The increasing amount of information available is encouraging the search for efficient techniques to improve data mining methods, especially those which consume great computational resources, such as evolutionary computation. Efficacy and efficiency are two critical aspects for knowledge-based techniques. The incorporation of knowledge into evolutionary algorithms (EAs) should provide either better solutions (efficacy) or equivalent solutions in shorter time (efficiency), relative to the same evolutionary algorithm without such knowledge. In this paper, we categorize and summarize some of the techniques for incorporating knowledge into evolutionary algorithms and present a novel data structure, called efficient evaluation structure (EES), which helps the evolutionary algorithm to provide decision rules using fewer computational resources. The EES-based EA is tested and compared to another EA system, and the experimental results show the quality of our approach, which reduces the computational cost by about 50% while maintaining the global accuracy of the final set of decision rules.
Type 1 ryanodine receptors (RyR1s) release Ca(2+) from the sarcoplasmic reticulum to initiate skeletal muscle contraction. The role of RyR1-G4934 and -G4941 in the pore-lining helix in channel gating and ion permeation was probed by replacing them with amino acid residues of increasing side chain volume. RyR1-G4934A, -G4941A, and -G4941V mutant channels exhibited a caffeine-induced Ca(2+) release response in HEK293 cells and bound the RyR-specific ligand [(3)H]ryanodine. In single channel recordings, significant differences in the number of channel events and mean open and close times were observed between WT and RyR1-G4934A and -G4941A. RyR1-G4934A had reduced K(+) conductance and ion selectivity compared with WT. Mutations further increasing the side chain volume at these positions (G4934V and G4941I) resulted in reduced caffeine-induced Ca(2+) release in HEK293 cells, low [(3)H]ryanodine binding levels, and channels that were not regulated by Ca(2+) and did not conduct Ca(2+) in single channel measurements. Computational predictions of the thermodynamic impact of mutations on protein stability indicated that although the G4934A mutation was tolerated, the G4934V mutation decreased protein stability by introducing clashes with neighboring amino acid residues. In similar fashion, the G4941A mutation did not introduce clashes, whereas the G4941I mutation resulted in intersubunit clashes among the mutated isoleucines. Co-expression of RyR1-WT with RyR1-G4934V or -G4941I partially restored the WT phenotype, which suggested lessening of amino acid clashes in heterotetrameric channel complexes. The results indicate that both glycines are important for RyR1 channel function by providing flexibility and minimizing amino acid clashes.
Purpose: The purpose of this paper is to present a novel control mechanism for avoiding overlapping among biclusters in expression data. Design/methodology/approach: Biclustering is a technique used in the analysis of microarray data. One of the most popular biclustering algorithms was introduced by Cheng and Church (2000) (Ch&Ch). Even though this heuristic is successful at finding interesting biclusters, it presents several drawbacks. The main shortcoming is that it introduces random values in the expression matrix to control the overlapping. The overlapping control method presented in this paper is based on a matrix of weights that is used to estimate the overlapping of a bicluster with already found ones. In this way, the algorithm is always working on real data, and so the biclusters it discovers contain only original data. Findings: The paper shows that the original algorithm wrongly estimates the quality of the biclusters after some iterations, due to the random values that it introduces. The empirical results show that the proposed approach is effective at improving the heuristic. It is also important to highlight that many interesting biclusters found by using our approach would not have been obtained using the original algorithm. Originality/value: The original algorithm proposed by Ch&Ch is one of the most successful algorithms for discovering biclusters in microarray data. However, it presents some limitations, the most relevant being the substitution phase adopted in order to avoid overlapping among biclusters. The modified version of the algorithm proposed in this paper improves the original one, as shown in the experimentation.
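The idea of a weight matrix can be sketched as follows. The quality measure here is the mean squared residue used by Cheng and Church. The weight bookkeeping is only an assumption made for illustration: each cell of a hypothetical `weights` matrix counts how many previously found biclusters contain that cell, and the overlap of a candidate is estimated as the average weight over its cells. The paper's actual weighting scheme may differ, and the function names are invented for this sketch.

```python
import numpy as np

def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix data[rows, cols]."""
    sub = data[np.ix_(rows, cols)]
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).mean())

def register_bicluster(weights, rows, cols):
    """Assumed bookkeeping: add 1 to every cell covered by a found bicluster."""
    weights[np.ix_(rows, cols)] += 1

def overlap_estimate(weights, rows, cols):
    """Assumed estimate: average number of earlier biclusters per cell."""
    return float(weights[np.ix_(rows, cols)].mean())

# Toy expression matrix: the top-left 3x3 block is a perfect additive pattern.
data = np.array([[1, 2, 3, 9],
                 [2, 3, 4, 1],
                 [5, 6, 7, 4],
                 [0, 8, 2, 6]], dtype=float)
weights = np.zeros_like(data)

first_rows, first_cols = [0, 1, 2], [0, 1, 2]
register_bicluster(weights, first_rows, first_cols)

cand_rows, cand_cols = [1, 2, 3], [1, 2, 3]
print(mean_squared_residue(data, first_rows, first_cols))  # 0 for the additive block
print(overlap_estimate(weights, cand_rows, cand_cols))     # 4 of 9 cells already covered
```

Because the overlap is tracked in a separate matrix, `data` is never modified, which is the property the abstract emphasizes: every bicluster is evaluated on original expression values rather than on cells replaced by random numbers.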
Abstract. Selecting an adequate coding is one of the main problems in applications based on Evolutionary Algorithms. Many codings have been proposed to represent the search space for obtaining decision rules. A suitable representation of the individuals of the genetic population can reduce the search space, so that the learning process is accelerated by decreasing the number of generations needed to complete the task. In this sense, natural coding achieves such a reduction and improves on the results obtained by other codings. This paper justifies the use of natural coding by comparing it with hybrid coding, which joins the well-known binary and real representations. We have tested both codings on a heterogeneous subset of databases from the UCI Machine Learning Repository. The experimental results show that natural coding improves the quality of the obtained knowledge model using only one third of the generations that hybrid coding needs, as well as a smaller population.
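As a rough illustration of how a one-gene-per-feature coding can shrink an individual, the sketch below packs the set of admissible values of a discrete feature into a single natural number, in contrast to a binary coding that would spend one bit-gene per value. The bit ordering, the helper names, and the toy domains are assumptions made for this example and are not taken from the paper; continuous features are not covered.

```python
def encode_value_set(allowed, domain):
    """Pack the admissible values of one discrete feature into one natural number.

    Bit k of the result is set when domain[k] is admissible (assumed ordering).
    """
    code = 0
    for position, value in enumerate(domain):
        if value in allowed:
            code |= 1 << position
    return code

def decode_value_set(code, domain):
    """Recover the set of admissible values from the natural number."""
    return {value for position, value in enumerate(domain)
            if (code >> position) & 1}

def rule_covers(rule, example, domains):
    """Check whether an example satisfies every feature condition of a rule.

    rule maps each feature to one natural-number gene; example maps each
    feature to a single value.
    """
    return all((gene >> domains[feature].index(example[feature])) & 1
               for feature, gene in rule.items())

# Hypothetical toy domains.
domains = {"color": ["red", "green", "blue"], "size": ["small", "large"]}

# Rule: if color in {red, blue} and size in {large} then class.
rule = {
    "color": encode_value_set({"red", "blue"}, domains["color"]),
    "size": encode_value_set({"large"}, domains["size"]),
}

print(rule)  # {'color': 5, 'size': 2}
print(rule_covers(rule, {"color": "blue", "size": "large"}, domains))   # True
print(rule_covers(rule, {"color": "green", "size": "large"}, domains))  # False
```

In this sketch the rule antecedent has two genes, one per feature, whereas a bit-per-value coding of the same antecedent would need five.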