An important approach for unsupervised landcover classification in remote sensing images is the clustering of pixels in the spectral domain into several fuzzy partitions. In this paper, a multiobjective optimization algorithm is utilized to tackle the problem of fuzzy partitioning where a number of fuzzy cluster validity indexes are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of nondominated solutions, which the user can judge relatively and pick up the most promising one according to the problem requirements. Real-coded encoding of the cluster centers is used for this purpose.Results demonstrating the effectiveness of the proposed technique are provided for numeric remote sensing data described in terms of feature vectors. Different landcover regions in remote sensing imagery have also been classified using the proposed technique to establish its efficiency.Index Terms-Cluster validity measures, fuzzy clustering, genetic algorithm (GA), multiobjective optimization (MOO), Pareto-optimal, pixel classification, remote sensing imagery.
The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions are also analyzed.
Data clustering is a popular unsupervised data mining tool that is used for partitioning a given dataset into homogeneous groups based on some similarity/dissimilarity metric. Traditional clustering algorithms often make prior assumptions about the cluster structure and adopt a corresponding suitable objective function that is optimized either through classical techniques or metaheuristic approaches. These algorithms are known to perform poorly when the cluster assumptions do not hold in the data. Multiobjective clustering, in which multiple objective functions are simultaneously optimized, has emerged as an attractive and robust alternative in such situations. In particular, application of multiobjective evolutionary algorithms for clustering has become popular in the past decade because of their population-based nature. Here, we provide a comprehensive and critical survey of the multitude of multiobjective evolutionary clustering techniques existing in the literature. The techniques are classified according to the encoding strategies adopted, objective functions, evolutionary operators, strategy for maintaining nondominated solutions, and the method of selection of the final solution. The pros and cons of the different approaches are mentioned. Finally, we have discussed some real-life applications of multiobjective clustering in the domains of image segmentation, bioinformatics, web mining, and so forth.
Identification of potential viral-host protein interactions is a vital and useful approach towards development of new drugs targeting those interactions. In recent days, computational tools are being utilized for predicting viral-host interactions. Recently a database containing records of experimentally validated interactions between a set of HIV-1 proteins and a set of human proteins has been published. The problem of predicting new interactions based on this database is usually posed as a classification problem. However, posing the problem as a classification one suffers from the lack of biologically validated negative interactions. Therefore it will be beneficial to use the existing database for predicting new viral-host interactions without the need of negative samples. Motivated by this, in this article, the HIV-1–human protein interaction database has been analyzed using association rule mining. The main objective is to identify a set of association rules both among the HIV-1 proteins and among the human proteins, and use these rules for predicting new interactions. In this regard, a novel association rule mining technique based on biclustering has been proposed for discovering frequent closed itemsets followed by the association rules from the adjacency matrix of the HIV-1–human interaction network. Novel HIV-1–human interactions have been predicted based on the discovered association rules and tested for biological significance. For validation of the predicted new interactions, gene ontology-based and pathway-based studies have been performed. These studies show that the human proteins which are predicted to interact with a particular viral protein share many common biological activities. Moreover, literature survey has been used for validation purpose to identify some predicted interactions that are already validated experimentally but not present in the database. Comparison with other prediction methods is also discussed.
DNA microarray is a powerful technology that can simultaneously determine the levels of thousands of transcripts (generated, for example, from genes/miRNAs) across different experimental conditions or tissue samples. The motto of differential expression analysis is to identify the transcripts whose expressions change significantly across different types of samples or experimental conditions. A number of statistical testing methods are available for this purpose. In this paper, we provide a comprehensive survey on different parametric and non-parametric testing methodologies for identifying differential expression from microarray data sets. The performances of the different testing methods have been compared based on some real-life miRNA and mRNA expression data sets. For validating the resulting differentially expressed miRNAs, the outcomes of each test are checked with the information available for miRNA in the standard miRNA database PhenomiR 2.0. Subsequently, we have prepared different simulated data sets of different sample sizes (from 10 to 100 per group/population) and thereafter the power of each test have been calculated individually. The comparative simulated study might lead to formulate robust and comprehensive judgements about the performance of each test in the basis of assumption of data distribution. Finally, a list of advantages and limitations of the different statistical tests has been provided, along with indications of some areas where further studies are required.
Background COVID-19 (Coronavirus Disease-19), a disease caused by the SARS-CoV-2 virus, has been declared as a pandemic by the World Health Organization on March 11, 2020. Over 15 million people have already been affected worldwide by COVID-19, resulting in more than 0.6 million deaths. Protein–protein interactions (PPIs) play a key role in the cellular process of SARS-CoV-2 virus infection in the human body. Recently a study has reported some SARS-CoV-2 proteins that interact with several human proteins while many potential interactions remain to be identified. Method In this article, various machine learning models are built to predict the PPIs between the virus and human proteins that are further validated using biological experiments. The classification models are prepared based on different sequence-based features of human proteins like amino acid composition, pseudo amino acid composition, and conjoint triad. Result We have built an ensemble voting classifier using SVM Radial , SVM Polynomial , and Random Forest technique that gives a greater accuracy, precision, specificity, recall, and F1 score compared to all other models used in the work. A total of 1326 potential human target proteins of SARS-CoV-2 have been predicted by the proposed ensemble model and validated using gene ontology and KEGG pathway enrichment analysis. Several repurposable drugs targeting the predicted interactions are also reported. Conclusion This study may encourage the identification of potential targets for more effective anti-COVID drug discovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.