Bioinformatics seeks to solve problems in biology with computational theories and methods. Formal concept analysis (FCA) is one such theoretical model, based on partial orders. FCA allows the user to examine the structural properties of data based on which subsets of the data set depend on each other. This paper surveys the current literature on the use of FCA in bioinformatics. The survey begins with a discussion of FCA, its hierarchical advantages, several advanced models of FCA, and lattice management strategies. It then examines how FCA has been used in bioinformatics applications, followed by the future prospects of FCA in those areas. The applications addressed include gene data analysis (with next-generation sequencing), biomarker discovery, protein-protein interaction, disease analysis (including COVID-19, cancer, and others), drug design and development, healthcare informatics, biomedical ontologies, and phylogeny. Some of the most promising prospects of FCA are identifying influential nodes in a network representing protein-protein interactions, determining critical concepts to discover biomarkers, integrating machine learning and deep learning for cancer classification, and pattern matching for next-generation sequencing.
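To make the idea of FCA concrete, the following is a minimal sketch, not taken from the surveyed paper, that enumerates the formal concepts of a tiny binary context by brute force. The context (genes `g1` to `g3` and attributes `a` to `c`) and all function names are hypothetical and chosen only for illustration; practical FCA tools use far more efficient algorithms than this exhaustive enumeration.

```python
from itertools import combinations

# Hypothetical toy context: each object (a gene) maps to the set of
# attributes (e.g. conditions under which it is expressed) that it has.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}


def common_attributes(objects, context):
    """Derivation operator: attributes shared by every object in `objects`."""
    shared = set().union(*context.values())  # start from all attributes
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Derivation operator: objects that possess every attribute given."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Exponential in the number of objects; suitable only for toy contexts.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def is_subconcept(lower, upper):
    """Partial order on concepts: smaller extent means more specific."""
    return lower[0] <= upper[0]


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(CONTEXT), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Each concept pairs a set of objects (the extent) with exactly the attributes they all share (the intent). Ordering concepts by extent inclusion, as `is_subconcept` does, gives the partial order that forms the concept lattice.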
Massive amounts of data gathered over the last decade have contributed significantly to the applicability of deep neural networks. Deep learning is well suited to processing huge amounts of data because deep models improve as more data is fed into them. However, in the existing literature, a deep neural classifier is often treated as a "black box" technique because the process is not transparent and researchers cannot gain information about how the input is associated with the output. In many domains, such as medicine, interpretability is critical because of the nature of the application. Our research focuses on adding interpretability to the black box by integrating Formal Concept Analysis (FCA) into the image classification pipeline, converting it into a glass box. Our proposed approach produces a low-dimensional feature vector for an image dataset using an autoencoder, followed by supervised fine-tuning of the features using a deep neural classifier and Linear Discriminant Analysis (LDA). The resulting low-dimensional feature vector is then processed by an FCA-based classifier. The FCA framework helps us develop a glass box classifier from which the relationship between the target class and the low-dimensional feature set can be derived. Further, it helps researchers understand the classification task and refine it. We use the MNIST dataset to test the interfacing between deep neural networks and the FCA classifier. The classifier achieves an accuracy of 98.7% for binary classification and 97.38% for multi-class classification. We compare the performance of the proposed classifier with convolutional neural networks (CNN) and random forest.
Motivation: Encoded by (pro-)viruses, anti-CRISPR (Acr) proteins inhibit the CRISPR-Cas immune system of their prokaryotic hosts. As a result, Acr proteins can be employed to develop more controllable CRISPR-Cas genome editing tools. Recent studies revealed that known acr genes often coexist with other acr genes and with phage structural genes within the same operon. For example, we found that 47 of 98 known acr genes (or their homologs) coexist in the same operons. None of the current Acr prediction tools has considered this important genomic context feature. We have developed a new software tool, AOminer, to improve the discovery of new Acrs by fully exploiting the genomic context of known acr genes and their homologs. Results: AOminer is the first machine learning based tool focused on the discovery of Acr operons (AOs). A two-state HMM (hidden Markov model) was trained to learn the conserved genomic context of operons that contain known acr genes or their homologs, and the learnt features could distinguish AOs from non-AOs. AOminer allows automated mining for potential AOs from query genomes or operons. AOminer outperformed all existing Acr prediction tools, with an accuracy of 0.85. AOminer will facilitate the discovery of novel anti-CRISPR operons. Availability: The webserver is available at: http://aca.unl.edu/AOminer/AOminer_APP/. The Python program is at: https://github.com/boweny920/AOminer. Supplementary information: Supplementary data are available at Bioinformatics online.
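As a rough illustration of how a two-state HMM can label the genes of an operon, the sketch below runs Viterbi decoding over a sequence of gene annotations. It is not AOminer's model: the state names, the observation symbols (`acr_homolog`, `hth`, `other`), and every probability are hypothetical values invented for this example, and the abstract does not describe how AOminer's HMM is parameterised or scored.

```python
import math

# All names and numbers below are illustrative assumptions, not AOminer's.
STATES = ("acr_like", "background")

START = {"acr_like": 0.5, "background": 0.5}

TRANS = {
    "acr_like": {"acr_like": 0.8, "background": 0.2},
    "background": {"acr_like": 0.2, "background": 0.8},
}

EMIT = {
    "acr_like": {"acr_homolog": 0.6, "hth": 0.3, "other": 0.1},
    "background": {"acr_homolog": 0.1, "hth": 0.2, "other": 0.7},
}


def viterbi(observations):
    """Return the most probable state path for a non-empty observation list.

    Works in log space to avoid numerical underflow on long sequences.
    """
    # Log-probability of the best path ending in each state after the
    # first observation.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][obs])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    operon = ["acr_homolog", "hth", "acr_homolog"]  # hypothetical operon
    print(viterbi(operon))
```

In a real setting the transition and emission probabilities would be estimated from operons that contain known acr genes or their homologs, rather than set by hand as they are here.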