Detection of biomarker genes and their regulatory doses of chemical compounds (DCCs) is one of the most important tasks in toxicogenomic studies as well as in drug design and development. An online computational platform, “Toxygates”, identifies biomarker genes and their regulatory DCCs by a co-clustering approach. However, the algorithm of that platform, which is based on hierarchical clustering (HC), does not share gene-DCC two-way information simultaneously during co-clustering between genes and DCCs. It is also sensitive to outlying observations. Thus, this platform may produce misleading results in some cases. The probabilistic hidden variable model (PHVM) is a more effective co-clustering approach that shares two-way information simultaneously, but it is also sensitive to outlying observations. Therefore, since gene expression data are often contaminated by outlying observations, in this paper we propose the logistic probabilistic hidden variable model (LPHVM) for robust co-clustering between genes and DCCs. We investigated the performance of the proposed LPHVM co-clustering approach in comparison with the conventional PHVM and Toxygates co-clustering approaches using simulated and real-life TGP gene expression datasets, respectively. Simulation results show that the proposed method improves on the conventional PHVM in the presence of outliers and otherwise performs equally well. In the real-life TGP data analysis, three DCCs (glibenclamide-low, perhexilline-low, and hexachlorobenzene-medium) for the glutathione metabolism pathway dataset as well as two DCCs (acetaminophen-medium and methapyrilene-low) for the PPAR signaling pathway dataset were incorrectly co-clustered by the Toxygates online platform, while only one DCC (hexachlorobenzene-low) for the glutathione metabolism pathway was incorrectly co-clustered by the proposed LPHVM approach.
Our findings from the real data analysis are also supported by the other findings in the literature.
Assessment of drug toxicity and the associated biomarker genes is one of the most important tasks in the pre-clinical phase of the drug development pipeline as well as in toxicogenomic studies. There are few statistical methods for assessing the toxicity of doses of drugs (DDs) and their associated biomarker genes. However, these methods are computationally slow because they estimate the model parameters with EM (Expectation-Maximization) based iterative approaches. To overcome this problem, this paper proposes an alternative approach based on hierarchical clustering (HC) for the same purpose. There are several types of HC approaches whose performance depends on the similarity/distance measure used. Therefore, we explored suitable combinations of distance measures and HC methods on Japanese Toxicogenomics Project (TGP) datasets for better clustering/co-clustering between DDs and genes as well as to detect toxic DDs and their associated biomarker genes. We observed that Ward’s HC method with each of the Euclidean, Manhattan and Minkowski distance measures produces better clustering/co-clustering results. For example, in the case of the glutathione metabolism pathway (GMP) dataset, LOC100359539/Rrm2, Gpx6, RGD1562107, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1, LOC100912604/Srm, Gstm4, Odc1, Gsr, Gss are the biomarker genes and Acetaminophen_Middle, Acetaminophen_High, Methapyrilene_High, Nitrofurazone_High, Nitrofurazone_Middle, Isoniazid_Middle, Isoniazid_High are their regulatory (associated) DDs identified by our proposed co-clustering algorithm based on the distance and HC method combination Euclidean: Ward.
Similarly, for the PPAR signaling pathway (PPAR-SP) dataset Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Plin2, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 are the biomarker genes and Aspirin_Low, Aspirin_Middle, Aspirin_High, Benzbromarone_Middle, Benzbromarone_High, Clofibrate_Middle, Clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle, Gemfibrozil_Middle, Gemfibrozil_High are their regulatory DDs. These results are validated by the available literature and functional annotation.
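As a rough illustration of the distance and linkage combination named above (Euclidean distance with Ward's method), the following minimal sketch clusters the rows (genes) and then the columns (doses) of a small matrix. It is not the authors' co-clustering algorithm: the toy matrix, the function names `ward_clusters` and `transpose`, and the choice of two clusters are all assumptions made for the example.

```python
from itertools import combinations

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion on Euclidean data.

    Repeatedly merges the pair of clusters whose merge gives the smallest
    increase in within-cluster sum of squares, until k clusters remain.
    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]

    def merge_cost(a, b):
        # Ward's merge cost: |A||B| / (|A| + |B|) * squared centroid distance
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

# Hypothetical toy matrix: rows are genes, columns are dose-of-drug conditions.
expr = [
    [2.1, 2.0, 0.1, 0.0],
    [1.9, 2.2, 0.0, 0.1],
    [0.1, 0.0, 0.1, 0.2],
    [0.0, 0.2, 0.0, 0.1],
]

gene_groups = ward_clusters(expr, 2)             # cluster genes (rows)
dose_groups = ward_clusters(transpose(expr), 2)  # cluster doses (columns)
print(gene_groups, dose_groups)
```

Clustering rows and columns separately is the simplest way to obtain gene groups and dose groups that can then be paired; how the pairs are ranked to flag toxic doses is left out here because the abstract does not specify it.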
Quantitative trait locus (QTL) analysis is a statistical method that links two types of information: phenotypic data (trait measurements) and genotypic data (usually molecular markers). A number of QTL tools have been developed for gene linkage mapping. Standard Interval Mapping (SIM), also called Simple Interval Mapping or Interval Mapping (IM), Haley-Knott, Extended Haley-Knott and Multiple Imputation (IMP) methods apply when the single QTL is unlinked, whereas Composite Interval Mapping (CIM) is designed to map the genetic linkage for both linked and unlinked genes in the chromosome. The performance of these methods is measured by the calculated LOD score. QTLs are considered significant above the threshold LOD score of 3.0. For backcross-simulated data, the CIM method performs significantly better in detecting QTLs compared to the other SIM mapping methods. CIM detected three QTLs on chromosomes 1 and 4, whereas the other methods were unable to detect any significant marker positions in the simulated data. For a real rice dataset, CIM also performed considerably better in detecting marker positions compared to the other four interval mapping methods. CIM finally detected 12 QTL positions while each of the other four SIM methods detected only six positions.
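To make the LOD threshold concrete, here is a minimal sketch of a LOD score for a single marker in a backcross, computed by simple marker regression as LOD = (n/2) log10(RSS0/RSS1). This is not CIM or any of the interval mapping tools compared above; the phenotype values, the 0/1 genotype coding, and the function name `lod_score` are assumptions for illustration only.

```python
import math

def lod_score(phenotypes, genotypes):
    """LOD score for one marker under a normal model (marker regression).

    RSS0 is the residual sum of squares with a single overall mean (no QTL);
    RSS1 is the residual sum of squares with one mean per genotype class.
    """
    n = len(phenotypes)
    overall_mean = sum(phenotypes) / n
    rss0 = sum((y - overall_mean) ** 2 for y in phenotypes)

    rss1 = 0.0
    for g in set(genotypes):
        ys = [y for y, gi in zip(phenotypes, genotypes) if gi == g]
        class_mean = sum(ys) / len(ys)
        rss1 += sum((y - class_mean) ** 2 for y in ys)

    return (n / 2) * math.log10(rss0 / rss1)

# Hypothetical backcross data: genotype 0 or 1 at one marker per individual.
pheno = [10.1, 9.9, 10.0, 10.2, 9.8, 12.1, 11.9, 12.0, 12.2, 11.8]
linked = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # marker tracks the trait
unlinked = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # marker unrelated to the trait

LOD_THRESHOLD = 3.0  # significance threshold quoted in the abstract
lod_linked = lod_score(pheno, linked)
lod_unlinked = lod_score(pheno, unlinked)
print(lod_linked > LOD_THRESHOLD, lod_unlinked > LOD_THRESHOLD)
```

In this toy data the first marker separates the two phenotype groups and clears the threshold, while the second does not.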
Toxicogenomics combines high-throughput molecular technologies with statistical and machine learning approaches to discover similar groups of doses of chemical compounds (DCCs) and genes, in order to explore toxicogenomic biomarkers and their regulatory DCCs. This is also very important in toxicity studies of environmental stressors and synthetic chemicals and in the drug discovery and development process. Different clustering algorithms are concerned with discovering interesting clusters/groups of row or column entities of a dataset. Among these, hierarchical clustering (HC) and the logistic probabilistic hidden variable model (LPHVM) can identify toxicogenomic biomarkers and their regulatory DCCs by forming co-clusters. However, the HC method is very sensitive to outlying observations. On the other hand, although LPHVM is a robust approach, it takes longer to compute because it is an Expectation-Maximization (EM) based iterative approach. Additionally, the LPHVM creates an artificiality problem by taking the absolute value of the data matrix. Therefore, to overcome these problems, in this paper we propose a robust hierarchical co-clustering (RHCOC) algorithm that co-clusters genes and DCCs simultaneously in order to explore toxicogenomic biomarkers and their regulatory DCCs. The performance of the proposed RHCOC algorithm over the github (https://github.com/mdbahadur/rhcoclust).
The aim of toxicogenomic studies is to optimize the toxic dose levels of chemical compounds (CCs) and their regulated biomarker genes. This is also crucial in drug discovery and development. Popular online computational tools such as ToxDB and Toxygates identify toxicogenomic biomarkers using the t-test. However, they are not suitable for identifying the doses of the corresponding CCs that regulate the biomarker genes. Hence, we describe a one-way ANOVA model together with Tukey's HSD test for the identification of toxicogenomic biomarker genes and their influencing CC doses with improved efficiency. Glutathione metabolism pathway data analysis identifies the high and middle doses of acetaminophen and nitrofurazone, as well as the high dose of methapyrilene, as significantly toxic CC doses. The corresponding top seven regulated toxicogenomic biomarker genes found in this analysis are Gstp1, Gsr, Mgst2, Gclm, G6pd, Gsta5 and Gclc.
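The one-way ANOVA step can be sketched as follows for a single gene measured at several dose levels. This is only the F statistic; the p-value and the Tukey HSD post-hoc comparisons mentioned above need distribution functions outside the Python standard library and are omitted. The expression values, the dose grouping, and the function name `one_way_anova_f` are assumptions for illustration.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: list of lists, one list of observations per factor level
    (here, one list per dose level of a compound).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical expression values of one gene at four dose levels.
doses = {
    "control": [0.0, 0.1, -0.1],
    "low": [0.1, 0.0, 0.2],
    "middle": [1.0, 1.1, 0.9],
    "high": [2.0, 2.1, 1.9],
}
f_stat = one_way_anova_f(list(doses.values()))
print(f_stat)
```

A large F value indicates that mean expression differs across dose levels; a post-hoc test such as Tukey's HSD would then indicate which specific dose levels differ.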
Classification of functional metagenomes from microbial communities plays a vital role in metagenomics research. In this paper, we investigate the performance of the beta-t random forest classifier for the classification of metagenomics data. Nine key functional metagenomic variables were selected from 10 different microbial communities using the beta-t test statistic, with p-values assessed at the 5% level of significance. The beta-t random forest classifier then showed higher accuracy (96%) and true positive rate (96%), and lower false positive rate (5%), false discovery rate (5%) and misclassification error rate (5%), for the classification of metagenomes. This method performed better than Bayes, SVM, KNN, AdaBoost and LogitBoost.
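The performance measures quoted above follow directly from a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the results reported in the abstract, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "true_positive_rate": tp / (tp + fn),    # sensitivity / recall
        "false_positive_rate": fp / (fp + tn),
        "false_discovery_rate": fp / (fp + tp),
        "misclassification_error_rate": 1 - accuracy,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=48, fn=2, fp=3, tn=47)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```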
Background and objectives: Assessment of drug toxicity and the associated biomarker genes is one of the most important tasks in the pre-clinical phase of the drug development pipeline as well as in toxicogenomic studies. There are few statistical methods for assessing the toxicity of doses of drugs (DDs) and their associated biomarker genes. However, these methods are computationally slow because they estimate the model parameters with EM (expectation-maximization) based iterative approaches. To overcome this problem, this paper proposes an alternative approach based on hierarchical clustering (HC) for the same purpose. Methods and materials: There are several types of HC approaches whose performance depends on the similarity/distance measure used. Therefore, we explored suitable combinations of distance measures and HC methods on Japanese Toxicogenomics Project (TGP) datasets for better clustering/co-clustering between DDs and genes as well as to detect toxic DDs and their associated biomarker genes. Results: We observed that Ward’s HC method with each of the Euclidean, Manhattan, and Minkowski distance measures produces better clustering/co-clustering results. For example, in the case of the glutathione metabolism pathway (GMP) dataset, LOC100359539/Rrm2, Gpx6, RGD1562107, Gstm4, Gstm3, G6pd, Gsta5, Gclc, Mgst2, Gsr, Gpx2, Gclm, Gstp1, LOC100912604/Srm, Gstm4, Odc1, Gsr, Gss are the biomarker genes and Acetaminophen_Middle, Acetaminophen_High, Methapyrilene_High, Nitrofurazone_High, Nitrofurazone_Middle, Isoniazid_Middle, Isoniazid_High are their regulatory (associated) DDs identified by our proposed co-clustering algorithm based on the distance and HC method combination Euclidean: Ward.
Similarly, for the peroxisome proliferator-activated receptor signaling pathway (PPAR-SP) dataset Cpt1a, Cyp8b1, Cyp4a3, Ehhadh, Plin5, Plin2, Fabp3, Me1, Fabp5, LOC100910385, Cpt2, Acaa1a, Cyp4a1, LOC100365047, Cpt1a, LOC100365047, Angptl4, Aqp7, Cpt1c, Cpt1b, Me1 are the biomarker genes and Aspirin_Low, Aspirin_Middle, Aspirin_High, Benzbromarone_Middle, Benzbromarone_High, Clofibrate_Middle, Clofibrate_High, WY14643_Low, WY14643_High, WY14643_Middle, Gemfibrozil_Middle, Gemfibrozil_High are their regulatory DDs. Conclusions: Overall, the methods proposed in this article co-cluster the genes and DDs and simultaneously detect biomarker genes and their regulatory DDs, while taking less time than the other methods mentioned. The results produced by the proposed methods have been validated by the available literature and functional annotation.