A modified hyperplane clustering algorithm allows for efficient and accurate clustering of extremely large datasets

Background: Anaplastic thyroid carcinoma (ATC) accounts for only 3% of thyroid cancers, yet strikingly, it accounts for almost 40% of thyroid cancer deaths. Currently, no effective therapies exist. In an effort to identify ATC-specific therapeutic targets, we analyzed global gene expression data from multiple studies to identify ATC-specific dysregulated genes. Methods: The National Center for Biotechnology Information Gene Expression Omnibus database was searched for high-throughput gene expression microarray studies from human ATC tissue along with normal thyroid and/or papillary thyroid cancer (PTC) tissue. Gene expression levels in ATC were compared with normal thyroid or PTC using seven separate comparisons, and an ATC-specific gene set common in all seven comparisons was identified. We investigated these genes for their biological functions and pathways. Results: Three studies met the inclusion criteria (including 32 ATC patients, 69 PTC, and 75 normal). There were 259 upregulated genes and 286 downregulated genes in ATC with at least two-fold change in all seven comparisons. Using a five-fold filter, 36 genes were upregulated in ATC, while 40 genes were downregulated. Of the 10 top globally upregulated genes in ATC, 4/10 (MMP1, ANLN, CEP55, and TFPI2) are known to play a role in ATC progression; however, 6/10 genes (TMEM158, CXCL5, E2F7, DLGAP5, MME, and ASPM) had not been specifically implicated in ATC. Similarly, 3/10 (SFTA3, LMO3, and C2orf40) of the most globally downregulated genes were novel in this context, while 7/10 genes (SLC26A7, TG, TSHR, DUOX2, CDH1, PDE8B, and FOXE1) have been previously identified in ATC. We experimentally validated a significant correlation for seven transcription factors (KLF16, SP3, ETV6, FOXC1, SP1, EGFR1, and MAFK) with the ATC-specific genes using microarray analysis of ATC cell lines. Ontology clustering of globally altered genes revealed that "mitotic cell cycle" is highly enriched in the globally upregulated gene set (44% of top upregulated genes, p-value < 10^-30). Conclusions: By focusing on globally altered genes, we have identified a set of consistently altered biological processes and pathways in ATC. Our data are consistent with an important role for M-phase cell cycle genes in ATC, and may provide direction for future studies to identify novel therapeutic targets for this disease.
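The "altered in all seven comparisons" filter described in the abstract above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `globally_altered`, the input layout (one ATC/reference expression ratio per comparison), and the gene labels and ratios in the usage lines are hypothetical and are not taken from the study.

```python
def globally_altered(fold_changes, threshold=2.0):
    """Return (upregulated, downregulated) gene lists.

    fold_changes maps a gene label to a list of ATC/reference expression
    ratios, one per comparison (assumed layout). A gene counts as globally
    upregulated only if every ratio is >= threshold, and as globally
    downregulated only if every ratio is <= 1 / threshold.
    """
    up = sorted(
        gene for gene, ratios in fold_changes.items()
        if all(r >= threshold for r in ratios)
    )
    down = sorted(
        gene for gene, ratios in fold_changes.items()
        if all(r <= 1.0 / threshold for r in ratios)
    )
    return up, down


if __name__ == "__main__":
    # Hypothetical toy ratios for three comparisons, for illustration only.
    toy = {
        "GENE_A": [2.5, 3.0, 6.1],   # at least two-fold up everywhere
        "GENE_B": [2.5, 1.4, 6.1],   # fails in one comparison
        "GENE_C": [0.2, 0.4, 0.1],   # at least two-fold down everywhere
    }
    print(globally_altered(toy, threshold=2.0))
```

Raising `threshold` to 5.0 corresponds to the stricter five-fold filter mentioned in the abstract.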

“…1). In order to understand the expression patterns of differentially expressed genes, we performed cluster analysis using HPCluster program (8).…”

Section: Gene Expression Data Analyses (mentioning)

confidence: 99%

Cell Cycle M-Phase Genes Are Highly Upregulated in Anaplastic Thyroid Carcinoma

Weinberger, Ponny, et al. 2017. Thyroid

Self Cite

“…Regarding the generalization capability, the optimal number of rules is automatically determined on the basis of learning theory [49]. The best model is chosen by evaluating a cost function based on network complexity and approximation error [50].…”

Section: Honfis (mentioning)

confidence: 99%

Prediction in Photovoltaic Power by Neural Networks

et al. 2017

Abstract: The ability to forecast the power produced by renewable energy plants in the short and middle term is a key issue to allow a high-level penetration of the distributed generation into the grid infrastructure. Forecasting energy production is mandatory for dispatching and distribution issues, at the transmission system operator level, as well as the electrical distributor and power system operator levels. In this paper, we present three techniques based on neural and fuzzy neural networks, namely the radial basis function, the adaptive neuro-fuzzy inference system and the higher-order neuro-fuzzy inference system, which are well suited to predict data sequences stemming from real-world applications. The preliminary results concerning the prediction of the power generated by a large-scale photovoltaic plant in Italy confirm the reliability and accuracy of the proposed approaches.

“…Bohland et al (2010) used K-means to cluster all left hemisphere brain voxels, a 25,155 × 271 matrix is used as an input for the algorithm. Sharma et al (2009) used a two-stage hyperplane algorithm applied in a software package called HPCluster. The first stage reduced the data size and the second stage was the conventional K-means.…”

Section: Algorithms Used For Clustering Gene Expression Data (mentioning)

confidence: 99%

Clustering of high throughput gene expression data

Pirim, Ekşioğlu, Perkins, et al. 2012. Computers & Operations Research

106

High throughput biological data need to be processed, analyzed, and interpreted to address problems in life sciences. Bioinformatics, computational biology, and systems biology deal with biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Clearly, clustering can be used in many areas of biological data analysis. However, this paper presents a review of the current clustering algorithms designed especially for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics - clustering gene expression data - to the operations research community.
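The two-stage scheme quoted above (first reduce the data size, then run conventional K-means) can be sketched as follows. This is a minimal illustration under stated assumptions and not HPCluster's actual implementation: the stage-1 rule (repeatedly cutting the largest bin with an axis-aligned hyperplane through its mean) and the names `hyperplane_reduce`, `weighted_kmeans`, and `max_bins` are hypothetical choices made for this example.

```python
import random


def hyperplane_reduce(points, max_bins=32):
    """Stage 1 (assumed scheme): cut the largest bin with an axis-aligned
    hyperplane through its mean until max_bins bins exist, then summarise
    each bin by its centroid and its point count."""
    dim = len(points[0])
    bins = [list(points)]
    while len(bins) < max_bins:
        bins.sort(key=len)
        cell = bins.pop()  # largest remaining bin
        means = [sum(p[d] for p in cell) / len(cell) for d in range(dim)]
        spread = [sum((p[d] - means[d]) ** 2 for p in cell) for d in range(dim)]
        axis = max(range(dim), key=lambda d: spread[d])  # widest axis
        left = [p for p in cell if p[axis] < means[axis]]
        right = [p for p in cell if p[axis] >= means[axis]]
        if not left or not right:  # bin cannot be split further
            bins.append(cell)
            break
        bins.extend([left, right])
    centroids = [
        [sum(p[d] for p in cell) / len(cell) for d in range(dim)] for cell in bins
    ]
    weights = [len(cell) for cell in bins]
    return centroids, weights


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Stage 2: Lloyd's K-means in which each point counts `weight` times."""
    rng = random.Random(seed)
    dim = len(points[0])
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centre for every reduced point.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: weighted mean of the members of each cluster.
        for j in range(k):
            members = [
                (p, w) for p, w, lab in zip(points, weights, labels) if lab == j
            ]
            if not members:
                continue  # keep the old centre for an empty cluster
            total = sum(w for _, w in members)
            centers[j] = [
                sum(w * p[d] for p, w in members) / total for d in range(dim)
            ]
    return centers, labels


if __name__ == "__main__":
    # Synthetic data: two Gaussian blobs, purely for illustration.
    rng = random.Random(1)
    data = [
        [rng.gauss(c, 0.5), rng.gauss(c, 0.5)]
        for c in (0.0, 10.0)
        for _ in range(100)
    ]
    reduced, counts = hyperplane_reduce(data, max_bins=16)
    centers, _ = weighted_kmeans(reduced, counts, k=2)
    print(len(data), "points reduced to", len(reduced))
    print(sorted(centers))
```

Weighting each bin centroid by its point count keeps the stage-2 centres close to what K-means on the full data would give, while the distance computations run over far fewer points.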

Cited by 17 publications

References 12 publications
