Reclust: an efficient clustering algorithm for mixed data based on reclustering and cluster validation

Arockiam, Amala Jayanthi Maria Soosai; Irudhayaraj, Elizabeth Shanthi

doi:10.11591/ijeecs.v29.i1.pp545-552

Cited by 4 publications

(5 citation statements)

References 19 publications


“…In real-world applications, both numeric and categorical features are often used to define the data. Clustering analysis is one of the most important approaches in DM, and it seeks to find the nature of groupings or clusters of data objects within an attribute space [8,11,16]. For an exploratory approach, we applied clustering analysis to the dataset in Appendix B.…”

Section: Cluster Analysis (mentioning)

confidence: 99%

“…(32), knn (49), svm (81) and rf (69), to the right of the graph and characterised by a strongly positive coordinate on the axis, to individuals such as MCDA C (58), characterised by a strongly negative coordinate on the axis (to the left of the graph). Dimension 2 opposes individuals such as lstm (54), word2vec (88), nlp (63) and BIM (16), which are at the top of the graph and characterised by a low positive coordinate on the axis, to individuals such as ann (8) and adaboost (3), which have a low negative coordinate on the axis and are located at the bottom of the graph. On Dim1, group 1 (dt, knn, svm and rf) shares high values for the variables "predicting", "supervised", "monitoring", "frequency", "institutional data", "data project-simulation-signal", "classifying", "best method" and "interview-literature-text" (variables are sorted from the strongest).…”

Section: Inertia Distribution (mentioning)

confidence: 99%


A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management

Pireddu, Bedini, Lombardi et al. 2024 (Preprint)


Increasingly, information technology facilitates the storage and management of data useful for risk analysis and event prediction. Studies on data extraction related to occupational health and safety are increasingly available; however, due to its variability, the construction sector warrants special attention. This review is conducted under the research programmes of the National Institute for Occupational Accident Insurance (Inail). Objectives: The research question focuses on identifying which data mining (DM) methods, among supervised, unsupervised, and others, are most appropriate for given investigation objectives, data types, and data sources, as defined by the authors. Methods: Scopus and ProQuest were the main sources from which we extracted studies in the field of construction published between 2014 and 2023. The eligibility criteria applied in the selection of studies were based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). For exploratory purposes, we applied hierarchical clustering, while for in-depth analysis, we used principal component analysis (PCA) and meta-analysis. Results: The search strategy based on the PRISMA eligibility criteria provided us with 61 out of 2,234 potential articles, 202 observations, 91 methodologies, 4 survey purposes, 3 data sources, 7 data types, and 3 resource types. Cluster analysis and PCA organized the information included in the paper dataset into two dimensions and labels: the first, Dim1, "supervised methods, institutional dataset, and predictive and classificatory purposes" (correlation 0.97÷8.18E-01; p-value 7.67E-55÷1.28E-22), and the second, Dim2, "not-supervised methods; project, simulation, literature, text data; monitoring, decision-making processes; machinery and environment" (corr. 0.84÷0.47; p-value 5.79E-25÷3.59E-06). We answered the research question regarding which method, among supervised, unsupervised, or other, is most suitable for application to data in the construction industry. Conclusions: The meta-analysis provided an overall estimate of the better effectiveness of supervised methods (Odds Ratio = 0.71, Confidence Interval 0.53÷0.96) compared to not-supervised methods.


“…The nodes that make up the output grid accommodate only one class type, but sometimes this does not happen. Therefore, an analysis of cluster purity is conducted to uniquely assign a single class to each cell in the map [49]. Purity is a metric for how much a cluster contains a single class (Equation (21)).…”

Section: Quality Of Self-organizing Map (mentioning)

confidence: 99%
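The purity idea in the statement above can be illustrated with a minimal Python sketch. It assumes the common definition of purity (the share of a node's samples that belong to its majority class); the citing paper's Equation (21) is not reproduced on this page, so this may differ from it in detail. The function names and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def _counts_by_node(node_ids, class_labels):
    """Count how many samples of each class landed on each map node."""
    by_node = defaultdict(Counter)
    for node, label in zip(node_ids, class_labels):
        by_node[node][label] += 1
    return by_node


def node_purity(node_ids, class_labels):
    """Return {node: (majority_class, purity)} for every occupied node.

    Purity here is (samples of the majority class) / (samples on the node),
    an assumed definition, not necessarily the citing paper's Equation (21).
    """
    result = {}
    for node, counts in _counts_by_node(node_ids, class_labels).items():
        majority, hits = counts.most_common(1)[0]
        result[node] = (majority, hits / sum(counts.values()))
    return result


def overall_purity(node_ids, class_labels):
    """Sum of per-node majority counts divided by the total sample count."""
    by_node = _counts_by_node(node_ids, class_labels)
    total = sum(sum(c.values()) for c in by_node.values())
    if total == 0:
        raise ValueError("no samples given")
    return sum(max(c.values()) for c in by_node.values()) / total


# Toy example: five samples mapped to two nodes (made-up labels).
nodes = [0, 0, 0, 1, 1]
labels = ["steady", "steady", "air", "water", "water"]
print(node_purity(nodes, labels))
print(overall_purity(nodes, labels))  # 0.8
```

Assigning each node its majority class, as `node_purity` does, is one simple way to give every cell of the map a single class when more than one class lands on it.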

An Unsupervised Anomaly Detection Based on Self-Organizing Map for the Oil and Gas Sector

2023


Anomaly detection plays a crucial role in preserving industrial plant health. Detecting and identifying anomalies helps prevent any production system from damage and failure. In complex systems, such as oil and gas, many components need to be kept operational. Predicting which parts will break down in a time interval or identifying which ones are working under abnormal conditions can significantly increase their reliability. Moreover, it underlines how the use of artificial intelligence is also emerging in the process industry and not only in manufacturing. In particular, the state-of-the-art analysis reveals a growing interest in the subject and that most identified algorithms are based on neural network approaches in their various forms. In this paper, an approach for fault detection and identification was developed using a Self-Organizing Map algorithm, as the results of the obtained map are intuitive and easy to understand. In order to assign each node in the output map a single, unique class, the purity of each node is examined. The samples are identified and mapped in a two-dimensional space, clustering all readings into six macro-areas: (i) steady-state area, (ii) water anomaly macro-area, (iii) air-water anomaly area, (iv) tank anomaly area, (v) air anomaly macro-area, and (vi) steady-state transition area. Moreover, through the confusion matrix, it is found that the algorithm achieves an overall accuracy of 90 per cent and can classify and recognize the state of the system. The proposed algorithm was tested on an experimental plant at Università Politecnica delle Marche.


“…The K-Means algorithm works on numeric attributes and partitions the data into a number of groups [20]. The K-Means algorithm starts by choosing a number K at random and drawing K members of the population to serve as the initial centroids [21].…”

Section: K-means (unclassified)
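The K-Means procedure described in the statement above can be sketched in plain Python as follows. This is a minimal illustration of the standard Lloyd iteration, not the citing paper's implementation (which used RapidMiner); the function name, the `seed` and `max_iter` parameters, and the toy points are assumptions. Here `k` is passed in by the caller, and the initial centroids are drawn at random from the data.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Partition numeric points into k groups; return (labels, centroids)."""
    rng = random.Random(seed)
    # Initial centroids: k distinct data points drawn at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Toy data: two well-separated groups of 2-D points (made-up values).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Because the algorithm averages coordinates, it only applies to numeric attributes, which is the limitation the statement points out.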

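Analisis Pengelompokan Gangguan TIK Pada Sistem Pencatatan Layanan Menggunakan Algoritma K-Means dan Metode Elbow [Analysis of ICT Incident Clustering in a Service Logging System Using the K-Means Algorithm and the Elbow Method]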

Arientawati, Jumaryadi, Wibowo 2023


Pusintek is the unit that manages ICT-related tasks and functions in government agencies in Indonesia. One of its tasks concerns the handling of ICT incidents reported by User Units. ICT incidents are recorded in a service and incident logging system based on Customer Relationship Management (CRM). The aim of this study is to cluster the number of incident reports by Reporting Unit and to help the technical division analyze how often each unit reports incidents and in which categories, so that the technical division can plan and prepare decisions on follow-up actions for resolving ICT incidents at the Reporting Units' agencies. The Elbow method is used to determine the optimal number of incident clusters. Testing was carried out by partitioning the ICT incident data and computing the cluster distance performance for up to 7 clusters. RapidMiner was used as the testing tool. The results show that k=3 yields the optimal clusters for K-Means clustering with the Elbow method, with an Average Centroid Distance of 69110.233 and a Davies-Bouldin value of 0.458.
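The Elbow selection step mentioned in the abstract can be sketched as below. The abstract does not say how the elbow was located, so this sketch assumes one common heuristic: pick the k where the distance curve bends most sharply, measured by the largest second difference. The function name and the score values are hypothetical and are not the paper's results.

```python
def elbow_k(scores):
    """Pick the elbow of a {k: within-cluster distance} curve.

    Assumed heuristic: the k with the largest second difference
    score[k-1] - 2*score[k] + score[k+1]. Expects consecutive,
    evenly spaced k values and at least three of them.
    """
    ks = sorted(scores)
    if len(ks) < 3:
        raise ValueError("need scores for at least three values of k")
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = scores[prev_k] - 2 * scores[k] + scores[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up average centroid distances for k = 1..7 (illustration only).
toy_scores = {1: 100.0, 2: 60.0, 3: 20.0, 4: 17.0, 5: 15.0, 6: 14.0, 7: 13.5}
print(elbow_k(toy_scores))  # 3
```

In practice the scores would come from running K-Means once per candidate k, as the study did for up to 7 clusters; the endpoints (k=1 and k=7 here) can never be selected by this heuristic.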
