Data clustering with automatic algorithms such as k-means is a popular technique widely used in many general applications. Two interesting sub-activities of the clustering process are studied in this paper: selecting the number of clusters and analyzing the clustering results. This research aims to study cluster validation for finding the appropriate number of clusters for the k-means method. The experimental data comprise 3 shapes, each with 4 datasets of 100 items, whose diffusion is generated from a Gaussian (normal) distribution. Two cluster-validation techniques are used: the Silhouette and the Sum of Squared Errors (SSE). Comparative results on clustering configurations with k from 2 to 10 show that Silhouette and SSE are consistent, in the sense that both indicate the appropriate number of clusters at the same k value (Silhouette: maximum average value; SSE: knee point).
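As a minimal illustration of the two validation criteria, the pure-Python sketch below runs k-means on an invented 1-D toy dataset and reports the SSE and mean silhouette width for each k; the data and parameters are made up for illustration, and a real study would use a library such as scikit-learn.

```python
import random

def kmeans(xs, k, iters=50, seed=0):
    """Lloyd's algorithm on 1-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2)
                  for x in xs]
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

def sse(xs, centers, labels):
    """Sum of squared errors: within-cluster squared distance to centers."""
    return sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))

def silhouette(xs, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    scores = []
    for i, x in enumerate(xs):
        mates = [y for j, y in enumerate(xs)
                 if labels[j] == labels[i] and j != i]
        if not mates:
            scores.append(0.0)
            continue
        a = sum(abs(x - y) for y in mates) / len(mates)     # cohesion
        b = min(sum(abs(x - y) for y in grp) / len(grp)     # separation
                for grp in ([y for j, y in enumerate(xs) if labels[j] == c]
                            for c in set(labels) if c != labels[i])
                if grp)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# two well-separated toy groups: both criteria should point to k = 2
xs = [1.0, 1.1, 1.2, 9.0, 9.1, 9.2]
for k in (2, 3):
    centers, labels = kmeans(xs, k)
    print(k, round(sse(xs, centers, labels), 3),
          round(silhouette(xs, labels), 3))
```

On this toy data the silhouette is maximized at k = 2, while the SSE keeps decreasing with k, which is why the knee point rather than the minimum is used.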
Image data are normally unstructured and high dimensional due to advances in photography technology, such that an image can be taken at a wide range of resolution levels. To overcome this problem, data miners may consider selecting only a minimal set of features that are really important for classifying their images. Feature selection is a popular method for reducing dimensionality in data. However, most feature selection algorithms return results in the form of a score for each feature. It is still difficult for data miners to choose features based on such a scoring scheme because they may not know which score range is best for the data classification at hand. Therefore, in this research, we aim to assist data miners and novice data analysts in solving the dimensionality problem by finding for them the optimal set of features, instead of just reporting the scores of all features and leaving the selection step as the miners' burden. We select the optimal set of features by first applying a clustering technique to group similar features based on their scores. We then propose the silhouette width criterion for selecting the optimal number of clusters during the cluster analysis step. After that, we perform association mining to analyze relationships that may exist among different subsets of features with respect to the target attribute. Our method finally reports to the user the best subset of features to be used further for data classification. We demonstrate the performance of our proposed method on satellite forest image data from Japan.
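As a toy illustration of why raw scores are hard to act on, the sketch below separates a higher-scoring group of features from a lower-scoring one by cutting at the largest gap in the sorted scores; this is a simplified stand-in for the clustering-based grouping used in the paper, and the feature names and scores are invented.

```python
def select_top_features(scores):
    """Split features at the largest gap in sorted scores and
    return the higher-scoring group."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    gaps = [ranked[i + 1][1] - ranked[i][1] for i in range(len(ranked) - 1)]
    cut = gaps.index(max(gaps)) + 1     # cut just above the widest gap
    return {name for name, _ in ranked[cut:]}

# invented per-feature importance scores for a satellite-image task
scores = {"ndvi": 0.91, "band4": 0.88, "band3": 0.35,
          "texture": 0.30, "band1": 0.07}
print(select_top_features(scores))
```

Here the largest gap lies between 0.35 and 0.88, so the two top-scoring features are returned as a group, sparing the analyst from choosing a threshold by hand.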
The semiconductor industry deals with production at the nanometer scale, which requires process control with little margin for error. Timely detection of faults during the manufacturing process is critical to improving product yields. The difficulty of accurately detecting faulty processes and products is due to the abundance of data obtained from hundreds of tool-state and process-state sensors. We therefore analyze this problem using computational intelligence techniques. The analysis results reveal a minimal set of features for fault detection as well as a high-precision classification model of faults.
Various sub-tasks in modern construction management systems require automatic or semi-automatic processes to handle their operations. For construction progress monitoring in particular, automatically classifying the different construction materials in an image is a necessary preliminary stage. The more precise the automatic classification, the more exact the assessment of how much of each material has been used; consequently, construction progress can be evaluated with a high degree of reliability. Classification of construction material images is therefore an essential process for automatic progress monitoring, yet the similarity in material image appearance is the major classification challenge. Almost all existing related work is based on hand-designed features, whose classification accuracy has not been satisfactory across the studied datasets. In our work, an automatic feature-extraction method based on a prominent deep-learning technique, the convolutional neural network (CNN), is proposed. The pre-trained CNN architectures AlexNet and GoogLeNet are adapted to the task of construction material image classification under the concept of transfer learning. Both the fixed feature extractor and the fine-tuning schemes of transfer learning are implemented and evaluated. The results from the two pre-trained architectures are very impressive on the studied dataset; overall, the fine-tuning scheme with GoogLeNet achieves the highest classification accuracy of 95.50 percent. Index Terms: convolutional neural network (CNN), deep learning, transfer learning, construction material, image classification.
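The difference between the two transfer-learning schemes can be sketched without any deep-learning library: the fixed-feature-extractor scheme freezes the pre-trained convolutional layers and trains only the new classifier head, while fine-tuning updates every layer. The layer names below are invented stand-ins for a pre-trained CNN such as AlexNet, not the actual architecture.

```python
# Invented stand-in layer list for a pre-trained CNN (AlexNet-like).
PRETRAINED_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7"]
NEW_HEAD = "material_classifier"   # replaces the original 1000-class head

def trainable_layers(scheme):
    """Return which layers receive gradient updates under each scheme."""
    if scheme == "fixed-extractor":
        # pre-trained weights are frozen; only the new head is trained
        return [NEW_HEAD]
    if scheme == "fine-tuning":
        # every layer is updated, typically with a small learning rate
        return PRETRAINED_LAYERS + [NEW_HEAD]
    raise ValueError(f"unknown scheme: {scheme}")
```

In a framework such as PyTorch, the fixed-extractor scheme corresponds to disabling gradients on the pre-trained parameters before training the replacement head.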
Deep analyses of electrocardiogram (ECG) signals can reveal hidden information that is potentially useful for the accurate diagnosis of heart diseases. Time series data of ECGs are usually high dimensional and complex in their components. One of the keys to success in this kind of learning is to learn from representative data. In this research, we present Deep Autoencoder Networks (DANs) for efficiently casting time series representatives. To determine the appropriate DAN structure, we use genetic algorithms (GAs). The ECG representatives are then clustered, and the clustering results obtained with our proposed method are compared with those obtained using other time series representation techniques. The comparison is based on grouping accuracy with respect to the correct data labels and on cluster purity. The experimental results show that we can cast appropriate ECG representatives that yield better time series clustering performance, with a 30% improvement in grouping accuracy and a 23% increase in the purity metric.
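The GA-driven structure search can be sketched in a library-free form: each chromosome encodes candidate hidden-layer widths for an autoencoder, and the GA evolves the population toward better fitness. The fitness function here is a mock surrogate (a real run would train each candidate DAN and use its reconstruction error, which is far too slow for a sketch); all widths, rates, and the surrogate itself are invented for illustration.

```python
import random

WIDTHS = [4, 8, 16, 32, 64]   # allowed hidden-layer widths

def mock_fitness(chromo):
    """Invented surrogate (lower is better): in a real run this would
    train the autoencoder and return its reconstruction error."""
    funnel = sum(abs(nxt - prev // 2) for nxt, prev in zip(chromo[1:], chromo))
    return funnel + chromo[-1]   # prefer a halving funnel and a small code layer

def evolve(pop_size=20, layers=3, gens=40, seed=1):
    """Truncation-selection GA with one-point crossover and point mutation."""
    rng = random.Random(seed)
    pop = [[rng.choice(WIDTHS) for _ in range(layers)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=mock_fitness)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, layers)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # point mutation
                child[rng.randrange(layers)] = rng.choice(WIDTHS)
            children.append(child)
        pop = parents + children
    return min(pop, key=mock_fitness)

best = evolve()
print("selected hidden-layer widths:", best)
```

Because the better half of each generation survives unchanged, the best fitness found never worsens from one generation to the next.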
Speaker recognition approaches can be categorized into speaker identification and speaker verification. The two subfields differ slightly in definition according to the domain of use. Given a voice input, the goal of speaker verification is authentication, answering the question "is this someone's voice?", while speaker identification tries to answer "whose voice is this?" Verification can be thought of as a special case of open-set identification. In this work, a deep-learning model using a convolutional neural network (CNN) for speaker identification is proposed. The voice input to the method is not constrained by the words the speaker speaks; that is, the system is text-independent, which is more difficult than a text-dependent system. In the method, every 2 seconds of the speaker's voice is transformed into a spectrogram image and fed to a CNN model trained from scratch. The proposed CNN-based method is compared with the classic feature-extraction method based on Mel-frequency cepstral coefficients (MFCCs) classified by a support vector machine (SVM); to date, MFCC is the most popular feature-extraction method for audio and speech signals. Our proposed method, which uses the spectrogram image as input, is also compared with the case in which an image of the raw signal wave is fed to the CNN model. Experiments are conducted on speech from five Thai-language speakers whose voices were extracted from YouTube. The results reveal that the proposed CNN-based method trained on spectrogram images of voice is the best of the three methods: its average classification accuracy on the testing set is 95.83%, compared with 91.26% for the MFCC-based method and only 49.77% for the CNN model trained on images of the raw signal wave.
The proposed method is very efficient when only a short utterance of voice is used as input. Index Terms: convolutional neural network (CNN), deep learning, speaker recognition, speaker identification, text-independent.
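The spectrogram step the abstract relies on can be sketched in pure Python: slice the waveform into overlapping frames and take the magnitude of a discrete Fourier transform of each frame, yielding the time-by-frequency grid that is rendered as the image input to the CNN. The frame and hop sizes are invented; a real pipeline would window each frame and use an FFT rather than this naive DFT.

```python
import cmath
import math

def spectrogram(signal, frame=64, hop=32):
    """Magnitude spectrogram: rows are time frames, columns frequency bins."""
    spec = []
    for start in range(0, len(signal) - frame + 1, hop):
        chunk = signal[start:start + frame]
        row = []
        for k in range(frame // 2 + 1):          # naive DFT, one bin at a time
            coef = sum(x * cmath.exp(-2j * math.pi * k * n / frame)
                       for n, x in enumerate(chunk))
            row.append(abs(coef))
        spec.append(row)
    return spec

# toy "voice": a pure tone that completes 8 cycles per 64-sample frame,
# so every row of the spectrogram peaks at frequency bin 8
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
print(len(spec), "frames x", len(spec[0]), "bins")
```

Rendering such a grid as a grayscale image is what lets a standard image CNN be applied directly to speech.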