“…These methods may be particularly interesting when dealing with large data sets that are difficult to interpret, where manual interpretation and labeling would be costly. Semi-supervised learning refers to the use of both labeled and unlabeled data within the learning process [18,35].…”
Section: Fuzzy Clustering: Original Fuzzy C-means and Semi-supervised
In order to predict regime duration in a given chaotic system for which a set of output prototypes is available, we propose to use a clustering technique to define classes of regime duration, which are then used by a chosen classifier. In this way, the exact boundaries between classes are allowed to emerge from the data, as long as prototypical values fall in distinct classes. We investigate both unsupervised and semi-supervised fuzzy clustering techniques (FCM and ssFCM), as well as the traditional k-Means technique. To classify the data, we use the neuro-fuzzy system ANFIS and two decision trees (J48 and NBTree). We apply the procedure to the well-known Lorenz strange attractor, using bred vector counts as input variables.
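To make the clustering step concrete, here is a minimal pure-Python sketch of fuzzy C-means on one-dimensional data (such as regime durations). The function name, initialization scheme, fixed iteration count, and toy data are illustrative assumptions, not the paper's actual setup; the membership and center updates follow the standard FCM formulas.

```python
def fcm(points, c, m=2.0, iters=50):
    """Fuzzy C-means sketch on 1-D data.

    points : list of floats (e.g. regime durations)
    c      : number of clusters
    m      : fuzzifier (> 1); m = 2 is the common default
    Returns (centers, memberships)."""
    # spread the initial centers evenly over the data range (illustrative choice)
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        for i, x in enumerate(points):
            d = [abs(x - ck) or 1e-12 for ck in centers]  # avoid division by zero
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1)) for dj in d)
        # center update: membership-weighted mean of the points
        for k in range(c):
            den = sum(u[i][k] ** m for i in range(len(points)))
            centers[k] = sum((u[i][k] ** m) * x
                             for i, x in enumerate(points)) / den
    return centers, u
```

With two well-separated groups of durations, the two centers settle near the group means, and class boundaries emerge from the crossover of the membership functions rather than from hand-picked thresholds.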
“…Semi-supervised learning combines the advantages of supervised and unsupervised learning; it has become a research hotspot and has been applied in many areas [5][6]. To address the drawbacks of the FCM algorithm, and considering the practical situation of an intrusion detection system, fuzzy clustering is combined with supervisory information: clusters are initialized with labeled known data, and the clustering process is then improved by the constraint of a small amount of known information together with a large amount of unlabeled data; this is the semi-supervised fuzzy clustering algorithm [7][8].…”
Section: Intrusion Detection Algorithm Based On Semi-Supervised
In order to overcome the weakness that intrusion detection systems are sensitive to outliers, we propose an intrusion detection algorithm based on semi-supervised fuzzy clustering. In this algorithm, the training data for semi-supervised learning is a hybrid of labeled and unlabeled samples. While training the system model, we use a few labeled samples and many unlabeled samples as seeds to initialize the system's classifier. Under the constraint of the labeled data, we use the fuzzy C-Means method to generate clusters without requiring many labeled samples and without easily falling into local optima. Compared with the FCM algorithm, experimental results on the KDD CUP 99 data set show the effectiveness of the proposed algorithm: it achieves a higher detection rate and a lower false detection rate.
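The seeding idea described above can be sketched as a small variant of FCM in which a few labeled samples anchor the clusters: centers are initialized from the labeled seeds, and the seeds' memberships stay clamped to their known class during the iterations. This is a simplified stand-in for ssFCM, not the paper's exact algorithm; all names and the toy data are assumptions.

```python
def ssfcm(points, labels, c, m=2.0, iters=50):
    """Semi-supervised FCM sketch on 1-D data.

    points : list of floats
    labels : dict {point index: cluster id} for the few labeled seeds
             (assumes every cluster has at least one labeled sample)
    Returns (centers, memberships)."""
    # initialize each center from its labeled seeds
    centers = []
    for k in range(c):
        seeds = [points[i] for i, y in labels.items() if y == k]
        centers.append(sum(seeds) / len(seeds))
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        for i, x in enumerate(points):
            if i in labels:
                # labeled memberships are clamped to the known class
                for k in range(c):
                    u[i][k] = 1.0 if labels[i] == k else 0.0
                continue
            d = [abs(x - ck) or 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1)) for dj in d)
        for k in range(c):
            den = sum(u[i][k] ** m for i in range(len(points)))
            centers[k] = sum((u[i][k] ** m) * x
                             for i, x in enumerate(points)) / den
    return centers, u
```

Because the seeds pin cluster identities from the start, the result is less sensitive to initialization than plain FCM, which is the motivation the snippet gives for the semi-supervised variant.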
“…In cases where the required amount of labelled samples cannot be provided, the learning system commonly fails. On the other hand, in unsupervised learning, the result strongly depends on prior assumptions and on the appropriate choice of, e.g., distance measure, distribution function, and expected number of classes/clusters [3]. The disadvantages of supervised and unsupervised learning have led researchers to semi-supervised learning, which lies halfway between the supervised and unsupervised approaches.…”
In multiword expressions (MWEs), multiple words unite to build a new unit in language. When MWE identification is treated as a binary classification task, one of the most important factors in performance is training the classifier with a sufficient number of labelled samples. Since manual labelling is a time-consuming task, the performance of MWE recognition studies is limited by the size of the training sets. In this study, we propose comparison-based and common-decision co-training approaches in order to enlarge the MWE dataset. In the experiments, the performances of the proposed approaches were compared to those of standard co-training [1] and manual labelling, where statistical and linguistic features are employed as two different views of the MWE dataset [2]. A number of tests with different settings were performed on a Turkish MWE dataset. Ten different classifiers were utilized in the experiments, and the best performing classifier pair was observed to be the SMO-SMO pair. The experimental results showed that the common-decision co-training approach is an alternative to hand-labelling of large MWE datasets, and both newly proposed approaches outperform standard co-training [2] when the training set is to be enlarged in MWE classification.
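The common-decision idea can be sketched generically: two classifiers, each trained on one view, and an unlabeled sample is added to the training set only when both views agree and both are confident. The sketch below uses a toy nearest-centroid classifier in place of SMO; the class names, the margin-based confidence, the threshold, and the data are all illustrative assumptions, not the paper's configuration.

```python
import statistics

class CentroidClassifier:
    """Toy 1-D nearest-centroid classifier standing in for SMO."""
    def fit(self, xs, ys):
        self.cent = {y: statistics.mean(x for x, yy in zip(xs, ys) if yy == y)
                     for y in set(ys)}
    def predict(self, x):
        return min(self.cent, key=lambda y: abs(x - self.cent[y]))
    def confidence(self, x):
        # margin between the two nearest centroids (illustrative confidence)
        d = sorted(abs(x - c) for c in self.cent.values())
        return d[1] - d[0]

def common_decision_cotrain(view1, view2, ys, unl1, unl2,
                            rounds=3, thresh=1.0):
    """Common-decision co-training sketch over two views.

    view1/view2 : labeled features per view;  ys : their labels
    unl1/unl2   : unlabeled features per view (aligned)."""
    L1, L2, Y = list(view1), list(view2), list(ys)
    pool = list(zip(unl1, unl2))
    for _ in range(rounds):
        c1, c2 = CentroidClassifier(), CentroidClassifier()
        c1.fit(L1, Y)
        c2.fit(L2, Y)
        still_unlabeled = []
        for x1, x2 in pool:
            p1, p2 = c1.predict(x1), c2.predict(x2)
            # label a sample only on a confident common decision
            if p1 == p2 and min(c1.confidence(x1), c2.confidence(x2)) >= thresh:
                L1.append(x1); L2.append(x2); Y.append(p1)
            else:
                still_unlabeled.append((x1, x2))
        pool = still_unlabeled
    return L1, L2, Y
```

In contrast, standard co-training lets each view label samples for the other independently; requiring a joint confident decision is what makes the common-decision variant more conservative about the labels it adds.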