Abstract. In this paper Schapire and Singer's AdaBoost.MH boosting algorithm is applied to the Word Sense Disambiguation (WSD) problem. Initial experiments on a set of 15 selected polysemous words show that the boosting approach surpasses Naive Bayes and Exemplar-based approaches, which represent state-of-the-art accuracy on supervised WSD. In order to make boosting practical for a real learning domain of thousands of words, several ways of accelerating the algorithm by reducing the feature space are studied. The best variant, which we call LazyBoosting, is tested on a medium/large sense-tagged corpus containing 192,800 examples of the 191 most frequent and ambiguous English words. Again, boosting compares favourably to the other benchmark algorithms.
This paper describes a set of experiments carried out to explore the domain dependence of alternative supervised Word Sense Disambiguation algorithms. The aim of the work is threefold: studying the performance of these algorithms when tested on a corpus different from the one they were trained on; exploring their ability to tune to new domains; and demonstrating empirically that the LazyBoosting algorithm outperforms state-of-the-art supervised WSD algorithms in both of these situations.
An important problem to be addressed by diagnostic systems in industrial applications is the estimation of faults with incomplete observations. This work discusses different approaches for handling missing data and the performance of data-driven fault diagnosis schemes. An exploiting classifier and combined methods were assessed in the Tennessee Eastman process, for which diverse incomplete observations were produced. The use of several indicators revealed the trade-off between the performances of the different schemes. Support vector machines (SVM) and C4.5, combined with k-nearest neighbourhood (kNN), produce the highest robustness and accuracy, respectively. Bayesian networks (BN) and centroid appear as inappropriate options in terms of accuracy, while Gaussian naïve Bayes (GNB) is sensitive to imputation values. In addition, feature selection was explored for further performance enhancement, and the proposed contribution index showed promising results. Finally, an industrial case was studied to assess the informative level of incomplete data in terms of the redundancy ratio and to generalize the discussion.
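As a rough illustration of how a kNN step can be combined with a classifier when observations are incomplete, the sketch below fills each missing value with the mean of that feature over the k nearest complete observations. It assumes that kNN acts as the imputation step, which the abstract does not state explicitly; the function name `knn_impute`, the use of `None` to mark a missing value, and the toy data are hypothetical and not taken from the paper.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean over the k nearest complete rows.

    Distance is Euclidean, computed only over the features that are
    observed in the incomplete row (an assumption of this sketch).
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        imputed.append([
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return imputed

# Toy data: the last observation is missing its second feature.
observations = [[1.0, 2.0], [1.1, 2.2], [5.0, 10.0], [1.05, None]]
filled = knn_impute(observations, k=2)
```

The imputed observations could then be passed to any downstream classifier, such as the SVM or C4.5 models named in the abstract.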
in Wiley InterScience (www.interscience.wiley.com). One of the main limitations of current plant supervisory control systems is the reliability and the correct management of simultaneous faults, which is crucial for supporting the plant operators' decision making. In this work, a MultiLabel approach that makes use of support vector machines as the learning algorithm is employed to arrange a novel fault diagnosis system (FDS). The FDS is trained to address a difficult control case study from industry widely studied in the literature, the Tennessee Eastman process. Successful results have been obtained when diagnosing up to four simultaneous faults. These results are very promising since they have been obtained by just using simple training sets consisting of single faults, thus proving a very high learning capacity.
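One common way to realise a multi-label diagnosis of this kind is to train one independent binary detector per fault, so that several detectors can fire on the same observation even though training used single-fault examples only. The sketch below illustrates that idea under stated assumptions: a nearest-centroid rule stands in for the support vector machines used in the paper, and the function names, the two-sensor toy data, and the fault labels "A" and "B" are all hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train_per_fault_detectors(samples):
    """samples: list of (features, set_of_fault_labels).

    Returns {label: (positive_centroid, negative_centroid)}, one
    independent binary detector per fault. Each fault is assumed to have
    at least one positive and one negative training sample.
    """
    labels = sorted({label for _, faults in samples for label in faults})
    models = {}
    for label in labels:
        pos = [x for x, faults in samples if label in faults]
        neg = [x for x, faults in samples if label not in faults]
        models[label] = (centroid(pos), centroid(neg))
    return models

def diagnose(models, x):
    """Return every fault whose positive centroid is closer to x."""
    return {
        label
        for label, (pos_c, neg_c) in models.items()
        if math.dist(x, pos_c) < math.dist(x, neg_c)
    }

# Training set with normal operation and single faults only.
training = [
    ([0.0, 0.0], set()),   # normal operation
    ([1.0, 0.0], {"A"}),   # fault A shifts sensor 0
    ([0.0, 1.0], {"B"}),   # fault B shifts sensor 1
]
models = train_per_fault_detectors(training)
simultaneous = diagnose(models, [1.0, 1.0])  # both detectors fire here
```

Because each detector decides independently, an observation showing both shifts is flagged with both faults, while an observation at the origin is flagged with none.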