Instance Selection in Text Classification Using the Silhouette Coefficient Measure

Dey, Debangana; Solorio, Thamar; Montes-y-Gómez, Manuel

doi:10.1007/978-3-642-25324-9_31

Cited by 10 publications

(4 citation statements)

References 16 publications


“…I(x,y) represents the mutual information between x and y, and H(x) and H(y) are the entropies of x and y. NMI is defined as shown in Eq. (14): SC is another evaluation index of clustering results, originally proposed by Peter J. Rousseeuw in 1986 [37]. It combines the two factors of intra-cluster and inter-cluster, which can be calculated as shown in Eqs.…”

Section: TCIA (mentioning)

confidence: 99%

Weakly supervised label propagation algorithm classifies lung cancer imaging subtypes

Ren, Jia, Zhao, et al. (2023). Sci Rep.


To address the long turnaround, high cost, invasive sampling damage, and ready emergence of drug resistance associated with lung cancer gene detection, a reliable, non-invasive prognostic method is proposed. Under the guidance of weakly supervised learning, deep metric learning and graph clustering methods are used to learn higher-level abstract features from CT imaging features. Unlabeled data are dynamically updated through a k-nearest label update strategy: unlabeled data are converted into weakly labeled data, which in turn continue to update the strongly labeled data, optimizing the clustering results and establishing a classification model for predicting new lung cancer imaging subtypes. Five imaging subtypes are confirmed on a lung cancer dataset containing CT, clinical, and genetic information downloaded from the TCIA lung cancer database. The resulting model achieves high subtype classification accuracy (ACC = 0.9793), and CT sequence images, gene expression, DNA methylation, and gene mutation data from a cooperating hospital in Shanxi Province demonstrate the biomedical value of the method. The proposed method can also comprehensively evaluate intratumoral heterogeneity based on the correlation between the final lung CT imaging features and specific molecular subtypes.
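The Ren et al. excerpt defines NMI in terms of the mutual information I(x,y) and the entropies H(x) and H(y), but the equation itself (their Eq. 14) is cut off in the snippet. The sketch below assumes the common arithmetic-mean normalization NMI = 2 I(x,y) / (H(x) + H(y)); the cited paper may use a different normalizer (for example the geometric mean sqrt(H(x) H(y))). The function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H of a sequence of cluster/class labels (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(x, y):
    """Mutual information I(x, y) between two labelings of the same items."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )


def nmi(x, y):
    """NMI with arithmetic-mean normalization (an assumption): 2 I / (H(x) + H(y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both labelings put every item in one cluster: treat them as identical.
        return 1.0
    return 2 * mutual_information(x, y) / (hx + hy)


# Usage: a labeling that only renames clusters scores (close to) 1,
# while a statistically independent labeling scores 0.
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because NMI compares partitions rather than label values, it is unaffected by how clusters are numbered, which is why it is paired with SC as an external evaluation index.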



“…This preprocessing type is known as instance selection. The silhouette coefficient (Dey et al. 2011) was used as the criterion for detecting potentially noisy signals:…”

Section: Instance Selection (mentioning)

confidence: 99%

Detection of defective embedded bearings by sound analysis: a machine learning approach

Saucedo-Espinosa, Berrones (2014). J Intell Manuf.

Self Cite


This paper describes a machine learning solution for detecting defective embedded bearings in home appliances by sound analysis. The bearings are installed deep inside the appliances at the beginning of the production process and cannot be physically accessed once the appliances are fully assembled. Before a home appliance is put on sale, it is turned on and passed through a sound-based sensor that produces an acoustic signal. Home appliances with defective embedded bearings are detected by analyzing such signals. The task is very challenging, mainly because there are few sample signals and the noise level in the measurements is quite high. In fact, it is shown that the noise is strong enough to mask important components when traditional Fourier decomposition techniques are applied. Hence, a different approach is needed. Experimental results are reported on both laboratory and production-line signals. Despite the difficulty of the task, these results are encouraging. Several classification methods were evaluated, and most of them achieved acceptable performance. An interesting finding is that, among the best-performing classifiers, some methods are highly intuitive and easy to implement. Such methods are generally preferred in industry. The proposed solution is being implemented by the company that motivated this study.
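Saucedo-Espinosa and Berrones use the silhouette coefficient (Dey et al. 2011) as a criterion for flagging potentially noisy signals before training. A minimal sketch of silhouette-based instance selection follows. It uses Rousseeuw's per-instance definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of the instance's own class and b(i) is the smallest mean distance to any other class. Euclidean distance, class labels as clusters, and the hypothetical rule "discard instances with s(i) <= 0" are all assumptions; neither excerpt states the exact distance or threshold used.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_values(X, labels, dist=euclidean):
    """Per-instance silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)),
    treating each class label as a cluster."""
    n = len(X)
    scores = []
    for i in range(n):
        # Mean distance from instance i to the other members of each class.
        totals, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            c = labels[j]
            totals[c] = totals.get(c, 0.0) + dist(X[i], X[j])
            counts[c] = counts.get(c, 0) + 1
        means = {c: totals[c] / counts[c] for c in totals}
        own = labels[i]
        other = [m for c, m in means.items() if c != own]
        if own not in means or not other:
            # Singleton class or only one class present: s(i) = 0 by convention.
            scores.append(0.0)
            continue
        a = means[own]   # cohesion: mean distance within own class
        b = min(other)   # separation: nearest other class
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def select_instances(X, labels, threshold=0.0):
    """Hypothetical selection rule: keep indices whose silhouette > threshold."""
    s = silhouette_values(X, labels)
    return [i for i, v in enumerate(s) if v > threshold]


# Usage: the last point is labeled 1 but sits among the class-0 points,
# so its silhouette is negative and it is dropped from the training set.
X = [[0, 0], [0, 1], [10, 10], [10, 11], [0.5, 0.5]]
y = [0, 0, 1, 1, 1]
print(select_instances(X, y))
```

A negative s(i) means an instance is, on average, closer to another class than to its own, which is a natural signal of a mislabeled or noisy example; raising `threshold` trades training-set size for cleanliness.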


“…This value is helpful in denoting the cohesiveness of the data in one cluster and the separation of data in one cluster from those in the other clusters. This coefficient has been used in text classification not only to analyze the quality of the clustering but also as a feature selection technique [Dey et al., 2011]. In clustering tasks, the SC is calculated for each of the documents in the clusters in order to evaluate the clustering solution.…”

Section: Weighting Scheme (mentioning)

confidence: 99%

Contributions to speech analytics based on speech recognition and topic identification

Correa, David (2016).
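The Correa excerpt describes computing the silhouette coefficient for each document to evaluate a clustering solution, with cohesion measured inside a document's cluster and separation measured against the other clusters. The sketch below assumes whitespace tokenization, raw term-frequency vectors, and cosine distance, and scores the whole solution as the mean of the per-document values; the thesis's actual weighting scheme is not given in the excerpt, and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine_distance(d1, d2):
    """1 - cosine similarity of two sparse term-frequency vectors (Counters)."""
    dot = sum(c * d2[t] for t, c in d1.items())  # missing terms count as 0
    n1 = math.sqrt(sum(c * c for c in d1.values()))
    n2 = math.sqrt(sum(c * c for c in d2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0
    return 1.0 - dot / (n1 * n2)


def document_silhouettes(docs, clusters):
    """Silhouette value of every document given its cluster assignment."""
    vecs = [Counter(doc.lower().split()) for doc in docs]  # assumed: raw TF
    scores = []
    for i, vi in enumerate(vecs):
        dists = {}  # cluster id -> distances from document i
        for j, vj in enumerate(vecs):
            if j != i:
                dists.setdefault(clusters[j], []).append(cosine_distance(vi, vj))
        own = dists.get(clusters[i])
        other_means = [sum(d) / len(d) for c, d in dists.items() if c != clusters[i]]
        if not own or not other_means:
            scores.append(0.0)  # singleton or single-cluster case
            continue
        a = sum(own) / len(own)  # cohesion within the document's cluster
        b = min(other_means)     # separation from the nearest other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def mean_silhouette(docs, clusters):
    """Average silhouette over all documents: one score for the clustering."""
    s = document_silhouettes(docs, clusters)
    return sum(s) / len(s)


docs = ["cat sat mat", "cat mat", "stock market price", "market price rise"]
print(mean_silhouette(docs, [0, 0, 1, 1]))  # topical grouping
print(mean_silhouette(docs, [0, 1, 0, 1]))  # mixed grouping
```

The topical grouping yields a positive mean and the mixed grouping a negative one; swapping in TF-IDF weights would change only how `vecs` is built.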

