Inaccurate Labels in Weakly-Supervised Deep Learning: Automatic Identification and Correction and Their Impact on Classification Performance

Hao, Degan; Zhang, Lei; Sumkin, Jules H.; Mohamed, Aly A.; Wu, Shandong

doi:10.1109/jbhi.2020.2974425

Cited by 34 publications

(22 citation statements)

References 27 publications


“…H. Soleimani et al. [29] used pectoral muscle segmentation to locate cancerous regions in mammograms and applied a deep learning algorithm for classification. A data-driven deep learning approach was proposed [30] for the automatic identification of cancerous regions in mammograms, which improved classification performance. D. Song et al. [31] applied a deep neural network to breast cancer prognosis prediction from multidimensional data and achieved a specificity of 99%.…”

Section: Literature Review (mentioning, confidence: 99%)

Computer Vision-Based Microcalcification Detection in Digital Mammograms Using Fully Connected Depthwise Separable Convolutional Neural Network

Rehman, Pei, et al. 2021, Sensors


Microcalcification (MC) clusters in mammograms are one of the major signs of breast cancer. However, detecting microcalcifications in mammograms is a challenging task for radiologists because of their tiny size and scattered location within dense breast tissue. Automatic CAD systems need to predict breast cancer at early stages to support clinical work. The intercluster gap, noise between individual MCs, and the location of individual objects can affect classification performance and may reduce the true-positive rate. In this study, we propose a computer-vision-based CAD system built on a fully connected depthwise separable convolutional neural network (FC-DSCNN) to detect microcalcification clusters in mammograms and classify them as malignant or benign. The computer vision method automatically controls noise and background color contrast and directly detects MC objects in mammograms, which improves the classification performance of the neural network. The breast cancer classification framework has four steps: image preprocessing and augmentation, RGB-to-grayscale channel transformation, microcalcification region segmentation, and classification of MC regions of interest (ROIs) with FC-DSCNN to predict malignant and benign cases. The proposed method was evaluated on 3568 DDSM and 2885 PINUM mammogram images with automatic feature extraction, achieving true-positive ratios of 0.97 and 0.99 with 2.35 and 2.45 false positives per image, respectively. Experimental results demonstrated that the proposed method outperforms traditional and previous approaches.
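The abstract names two concrete ingredients: an RGB-to-grayscale transformation and depthwise separable convolutions. The Python sketch below illustrates both under stated assumptions and is not the authors' code: the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are one common grayscale convention, not necessarily the one used in the paper, and the 3 × 3 layer with 32 input and 64 output channels is a hypothetical size chosen only to show why a depthwise separable layer is much lighter than a standard convolution.

```python
# Illustrative sketch, not the FC-DSCNN authors' code.
# Assumptions: BT.601 luma weights for grayscale, and a hypothetical
# 3 x 3 layer with 32 input and 64 output channels.


def rgb_to_gray(pixel):
    """Map one (R, G, B) pixel in 0..255 to a grayscale intensity (BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def standard_conv_params(k, c_in, c_out):
    """Number of weights in a standard k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one (no bias)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 filters that mix channels
    return depthwise + pointwise


if __name__ == "__main__":
    print(rgb_to_gray((255, 0, 0)))  # 76
    std = standard_conv_params(3, 32, 64)
    sep = depthwise_separable_params(3, 32, 64)
    print(std, sep, round(sep / std, 4))  # 18432 2336 0.1267
```

For a k × k kernel the ratio of the two counts is 1/C_out + 1/k², here about 0.127, so the separable layer needs roughly one eighth of the weights of the standard one.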


“…However, since a relatively large amount of unstructured data, be it images, videos, or text responses, needs to be labeled or coded, human involvement is necessary. The coding quality can have a large impact on classifier accuracy, depending on the relative number of miscodings [37,38]. Once again, a distinction must be made between incorrect codings, where the label does not fit the data, and mere disagreement between raters, where a certain margin of interpretation allows different codings (i.e., border cases).…”

Section: Gold Standard: Human Coding (mentioning, confidence: 99%)
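Disagreement between raters is usually quantified with an agreement statistic. As an illustration only (the quoted passage does not name a statistic), the Python sketch below computes Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A low kappa signals substantial overall disagreement, but it cannot by itself tell miscodings apart from legitimate border cases. The codes "correct" and "incorrect" in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement p_o: share of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same code for every item
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical codings of four text responses.
    a = ["correct", "correct", "incorrect", "incorrect"]
    b = ["correct", "incorrect", "incorrect", "incorrect"]
    print(cohens_kappa(a, b))  # 0.5
```

Here p_o = 0.75 and p_e = 0.5, so κ = 0.5: the raters agree on three of four items, but half of that agreement would be expected by chance.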

shinyReCoR: A Shiny Application for Automatically Coding Text Responses Using R

Andersen, Zehner 2021, Psych


In this paper, we introduce shinyReCoR: a new app that utilizes a cluster-based method for automatically coding open-ended text responses. Reliable coding of text responses from educational or psychological assessments requires substantial organizational and human effort. The coding of natural language in responses to tests depends on the texts’ complexity, corresponding coding guides, and the guides’ quality. Manual coding is thus not only expensive but also error-prone. With shinyReCoR, we provide a more efficient alternative. The use of natural language processing makes texts utilizable for statistical methods. shinyReCoR is a Shiny app deployed as an R-package that allows users with varying technical affinity to create automatic response classifiers through a graphical user interface based on annotated data. The present paper describes the underlying methodology, including machine learning, as well as peculiarities of the processing of language in the assessment context. The app guides users through the workflow with steps like text corpus compilation, semantic space building, preprocessing of the text data, and clustering. Users can adjust each step according to their needs. Finally, users are provided with an automatic response classifier, which can be evaluated and tested within the process.
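The workflow described above ends in a classifier trained on human-annotated responses. The Python sketch below is a deliberately simplified stand-in, not shinyReCoR's method: it replaces the semantic space with raw bag-of-words counts and the clustering step with one centroid per human-assigned code, then codes a new response by cosine similarity to the nearest centroid. The function names and the example responses and codes are hypothetical.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(annotated):
    """Average the term vectors of all responses that share a human-assigned code."""
    sums, sizes = defaultdict(Counter), Counter()
    for text, code in annotated:
        sums[code].update(vectorize(text))
        sizes[code] += 1
    return {code: {t: c / sizes[code] for t, c in vec.items()} for code, vec in sums.items()}


def code_response(centroids, text):
    """Assign the code whose centroid is most similar to the new response."""
    vec = vectorize(text)
    return max(centroids, key=lambda code: cosine(vec, centroids[code]))


if __name__ == "__main__":
    annotated = [  # hypothetical human-coded responses
        ("the plant needs light", "correct"),
        ("plants need sunlight and water", "correct"),
        ("the plant eats soil", "incorrect"),
        ("plants eat dirt", "incorrect"),
    ]
    centroids = train_centroids(annotated)
    print(code_response(centroids, "the plant needs water and light"))  # correct
```

Raw word counts treat "plant" and "plants" as unrelated terms; a semantic space of the kind the app builds is meant to capture such similarities, which is one reason this sketch should be read only as an outline of the train-then-code idea.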


“…As the name indicates, unsupervised learning-based models work only with unlabelled data, so no labelled training phase is involved, whereas supervised techniques require training on a large dataset, which often entails costly data labelling [17]. Unsupervised algorithms most commonly attempt to discover common patterns in the features being processed within the dataset [18].…”

Section: Proposed Approach (mentioning, confidence: 99%)
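The Python sketch below illustrates the unsupervised case with k-means (Lloyd's algorithm) on 2-D points: it groups points around centroids without ever seeing a label. It is a generic example, not the clustering pipeline of the cited paper, and initializing with the first k points is a simplifying assumption; k-means++ or random restarts are more robust in practice.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means on unlabeled 2-D points.

    Initializes centroids with the first k points (a simplifying assumption).
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment, centroids


if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    labels, centers = kmeans(pts, k=2)
    print(labels)  # [0, 1, 0, 0, 1, 1]
```

The returned cluster ids are arbitrary indices, not class labels; mapping them to meaningful categories requires either inspection or a small amount of labelled data.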

“…Homogeneity and completeness are two critical characteristics of a cluster. The V-measure, V_β, is given in (17). β signifies the weight given to each of these two characteristics, and in this case it is 1 (equal weight).…”

Section: E2c V-measure (mentioning, confidence: 99%)
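Equation (17) itself is not reproduced in the excerpt. The standard V-measure is V_β = (1 + β)·h·c / (β·h + c), with homogeneity h = 1 − H(C|K)/H(C) and completeness c = 1 − H(K|C)/H(K) for true classes C and clusters K; β = 1 gives their harmonic mean. Assuming the cited paper uses this standard form, a plain-Python sketch follows. The conventions h = 1 for a single class and c = 1 for a single cluster match those of common implementations such as scikit-learn.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(X) of a label list (natural log)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """H(A | B) estimated from two paired label lists."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((n_ab / n) * math.log(n_ab / b_counts[bv]) for (_, bv), n_ab in joint.items())


def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, V_beta) for true classes vs. cluster ids."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v


if __name__ == "__main__":
    # Every item in its own cluster: perfectly homogeneous, only half complete.
    print(v_measure([0, 0, 1, 1], [0, 1, 2, 3]))
```

In that example h = 1 and c = 1 − ln 2 / ln 4 = 0.5, so V_1 = 2·1·0.5 / 1.5 ≈ 0.667; raising β above 1 would weight completeness more heavily.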

An Unsupervised Approach for Content-Based Clustering of Emails Into Spam and Ham Through Multiangular Feature Formulation

Karim, Azam, Shanmugam, et al. 2021, IEEE Access


The rapid growth of spam email attacks and the inherent malicious dynamism of those attacks on a range of social, personal, and business activities warrant an intelligent and automated anti-spam framework. Malware propagation, identity theft, sensitive data pilfering, and monetary as well as reputational damage are increasing sharply, endangering the privacy of victims. Current solutions are rather incomplete when the multidimensional feature range of email is taken into account. We believe a methodology based on artificial intelligence, especially unsupervised machine learning, is the way forward. This research investigates the application of unsupervised learning to the clustering of spam and ham emails. The overall goal is to develop a framework that depends solely on unsupervised methods, using a clustering approach that includes multiple algorithms and relies primarily on the email content (body) and the subject header. Clustering was performed on a novel binary dataset of 22,000 ham and spam emails with ten features (reduced from eleven by feature reduction). Seven of these ten features are unique to this study and were engineered to represent impactful analytical email characteristics from a multiangular point of view. Of the five clustering algorithms investigated, OPTICS produced the best clustering, with an average efficacy 0.26% higher than that of its nearest competitor, DBSCAN. The average balanced accuracy of OPTICS and DBSCAN was ≈75.76%.
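Balanced accuracy, reported above, is the mean of per-class recall, which keeps the majority class from dominating the score. How the authors mapped clusters to spam/ham is not stated in the abstract; the Python sketch below assumes a hypothetical majority-vote mapping (noise points that OPTICS or DBSCAN label -1 would simply form one more group). The helper name and the eight example emails are illustrative only.

```python
from collections import Counter, defaultdict


def majority_label_mapping(clusters, y_true):
    """Hypothetical helper: map each cluster id to its most common true label."""
    votes = defaultdict(Counter)
    for cluster_id, label in zip(clusters, y_true):
        votes[cluster_id][label] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical ground truth and cluster ids for eight emails.
    y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
    clusters = [0, 0, 1, 1, 1, 1, 1, 0]
    mapping = majority_label_mapping(clusters, y_true)
    y_pred = [mapping[c] for c in clusters]
    print(mapping)  # {0: 'spam', 1: 'ham'}
    print(round(balanced_accuracy(y_true, y_pred), 4))  # 0.7333
```

In this toy example plain accuracy is 6/8 = 0.75, while balanced accuracy is (4/5 + 2/3) / 2 ≈ 0.733, because the smaller spam class is recovered less well than ham.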
