INTRODUCTION: Breast cancer is among the most dangerous diseases affecting women worldwide. Mammographic imaging is a simple, cost-effective, and efficient screening method used to find breast abnormalities and detect breast cancer at an early stage, so that patient outcomes can be improved. OBJECTIVES: The main challenge is to extract features from pre-processed images using a novel technique called Advanced Gray-Level Co-occurrence Matrix (AGLCM) and to classify the images using machine learning algorithms. METHODS: To achieve this, we propose a four-step process: image acquisition, pre-processing, feature extraction, and classification. First, a pre-processing technique called Contrast Limited Adaptive Histogram Equalization (CLAHE) is used to increase image contrast. Features are then extracted using AGLCM, which captures texture, intensity, and shape-based features, as these are important for identifying abnormalities. RESULTS: In our framework, the eXtreme Gradient Boosting (XGBoost) classifier is applied to mammograms, and the results are compared with other classifiers: Random Forest (RF), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), and Support Vector Machine (SVM). The experiments are conducted on the Mammographic Image Analysis Society (MIAS) dataset. CONCLUSION: The results achieved with the CLAHE + AGLCM + XGBoost pipeline are better than those of existing methods. In future work, we will experiment on larger datasets and focus on optimal feature selection to improve classification performance.
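The feature-extraction step can be illustrated with a minimal sketch of the standard gray-level co-occurrence matrix (GLCM). This is not the AGLCM variant, whose details the abstract does not give; the function names `glcm` and `texture_features`, the horizontal pixel offset, and the choice of three texture descriptors are assumptions made for illustration only.

```python
from typing import Dict, List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dx, dy).

    `image` holds integer gray levels in the range [0, levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(p: List[List[float]]) -> Dict[str, float]:
    """Three common texture descriptors computed from a normalized GLCM."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        # Large when neighbouring pixels differ strongly in gray level.
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        # Angular second moment: large for uniform, repetitive texture.
        "asm": sum(p[i][j] ** 2 for i, j in cells),
        # Large when most of the mass lies near the matrix diagonal.
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


# Hypothetical 4x4 image with four gray levels, used only as a toy input.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
toy_features = texture_features(glcm(toy_image, levels=4))
```

In a pipeline like the one described, a feature dictionary of this kind would be computed per contrast-enhanced mammogram and passed to the classifier as one row of the feature table.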
TREC text documents are difficult to analyze for features and relevant similar documents using traditional document similarity measures. As the size of the TREC repository increases, finding relevant clustered documents in a large collection of unstructured documents becomes a challenging task. Traditional document similarity and classification models are implemented on homogeneous TREC data to find essential features for document entities that are similar to the TREC documents. Also, most traditional models are applicable only to limited text document sets. The main issues with traditional text mining models on the TREC repository are: 1) each document is represented as a vector with many sparse values; 2) they fail to capture semantic document similarity within and between clusters; and 3) they have a high mean squared error rate. In this paper, a novel feature-selection-based clustering and classification model is proposed for a large number of different TREC repositories. Traditional Latent Semantic Indexing and document clustering models fail to find topic relevance on large TREC clinical text document sets because of their computational memory and time requirements. The proposed document feature selection and cluster-based classification model is applied to TREC clinical benchmark datasets. The experimental results show that the proposed model is more efficient than existing models in terms of computational memory, accuracy, and error rate.
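The sparsity issue named above can be made concrete with a small sketch of a traditional similarity baseline: TF-IDF vectors compared by cosine similarity. This is not the proposed model. The function names, the smoothed IDF formula, and the three toy documents are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors, one dict per document (only non-zero terms stored)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sparsity(vector: Dict[str, float], vocabulary_size: int) -> float:
    """Fraction of entries that would be zero in a dense representation."""
    return 1.0 - len(vector) / vocabulary_size


# Hypothetical tokenized documents, not taken from any TREC collection.
toy_docs = [
    ["patient", "fever", "cough"],
    ["patient", "fever", "rash"],
    ["stock", "market", "crash"],
]
toy_vectors = tfidf_vectors(toy_docs)
toy_vocabulary = {term for doc in toy_docs for term in doc}
```

Even in this toy case more than half of each dense vector would be zero, and two documents with no shared terms score exactly zero regardless of meaning, which illustrates the sparsity and semantic-similarity limitations listed above.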