Background: Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict survival indicators; however, most of these analyses were performed using basic statistical methods. As an alternative, this study used machine learning techniques to build models for detecting and visualising significant prognostic indicators of breast cancer survival rate. Methods: A large hospital-based breast cancer dataset retrieved from the University Malaya Medical Centre, Kuala Lumpur, Malaysia (n = 8066) with diagnosis information between 1993 and 2016 was used in this study. The dataset contained 23 predictor variables and one dependent variable, which referred to the survival status of the patients (alive or dead). To determine the significant prognostic factors of breast cancer survival rate, prediction models were built using decision tree, random forest, neural networks, extreme boost, logistic regression, and support vector machine. Next, the dataset was clustered based on the receptor status of breast cancer patients identified via immunohistochemistry to perform advanced modelling using random forest. Subsequently, the important variables were ranked via variable selection methods in random forest. Finally, decision trees were built and validation was performed using survival analysis. Results: In terms of both model accuracy and calibration measure, all algorithms produced close outcomes, with the lowest obtained from decision tree (accuracy = 79.8%) and the highest from random forest (accuracy = 82.7%). The important variables identified in this study were cancer stage classification, tumour size, number of total axillary lymph nodes removed, number of positive lymph nodes, types of primary treatment, and methods of diagnosis. Conclusion: Interestingly, the various machine learning algorithms used in this study yielded close accuracy; hence these methods could be used as alternative predictive tools in breast cancer survival studies, particularly in the Asian region. The important prognostic factors influencing survival rate of breast cancer identified in this study, which were validated by survival curves, are useful and could be translated into decision support tools in the medical domain.
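The abstract above mentions validation by survival curves without naming the estimator. As an illustration only, the sketch below assumes a Kaplan-Meier estimator, a common choice for such curves; the function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed death.

    durations: follow-up time for each patient
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)   # patients still under observation
    survival = 1.0         # running product of conditional survival
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Made-up follow-up times (years) and event flags, for illustration only.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for time, prob in curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

Comparing such curves across the levels of a ranked variable (for example, cancer stage) is one way a prognostic factor could be checked against observed survival.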
The reliable classification of benign and malignant lesions in breast ultrasound images can provide an effective and relatively low-cost method for the early diagnosis of breast cancer. The accuracy of the diagnosis is, however, highly dependent on the quality of the ultrasound systems and the experience of the users (radiologists). The use of deep convolutional neural network approaches has provided solutions for the efficient analysis of breast ultrasound images. In this study, we propose a new framework for the classification of breast cancer lesions with an attention module in a modified VGG16 architecture. The adopted attention mechanism enhances the feature discrimination between the background and targeted lesions in ultrasound. We also propose a new ensembled loss function, which is a combination of binary cross-entropy and the logarithm of the hyperbolic cosine loss, to reduce the discrepancy between the predicted lesion classes and their labels. This combined loss function optimizes the network more quickly. The proposed model outperformed other modified VGG16 architectures, with an accuracy of 93%, and its results are competitive with those of other state-of-the-art frameworks for the classification of breast cancer lesions. Our experimental results show that the choice of loss function is highly important and plays a key role in breast lesion classification tasks. Additionally, by adding an attention block, we could improve the performance of the model.
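The abstract does not state how the two loss terms are combined. The sketch below assumes a plain weighted sum of binary cross-entropy and log-cosh, averaged over samples; the `weight` parameter and the function names are hypothetical and are not the authors' formulation.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def log_cosh(y_true, y_pred):
    """Mean of log(cosh(prediction - label))."""
    return sum(math.log(math.cosh(p - y)) for y, p in zip(y_true, y_pred)) / len(y_true)


def combined_loss(y_true, y_pred, weight=1.0):
    """Assumed combination: BCE + weight * log-cosh (weight is hypothetical)."""
    return binary_cross_entropy(y_true, y_pred) + weight * log_cosh(y_true, y_pred)


# Labels: 1 = malignant, 0 = benign (toy values, not study data).
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
uncertain = [0.6, 0.4, 0.5, 0.5]
print(combined_loss(labels, confident))
print(combined_loss(labels, uncertain))
```

Log-cosh behaves like a squared error for small residuals and like an absolute error for large ones, so under this assumed form it adds a smooth, outlier-tolerant penalty on top of the cross-entropy term.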
Pathology reports represent a primary source of information for cancer registries. University Malaya Medical Centre (UMMC) is a tertiary hospital responsible for training pathologists; thus narrative reporting becomes important. However, unstructured free-text reports make the information extraction process tedious for clinical audits and data analysis-related research. This study aims to develop an automated natural language processing (NLP) algorithm to summarize the existing narrative breast pathology reports from UMMC into a narrower structured synoptic pathology report with a checklist-style report template to ease the creation of pathology reports. The rule-based NLP algorithm was developed in the R programming language using 593 pathology specimens from 174 patients provided by the Department of Pathology, UMMC. The pathologist provided specific keywords for data elements to define the semantic rules of the NLP. The system was evaluated by calculating the precision, recall, and F1-score. The proposed NLP algorithm achieved a micro-F1 score of 99.50% and a macro-F1 score of 98.97% on 178 specimens with 25 data elements. This result aligns with clinicians’ needs and could improve communication between pathologists and clinicians. The study presented here is significant, as structured data is easily minable and could generate important insights.
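To make the difference between the two reported scores concrete, the sketch below computes micro- and macro-averaged F1 from per-element counts. The data element names and the counts are made up for illustration and are not the study's results.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(counts):
    """counts maps each data element to a (tp, fp, fn) tuple.

    Micro-F1 pools the counts over all elements before computing F1;
    macro-F1 computes F1 per element and takes the unweighted mean.
    """
    macro = sum(f1_score(*c) for c in counts.values()) / len(counts)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_score(tp, fp, fn)
    return micro, macro


# Hypothetical extraction counts for two data elements.
counts = {
    "tumour_size": (90, 5, 5),
    "histological_grade": (8, 2, 2),
}
micro, macro = micro_macro_f1(counts)
print(f"micro-F1 = {micro:.4f}, macro-F1 = {macro:.4f}")
```

Because micro-F1 is dominated by frequent data elements while macro-F1 weights every element equally, a macro score below the micro score, as reported above, is consistent with weaker performance on some less frequent elements.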
Automated artificial intelligence (AI) systems enable the integration of different types of data from various sources for clinical decision-making. The aim of this study is to propose a pipeline to develop a fully automated clinician-friendly AI-enabled database platform for breast cancer survival prediction. A case study of a breast cancer survival cohort from the University Malaya Medical Centre was used to develop and evaluate the pipeline. A relational database and a fully automated system were developed by integrating the database with analytical modules (machine learning, automated scoring for quality of life, and interactive visualization). The developed pipeline, iSurvive, has helped to enhance data management and to visualize important prognostic variables and survival rates. The embedded automated scoring module demonstrated the quality of life of patients, whereas the interactive visualizations could be used by clinicians to facilitate communication with patients. The pipeline proposed in this study is a one-stop center to manage data, to automate analytics using machine learning, to automate scoring, and to produce explainable interactive visuals that enhance clinician-patient communication along the survivorship period in order to modify behaviours that relate to prognosis. The proposed pipeline can be adapted to any disease and is not limited to breast cancer.
Many efforts are currently underway around the world to improve public awareness of preventive measures and to disseminate appropriate information about COVID-19 in order to curb the spread of the disease. This study aims to determine the level of awareness of COVID-19 control and prevention among the population in Malaysia. A cross-sectional study was conducted among 355 participants between March 30 and May 21, 2020. A questionnaire consisting of five main themes, (1) socio-demographics, (2) awareness, (3) knowledge, (4) attitudes, and (5) practices towards preventing and controlling COVID-19, was distributed online using Google Forms. The overall Knowledge, Attitude and Practice (KAP) scores were analyzed based on Bloom’s cut-off point of 80%. The results of this study show that Malaysians’ awareness highly influences their knowledge, attitudes, and practices in preventing and controlling the spread of COVID-19. Although the results are reasonably good, further awareness efforts are recommended to continuously raise the awareness level and to remove any negative stigma and attitudes, which should in turn produce better practices, so that Malaysia is able to stop the spread of the virus. Virtual awareness programs should be conducted to provide the public with the most up-to-date information on infection control procedures and how to maintain a hygienic environment, as well as to encourage people to adopt social distancing and avoid social gatherings.
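The 80% cut-off mentioned above can be illustrated with a minimal sketch. It assumes that a respondent's score is compared with the cut-off as a percentage of the maximum attainable score; the function name, the labels, and the example scores are hypothetical, and the abstract does not describe any further score bands.

```python
def meets_cutoff(score, max_score, cutoff_percent=80):
    """Return True when score is at least cutoff_percent of max_score.

    Integer arithmetic is used so that a score exactly on the
    cut-off is never misclassified by floating-point rounding.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return score * 100 >= cutoff_percent * max_score


# Made-up knowledge scores out of 10 items.
for score in (10, 8, 7):
    label = "at or above cut-off" if meets_cutoff(score, 10) else "below cut-off"
    print(f"{score}/10: {label}")
```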
The practice of medical decision making is changing rapidly with the development of innovative computing technologies. The growing interest in data analysis, together with improvements in big data computer processing methods, raises the question of whether machine learning can be integrated with conventional statistics in health research. To help address this knowledge gap, this paper presents a review of the conceptual integration between conventional statistics and machine learning, focusing on health research. The similarities and differences between the two are compared using mathematical concepts and algorithms. The comparison indicates that conventional statistics are the fundamental basis of machine learning: the black box algorithms are derived from basic mathematics but are more advanced in terms of automated analysis, handling big data, and providing interactive visualizations. While the two methods differ in nature, they are conceptually similar. Based on our review, we conclude that conventional statistics and machine learning are best integrated to develop automated data analysis tools. We also strongly believe that machine learning could be explored by health researchers to enhance conventional statistics in decision making and to add reliable validation measures.
BACKGROUND: Breast cancer (BC) is the most common cancer in Malaysia, with many cases diagnosed at late stages. The “Know Your Lemons” (KYL) visual educational tools were developed by the KYL Foundation. This study aimed to evaluate participants’ confidence levels and perceived knowledge in identifying BC symptoms before and after exposure to KYL tools. MATERIALS AND METHODS: A cross-sectional study was carried out among 788 participants in three KYL health campaigns from 2017 to 2020. Perceived knowledge (measured on a 5-item Likert scale, where zero means “very poor” and 4 means “excellent knowledge”) and confidence in identifying BC symptoms were studied. A Wilcoxon Matched-Paired Signed-Rank Test was performed to assess the perceived knowledge. RESULTS: There was a significant improvement in the mean (±SD) perceived knowledge score, from 2.84 ± 1.02 before the campaign to 4.31 ± 0.66 after it (P < 0.01). About 95.6% agreed that the language used in KYL materials was clear and understandable, 89.8% agreed it is acceptable in Malaysian culture, and 80% felt more confident in identifying BC symptoms. In addition, 90.8% intended to perform breast self-examination and 90.8% would consult a doctor if symptomatic. The majority (92.7%) agreed that the KYL tools clarified the BC tests needed. CONCLUSION: The KYL tools enhanced perceived BC symptom recognition knowledge and confidence levels.
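As a rough illustration of the paired test named above, the sketch below computes the Wilcoxon signed-rank sums for before and after scores. It drops zero differences and gives tied absolute differences their average rank, which is one common convention; it does not compute a P value, and the scores shown are made up rather than taken from the campaigns.

```python
def wilcoxon_signed_rank(before, after):
    """Return (w_plus, w_minus): rank sums of positive and negative differences."""
    # Paired differences, ignoring participants whose score did not change.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at position i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        # Positions i..j-1 hold ranks i+1..j; assign each their average.
        average_rank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical perceived-knowledge scores for five participants.
before = [2, 3, 1, 4, 2]
after = [4, 4, 1, 3, 4]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

A large imbalance between the two rank sums indicates that scores shifted consistently in one direction; a statistics package would convert the smaller sum into a P value.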
Background: Breast cancer is one of the leading causes of mortality among women worldwide. The Breast Cancer Resource Centre (BCRC) of University Malaya Medical Centre (UMMC), Kuala Lumpur, Malaysia, started the Malaysian Breast Cancer Survivorship Cohort (MyBCC) study in 2012. Aim: As a further enhancement of the research, the MyBCC database has been developed to conduct the survey in a convenient way, with the aim of predicting the factors influencing different survival rates among patients of multiethnic origin using data science techniques. Methods: The database comprises lifestyle-related data of the patients, including demographic factors, information on other illnesses, clinical factors, quality of life, psychosocial support, physical activity, work-related questions, depression score, family background, type of medication consumed, and financial status. This paper presents an approach to build an automated workflow using the MySQL database management system and PHP, integrated with R and HTML for web display. Results: A relational database comprising the data of 816 breast cancer patients was developed for the MyBCC cohort study. This database serves as the backend for the MyBCC application, where researchers can register new patients' records and update and view the information of recruited patients in a more convenient environment than before. In addition, the MyBCC database has been integrated with the R programming tool by deploying the RMySQL package to perform audits. A few important analyses using the plotly package, leveraging the integration of R with the database, are presented. Conclusion: In this paper, the development of the MyBCC database is presented, with the aim of automating the manual process of data entry, storage, and analysis for performing audits for the breast cancer cohort study. The integration of the database with R for automated analysis of data is also shown using examples of predictions that can be generated using functions in R. This fully automated workflow reduces the workload and time taken in performing manual predictions using data sources stored in flat files.
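The idea of querying a relational store directly from an analysis script, instead of exporting flat files, can be sketched as follows. This is not the MyBCC implementation: it substitutes Python's built-in `sqlite3` module for the MySQL, PHP, and R stack described above, and the table names, column names, and rows are all hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (toy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient ("
        "patient_id INTEGER PRIMARY KEY, group_label TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE qol_score ("
        "patient_id INTEGER NOT NULL REFERENCES patient(patient_id), "
        "score REAL NOT NULL)"
    )
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "A")]
    )
    conn.executemany(
        "INSERT INTO qol_score VALUES (?, ?)", [(1, 70.0), (2, 55.0), (3, 80.0)]
    )
    return conn


def mean_score_by_group(conn):
    """Join the two tables and return {group_label: mean score}."""
    rows = conn.execute(
        "SELECT p.group_label, AVG(q.score) "
        "FROM patient AS p JOIN qol_score AS q ON q.patient_id = p.patient_id "
        "GROUP BY p.group_label ORDER BY p.group_label"
    ).fetchall()
    return dict(rows)


conn = build_demo_db()
print(mean_score_by_group(conn))
conn.close()
```

Keeping the aggregation in a query means an audit summary reflects newly registered or updated records each time the script runs, which is the benefit the abstract attributes to integrating the database with the analysis tool.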