2022
DOI: 10.1101/2022.12.01.518728
Preprint

MLcps: Machine Learning Cumulative Performance Score for classification problems

Abstract: A performance metric is a tool for measuring the correctness of a trained Machine Learning (ML) model. Numerous performance metrics have been developed for classification problems, making it overwhelming to select the appropriate one, since each captures a particular aspect of the model's behavior. Selecting a metric becomes even harder for problems with imbalanced and/or small datasets. Therefore, in clinical studies, where datasets are frequently imbalanced, and in situations when the prevale…
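The metric-selection problem the abstract describes is easy to demonstrate. Below is a minimal sketch in Python (illustrative only, not code from the paper; the dataset and classifier are hypothetical), scoring a majority-class predictor on an imbalanced toy dataset with scikit-learn: plain accuracy looks strong, while balanced accuracy, F1, and the Matthews correlation coefficient (MCC) all expose a model that has learned nothing.

import numpy as np
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    matthews_corrcoef,
)

# Hypothetical imbalanced binary labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A trivial "classifier" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

print("accuracy:           ", accuracy_score(y_true, y_pred))           # 0.95
print("balanced accuracy:  ", balanced_accuracy_score(y_true, y_pred))  # 0.5
print("F1 (positive class):", f1_score(y_true, y_pred, zero_division=0))  # 0.0
print("MCC:                ", matthews_corrcoef(y_true, y_pred))        # 0.0

A single cumulative score aggregating several such complementary metrics, which is what the title indicates MLcps provides, would reduce this comparison to one number; the truncated abstract does not specify the aggregation, so none is attempted here.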



Cited by 4 publications (6 citation statements)
References 13 publications
“…All supporting data, which includes images used for training, validation, and testing [22], as well as the trained model weights [23], is available at zenodo.…”
Section: Availability of Supporting Source Code and Requirements (mentioning, confidence: 99%)
“…An archival copy of the code and supporting data is available via the GigaScience repository, GigaDB [35]. DOME-ML (Data, Optimisation, Model, and Evaluation in Machine Learning) annotations, supporting the current study, are available via the supporting data in GigaDB.…”
Section: Data Availability (mentioning, confidence: 99%)
“…TPOT (26,27,28), FEDOT (19), Auto-Sklearn (22), GAMA (29), RECIPE (30), and ML-Plan (31)); (4) accessibility and ease of use; some AutoMLs are designed to be broadly accessible, requiring little to no coding experience to implement and run (e.g. ALIRO (24), MLme (32), MLJAR-supervised (33), H2O-3 (34), STREAMLINE (35), and Auto-WEKA (36)), while others are designed primarily as a code library to facilitate building a customizable pipeline with automated elements (e.g. LAMA (37), FLAML (38), Hyperopt-sklearn (39), TransmogrifAI (40), MLBox (41), Xcessiv (42)); (5) output focus; the aims of AutoML vary, with focus on either a single best optimized model/pipeline (e.g.…”
Section: Introduction (mentioning, confidence: 99%)
“…TPOT (28)), or a direct comparison of model performance across algorithms (e.g. STREAMLINE (35), MLme (32), and PYCARET (23)); (6) inclusion and automation of different possible elements of a complete end-to-end ML pipeline, with algorithm selection and hyperparameter optimization being most common; and (7) transparency in the documentation, i.e. to what degree the available elements, options, and automations are defined and validated.…”
Section: Introduction (mentioning, confidence: 99%)