2020
DOI: 10.1016/j.nic.2020.08.004

Machine Learning Algorithm Validation

Cited by 74 publications (42 citation statements) · References 32 publications
“…Here each group is selected as the testing dataset once and as the training dataset k-1 times. In general, 10-fold CV is used [23], [24]. There are several metrics to measure the accuracy of the classifier built.…”
Section: Evaluation of the Methodology
confidence: 99%
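A minimal sketch of the k-fold procedure this statement describes, using scikit-learn (the dataset and random-forest estimator are hypothetical placeholders; the cited studies do not specify an implementation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical placeholder dataset and estimator.
X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(random_state=0)

# 10-fold CV: each of the 10 groups serves as the test set exactly once
# and belongs to the training set in the other k-1 = 9 folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```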
“…While leave-pair-out cross validation is considered to be a less biased approach for binary classification because it exhaustively tries every possible combination, leave-one-out cross validation is a common training-testing split in this line of research (Cohen and Pakhomov, 2020; de la Fuente Garcia et al., 2020; Luz et al., 2020). Even on very small datasets, leave-pair-out cross validation is computationally expensive (Maleki et al., 2020). In order to keep our work comparable with prior and future studies, we opted to use leave-one-out cross validation as the best method for maximizing the available data while reducing training bias and maintaining reproducibility (Pahikkala et al., 2008; Fraser et al., 2019; Maleki et al., 2020).…”
Section: Machine Learning Experiments
confidence: 99%
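To illustrate the cost difference the statement refers to, here is a small sketch using scikit-learn's LeaveOneOut and LeavePOut splitters on synthetic data (a hypothetical stand-in for the corpora in the cited studies; note that LeavePOut enumerates every pair regardless of class, a simplification of the leave-pair-out scheme typically used for AUC estimation):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, LeavePOut, cross_val_score

# Hypothetical small binary dataset (60 samples).
X, y = make_classification(n_samples=60, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Leave-one-out: one fold per sample, so 60 model fits.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"LOO accuracy: {loo_scores.mean():.3f}")

# Leave-pair-out enumerates every pair: C(60, 2) = 1770 fits,
# which is why it is expensive even on very small datasets.
lpo = LeavePOut(p=2)
print(f"LPO folds: {lpo.get_n_splits(X)}")
```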
“…Finally, we used a Wilcoxon rank-sum test to assess if there was any significant difference between the performance of the models built based on scenario 2 and the performance of the models developed based on scenario 1. In order to build RF models, we followed the common practice for developing machine learning models [29]. To achieve an unbiased estimate of generalization error, 30% of the patients were randomly selected and set aside as the test group. Data from the remaining 70% were used for model development. The data partitioning in this paper was conducted in a stratified manner to preserve the distribution of samples for each endpoint.…”
Section: Predictive Modeling of Different Outcomes Using Machine Learning
confidence: 99%
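The stratified 70/30 split and the rank-sum comparison described above can be sketched as follows (the synthetic cohort and the per-run score lists are hypothetical placeholders; the quoted paper's actual features and endpoints are not reproduced here):

```python
from scipy.stats import ranksums
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced patient cohort.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# Hold out 30% as the test group; stratify=y preserves the
# endpoint's class distribution in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.3f}")

# Wilcoxon rank-sum test on two hypothetical sets of per-run scores
# (standing in for the scenario 1 vs. scenario 2 comparison).
stat, p = ranksums([0.81, 0.79, 0.83, 0.80, 0.82],
                   [0.76, 0.78, 0.75, 0.77, 0.74])
print(f"rank-sum p-value: {p:.3f}")
```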