2019
DOI: 10.1371/journal.pone.0208737

Computational prediction of diagnosis and feature selection on mesothelioma patient health records

Abstract: Background: Mesothelioma is a lung cancer that kills thousands of people worldwide annually, especially those with exposure to asbestos. Diagnosis of mesothelioma in patients often requires time-consuming imaging techniques and biopsies. Machine learning can provide for a more effective, cheaper, and faster patient diagnosis and feature selection from clinical data in patient records. Methods and findings: We analyzed a dataset of health records of 324 patients having mesothelioma symptoms from Turkey. The patients…

Cited by 54 publications (46 citation statements)
References 62 publications (74 reference statements)
“…Similarly to what authors did for a dataset of patients having mesothelioma symptoms [92], we decided then to investigate the most important features of the cardiovascular heart disease patients dataset. To this aim, we first performed a traditional univariate biostatistics analysis ("Feature ranking" section), and then employed Random Forests [108], to generate machine learning results.…”
Section: Feature Ranking Results (mentioning)
confidence: 99%
“…Regarding machine learning feature ranking, we focused only on Random Forests [72,91], because as it turned out to be the top performing classifier on the complete dataset ("Feature ranking results" section). Random Forests [72] provides two feature ranking techniques: mean accuracy reduction and Gini impurity reduction [92]. During training, Random Forests generates several random Decision Trees that it applies to data subsets, containing a subsets both of data instances and of features.…”
Section: Feature Ranking (mentioning)
confidence: 99%
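The Gini impurity reduction named in the statement above can be illustrated for a single tree split. This is only a minimal sketch under stated assumptions, not the cited authors' implementation: the function names `gini` and `gini_decrease` and the toy labels are hypothetical. A Random Forest would accumulate this kind of per-split reduction for each feature across its trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy labels (hypothetical, not taken from any dataset mentioned here).
parent = [0, 0, 0, 1, 1, 1]
perfect_split = gini_decrease(parent, [0, 0, 0], [1, 1, 1])   # classes fully separated
useless_split = gini_decrease(parent, [0, 1], [0, 0, 1, 1])   # class mix unchanged
```

A split that separates the classes well yields a larger reduction than one that leaves the class mix unchanged, which is why features used in such splits tend to rank higher.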
“…The dataset consists of medicinal histories of 324 patients gathered from the University of California (Irvine, CA, USA) machine learning database [29]. There were 96 mesothelioma patients as well as 228 healthy individuals which indicated imbalanced dataset [30]. Regarding with imbalanced dataset, it includes 29.63% patients of mesothelioma and 70.37% healthy individuals.…”
Section: Dataset (mentioning)
confidence: 99%
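The class proportions quoted in the statement above follow directly from the reported counts. A short check, with variable names chosen here for illustration:

```python
# Counts as reported in the citation statement above.
n_mesothelioma = 96
n_healthy = 228
total = n_mesothelioma + n_healthy

# Percentage share of each class, rounded to two decimals.
pct_mesothelioma = round(100 * n_mesothelioma / total, 2)
pct_healthy = round(100 * n_healthy / total, 2)

print(total, pct_mesothelioma, pct_healthy)
```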
“…The completeness of the dataset is exceptional quality in electronic medical records which enables to make a more exact and precise analysis than different conditions where a few values are missing [32]. The "diagnosis method" attribute has duplicate values to "class of diagnosis" [30] then attribute was removed from the data set.…”
Section: Dataset (mentioning)
confidence: 99%
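The attribute removal described in the statement above might look like the following sketch. It assumes records held as a list of dictionaries; the helper `drop_if_duplicate`, the column names, and the sample rows are hypothetical and are not taken from the dataset itself.

```python
def drop_if_duplicate(rows, column, reference):
    """Return rows without `column` when it equals `reference` in every row."""
    if all(row[column] == row[reference] for row in rows):
        return [{k: v for k, v in row.items() if k != column} for row in rows]
    return rows  # columns differ somewhere, so nothing is dropped


# Hypothetical sample rows for illustration only.
records = [
    {"age": 47, "diagnosis_method": 1, "class_of_diagnosis": 1},
    {"age": 55, "diagnosis_method": 0, "class_of_diagnosis": 0},
]
cleaned = drop_if_duplicate(records, "diagnosis_method", "class_of_diagnosis")
```

Dropping an attribute that mirrors the target label avoids leaking the answer into the classifier's inputs.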