2021
DOI: 10.1155/2021/1285167

Different Data Mining Approaches Based Medical Text Data

Abstract: The amount of medical text data is increasing dramatically. Medical text data record the progress of medicine and imply a large amount of medical knowledge. As a natural language, they are characterized by semistructured, high-dimensional, high data volume semantics and cannot participate in arithmetic operations. Therefore, how to extract useful knowledge or information from the total available data is a very important task. Using various techniques of data mining can extract valuable knowledge or information f…

Cited by 13 publications (9 citation statements)
References 64 publications
“…Prediction accuracy of the final model was assessed using the testing data set, and the model was calibrated using the validation data set. Since we had binary outcomes, we also calculated the sensitivity, specificity, area under curve (AUC), positive/negative likelihood ratios and positive/negative predictive values for each model [37,59,60]. Agreement between observed and predicted records was measured using the Kappa statistic [61,62].…”
Section: Methods (mentioning)
confidence: 99%
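
The binary-outcome measures named in the statement above can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the citing authors' code: the function name `binary_metrics`, its argument names, and the example counts are hypothetical, and it assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive common binary-classification measures from confusion-matrix counts.

    tp, fp, fn, tn are hypothetical argument names for true positives,
    false positives, false negatives, and true negatives.
    Assumes no denominator is zero (no degenerate class or prediction).
    """
    n = tp + fp + fn + tn

    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    # Likelihood ratios combine sensitivity and specificity.
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from any cited study.
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.3f}")
```

AUC is left out because it needs ranked prediction scores rather than a single confusion matrix.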
“…Variable selection for other models was based on either a significance test (P < 0.05) or relative importance score (<1%) [29]. In addition to using a separate data set for validation, to further reduce predictive bias and uncertainty (i.e., variance of performance estimates) we used 10-fold cross-validation (10-fold cv) for the training of all models except NNW, since we had a large training data set [29,53,58,59]. For aquatic habitat identification, we used both field-observed aquatic habitats (yes) and pseudo-habitats (no).…”
Section: Model Specification and Modeling Process (mentioning)
confidence: 99%
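
The 10-fold cross-validation mentioned in the statement above partitions the training records into ten folds and holds each fold out once. A minimal sketch of that partitioning, assuming a simple shuffled split; the helper name `k_fold_indices` and its parameters are hypothetical, and the citing paper does not describe its own implementation:

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, held_out_indices) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once with a fixed seed,
    deals them into k folds of near-equal size, and holds each fold out in turn.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Fold i takes every k-th index starting at position i.
    folds = [indices[i::k] for i in range(k)]

    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


if __name__ == "__main__":
    for fold_no, (train, held_out) in enumerate(k_fold_indices(25, k=10), start=1):
        print(f"fold {fold_no}: train={len(train)} held_out={len(held_out)}")
```

Each record lands in exactly one held-out fold, so averaging a performance measure over the ten rounds uses every record for evaluation once.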
“…International Data Corporation stated that unstructured data would make up 95% of all data worldwide in 2020, with a compound annual growth rate of 65% [30]. Due to the quality and usability concerns with large unstructured datasets, structured data are more relevant and valuable than unstructured or semi-structured data [31].…”
Section: Related Work (mentioning)
confidence: 99%
“…Analyzing this huge amount of medical data to extract meaningful knowledge or information is useful in the medical field for decision support, prevention, diagnosis, and treatment. However, processing vast amounts of multidimensional or raw data is a difficult and time-consuming operation [30] but is absolutely necessary for the advancement of science in general. This challenge has led to new standards for using data so that data are Findable, Accessible, Interoperable, and Reusable (FAIR) [38].…”
Section: Related Work (mentioning)
confidence: 99%
“…With the continuous development of database and data mining technology, data mining is more and more used to mine medical databases efficiently [7]. The existing data mining technology application research shows that the model established by data mining has high accuracy [8].…”
Section: Introduction (mentioning)
confidence: 99%