Different Data Mining Approaches Based Medical Text Data

Xiao, Wenke; Jing, Lijia; Xu, Yaxin; Zheng, Shu; Gan, Yanxiong; Wen, Chuanbiao

doi:10.1155/2021/1285167

Cited by 13 publications

(9 citation statements)

References 64 publications

“…Prediction accuracy of the final model was assessed using the testing data set, and the model was calibrated using the validation data set. Since we had binary outcomes, we also calculated the sensitivity, specificity, area under curve (AUC), positive/negative likelihood ratios and positive/negative predictive values for each model [37,59,60]. Agreement between observed and predicted records was measured using the Kappa statistic [61,62].…”

Section: Methods (mentioning)

confidence: 99%
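
The measures named in the statement above can all be derived from a 2x2 confusion matrix. The following is a minimal sketch, not the citing authors' code: the helper name `binary_metrics` and the example counts are hypothetical, and it assumes no row or column total of the matrix is zero and that the model does better than pure chance agreement allows (so no denominator vanishes). AUC is left out because it needs the ranked prediction scores rather than the four counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Summary statistics for a 2x2 confusion matrix (hypothetical helper).

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives. Assumes non-zero margins.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, fn=20, tn=30).items():
        print(f"{name}: {value:.3f}")
```

Working from the four counts keeps every measure traceable to the same table, which makes it easier to check that reported values are mutually consistent.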

“…Variable selection for other models was based on either a significance test (P < 0.05) or relative importance score (<1%) [29]. In addition to using a separate data set for validation, to further reduce predictive bias and uncertainty (i.e., variance of performance estimates) we used 10-fold cross-validation (10-fold cv) for the training of all models except NNW, since we had a large training data set [29,53,58,59]. For aquatic habitat identification, we used both field-observed aquatic habitats (yes) and pseudo-habitats (no).…”

Section: Model Specification and Modeling Process (mentioning)

confidence: 99%
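
As an illustration of the 10-fold cross-validation mentioned in the statement above, here is a minimal sketch of how the index splits can be generated. It is not taken from the citing paper: the function name `k_fold_indices`, the seed, and the sample count are assumptions made for the example. Each sample lands in exactly one test fold, and the remaining folds form the training set for that round.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, deals them
    into k folds of near-equal size, then holds out one fold per round.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [
            j
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for j in fold
        ]
        yield train_idx, test_idx


if __name__ == "__main__":
    # 25 samples is an arbitrary size chosen for the demonstration.
    for round_no, (train, test) in enumerate(k_fold_indices(25, k=10), start=1):
        print(f"round {round_no}: {len(train)} training, {len(test)} held out")
```

In practice a model would be fitted on each `train_idx` and scored on the matching `test_idx`, with the k scores averaged to estimate performance and their spread used as a rough indication of variance.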

Mapping potential malaria vector larval habitats for larval source management: Introduction to multi-model ensembling approaches

Zhou, Lee, Wang, et al. 2022

Preprint

Mosquito larval source management (LSM) is a viable supplement to the currently implemented first-line malaria control tools for use under certain conditions for malaria control and elimination. Implementation of larval source management requires a carefully designed strategy and effective planning. Identification and mapping of larval sources is a prerequisite. Ensemble modeling is increasingly used for prediction modeling, but it lacks standard procedures. We proposed a detailed framework to predict potential malaria vector larval habitats using ensemble modeling, which includes selection of models, ensembling method and predictors; evaluation of variable importance; prediction of potential larval habitats; and assessment of prediction uncertainty. The models were built and validated based on multi-site, multi-year field observations and climatic/environmental variables. Model performance was tested using independent multi-site, multi-year field observations. Overall, we found that the ensembled model predicted larval habitats with about 20% more accuracy than the average of the individual models ensembled. Key larval habitat predictors were elevation, geomorphon class, and precipitation 2 months prior. Mapped distributions of potential malaria vector larval habitats showed different prediction errors in different ecological settings. This is the first study to provide a detailed framework for the process of multi-model ensemble modeling. Mapping of potential habitats will be helpful in LSM planning.

“…International Data Corporation stated that unstructured data would make up 95% of all data worldwide in 2020, with a compound annual growth rate of 65% [30]. Due to the quality and usability concerns with large unstructured datasets, structured data are more relevant and valuable than unstructured or semi-structured data [31].…”

Section: Related Work (mentioning)

confidence: 99%

“…Analyzing this huge amount of medical data to extract meaningful knowledge or information is useful in the medical field for decision support, prevention, diagnosis, and treatment. However, processing vast amounts of multidimensional or raw data is a difficult and time-consuming operation [30] but is absolutely necessary for the advancement of science in general. This challenge has led to new standards for using data so that data are Findable, Accessible, Interoperable, and Reusable (FAIR) [38].…”

Section: Related Work (mentioning)

confidence: 99%

Big Data Bot with a Special Reference to Bioinformatics

Al-Omari, Tawalbeh, Akkam, et al. 2023

Computers, Materials & Continua

There are quintillions of data on deoxyribonucleic acid (DNA) and protein in publicly accessible data banks, and that number is expanding at an exponential rate. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is a tough undertaking. This data should go through several processes, including mining, data processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and with the particular ability to repeat data extraction phases as many times as needed without human intervention. This software simulates the extraction of data from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed using command-line tools. The software was evaluated by creating a novel database of 34 parameters for 1360 physicochemical properties of antimicrobial peptides (AMP) sequences (46240 hits) from various MARVIN software panels, which can be later utilized to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, data collection from the web will become faster and less expensive, with no need for manual data extraction. The software is critical as a first step to preparing large datasets for subsequent stages of analysis, such as those using machine and deep-learning applications.

“…With the continuous development of database and data mining technology, data mining is more and more used to mine medical databases efficiently [7]. The existing data mining technology application research shows that the model established by data mining has high accuracy [8].…”

Section: Introduction (mentioning)

confidence: 99%

Predicting three-month fasting blood glucose and glycated hemoglobin of patients with type 2 diabetes based on multiple machine learning algorithms

Xue, Jiang, Liu, et al. 2022

Preprint

Background: Type 2 diabetes accounts for the largest proportion of people with diabetes. With the progression of the disease, patients with type 2 diabetes mellitus will have different degrees of complications, which will seriously reduce the quality of life of the patients and bring a heavy economic burden to the patients' families. Therefore, establishing a predictive model for glycemic control in patients with type 2 diabetes mellitus is of great help in optimizing the treatment of type 2 diabetes mellitus and delaying disease progression. Design and Methods: A retrospective study was conducted on type 2 diabetes mellitus real-world medical data from 4 cities in Sichuan Province, China from January 2015 to December 2020, including basic patient information, medication status, laboratory results, dietary habits, exercise status, and the actual follow-up of the patient after treatment. After data preprocessing, data inputting, data sampling, and feature screening, 16 kinds of machine learning methods were used to construct fasting blood glucose prediction models and glycated hemoglobin prediction models for type 2 diabetes mellitus patients, and the 5 prediction models with the best prediction performance were screened for each. Results: A total of 375,723 cases of type 2 diabetes mellitus patients were collected; 10,000 cases were included to establish the fasting blood glucose model, and 2,169 cases were included to establish the HbA1c model. The best prediction models for both fasting blood glucose and HbA1c finally obtained were realized by ensemble learning and modified random forest inputting, with AUC values of 0.819 and 0.970, respectively. The most important indicators of the fasting blood glucose and glycated hemoglobin prediction models were fasting blood glucose and glycated hemoglobin. Medication compliance, follow-up outcome, dietary habits, BMI, and waist circumference also had a greater impact on fasting blood glucose levels. For the glycated hemoglobin level, however, laboratory indicators such as platelets, serum creatinine, aspartate transaminase, and hemoglobin had more impact. Conclusion: The prediction accuracy of the models of the two blood glucose control indicators is high and has certain clinical applicability. Glycated hemoglobin and fasting blood glucose are mutually important predictors, and there is a close relationship between them.
