Evaluation of Three Simple Imputation Methods for Enhancing Preprocessing of Data with Missing Values

Somasundaram, Rajeswari; Nedunchezhian, R.

doi:10.5120/2619-3544

Cited by 40 publications

(26 citation statements)

References 18 publications


“…If such values are not calculated by using appropriate means or incomplete records are not removed, they can cause poor analytical results [6, 7]. A number of methods can be used to replace the missing values by using measures like mean and median [51]. …”

Section: Challenges In Implementing Data Mining Process For Clinic (mentioning)

confidence: 99%

Improving Prediction Accuracy of “Central Line-Associated Blood Stream Infections” Using Data Mining Models

Noaman, Nadeem, Ragab, et al. 2017. BioMed Research International


Prediction of nosocomial infections among patients is an important part of clinical surveillance programs to enable the related personnel to take preventive actions in advance. Designing a clinical surveillance program with capability of predicting nosocomial infections is a challenging task due to several reasons, including high dimensionality of medical data, heterogenous data representation, and special knowledge required to extract patterns for prediction. In this paper, we present details of six data mining methods implemented using cross industry standard process for data mining to predict central line-associated blood stream infections. For our study, we selected datasets of healthcare-associated infections from US National Healthcare Safety Network and consumer survey data from Hospital Consumer Assessment of Healthcare Providers and Systems. Our experiments show that central line-associated blood stream infections (CLABSIs) can be successfully predicted using AdaBoost method with an accuracy up to 89.7%. This will help in implementing effective clinical surveillance programs for infection control, as well as improving the accuracy detection of CLABSIs. Also, this reduces patients' hospital stay cost and maintains patients' safety.

“…We will show the accuracy of our algorithm based on the evaluation standard of RandIndex [20]. For uncertain data set D (includes N objects), let T = {T_1, T_2, …, T_k} represent the original clusters, and C = {C_1, C_2, …, C_m} be the clusters produced by a clustering algorithm.…”

Section: A Evaluation Standard (mentioning)

confidence: 99%

A Sketch-based Clustering Algorithm for Uncertain Data Streams

Chen, Chen, Sheng. 2013. JNW
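The citation statement above names the Rand index as its evaluation standard but is cut off before giving the formula. As a minimal sketch, assuming the standard pair-counting definition (the fraction of object pairs on which the original clustering T and the produced clustering C agree), the measure could be computed as below. The function name `rand_index` and the label-list representation are illustrative assumptions, and the cited paper's variant for uncertain data may differ.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of object pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings place the two
    objects in the same cluster, or both place them in different clusters.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two objects to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true == same_pred:
            agree += 1
        total += 1
    return agree / total


if __name__ == "__main__":
    # Hypothetical labels: T has two clusters, C splits the second one.
    original = [0, 0, 1, 1]
    produced = [0, 0, 1, 2]
    print(rand_index(original, produced))
```

Cluster labels are arbitrary identifiers here, so relabeling a clustering does not change the score. The pairwise loop is quadratic in N, which is fine for illustration but not for large streams.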


“…Three algorithms have been presented below, that compute missing values and their attributes (M) in dataset (D) (Somasundaram and Nedunchezhian 2011).…”

Section: Algorithms For Computation Of Missing Values (mentioning)

confidence: 99%

“…2nd approach of mean attribute value substitution method is time consuming & expensive, but gives best results for missing values problem. 3rd approach of random attribute value substitution method causes distortion in data distributions by assuming that all missing values are with the same value, however this method still manages to provide comparable results (Somasundaram and Nedunchezhian 2011). Weak point of these techniques is the need of strong model assumptions.…”

Section: Introduction (mentioning)

confidence: 99%

“…These algorithms fill the missing values and smooth out the noise. Three of those implemented algorithms are Constant Substitution, Mean attribute value substitution and Random attribute value substitution method (Somasundaram and Nedunchezhian 2011). These methods have been tested on the standard WDBC dataset and their performance has been compared on the basis of defined evaluation attributes.…”

Section: Introduction (mentioning)

confidence: 99%


Data Cleaning In Data Warehouse: A Survey of Data Pre-processing Techniques and Tools

Fatima, Nazir, Khan. 2017. IJITCS


Abstract: A Data Warehouse is a computer system designed for storing and analyzing an organization's historical data from day-to-day operations in Online Transaction Processing System (OLTP). Usually, an organization summarizes and copies information from its operational systems to the data warehouse on a regular schedule and management performs complex queries and analysis on the information without slowing down the operational systems. Data need to be pre-processed to improve quality of data, before storing into data warehouse. This survey paper presents data cleaning problems and the approaches in use currently for preprocessing. To determine which technique of preprocessing is best in what scenario to improve the performance of Data Warehouse is main goal of this paper. Many techniques have been analyzed for data cleansing, using certain evaluation attributes and tested on different kind of data sets. Data quality tools such as YALE, ALTERYX, and WEKA have been used for conclusive results to ready the data in data warehouse and ensure that only cleaned data populates the warehouse, thus enhancing usability of the warehouse. Results of paper can be useful in many future activities like cleansing, standardizing, correction, matching and transformation. This research can help in data auditing and pattern detection in the data.
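The citation statements for this paper name three methods: constant substitution, mean attribute value substitution, and random attribute value substitution. The statements do not reproduce the algorithms themselves, so the following is only a sketch under stated assumptions: missing entries are marked with `None`, each function works on one numeric attribute (column), and "random attribute value" is read as drawing a replacement from the observed values of the same attribute. All function names are hypothetical and are not taken from Somasundaram and Nedunchezhian.

```python
import random
from statistics import mean

MISSING = None  # assumed marker for a missing entry


def impute_constant(values, constant=0.0):
    """Replace every missing entry with one fixed constant."""
    return [constant if v is MISSING else v for v in values]


def impute_mean(values):
    """Replace every missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not MISSING]
    if not observed:
        raise ValueError("attribute has no observed values")
    attribute_mean = mean(observed)
    return [attribute_mean if v is MISSING else v for v in values]


def impute_random(values, rng=None):
    """Replace each missing entry with a randomly chosen observed entry.

    Assumption: the replacement is sampled from the same attribute's
    observed values; the original paper may define this differently.
    """
    rng = rng or random.Random()
    observed = [v for v in values if v is not MISSING]
    if not observed:
        raise ValueError("attribute has no observed values")
    return [rng.choice(observed) if v is MISSING else v for v in values]


if __name__ == "__main__":
    column = [1.0, None, 3.0, None]  # hypothetical attribute with two gaps
    print(impute_constant(column, 0.0))
    print(impute_mean(column))
    print(impute_random(column, random.Random(0)))
```

Passing a seeded `random.Random` instance keeps the random variant reproducible, which matters when comparing the methods on the same dataset.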
