2022
DOI: 10.32604/cmc.2022.020261

Improved KNN Imputation for Missing Values in Gene Expression Data

Abstract: The problem of missing values has long been studied by researchers working in areas of data science and bioinformatics, especially the analysis of gene expression data that facilitates an early detection of cancer. Many attempts show improvements made by excluding samples with missing information from the analysis process, while others have tried to fill the gaps with possible values. While the former is simple, the latter safeguards against information loss. For that, a neighbour-based (KNN) approach has proven more …

Cited by 16 publications (8 citation statements)
References 49 publications
“…An insurance dataset was obtained and input into R-studio with variables (Third-party, comprehensive, marine, and Fire + stolen) clearly defined. Running a series of pattern extraction and analytical functions in R-studio detected missing values from the dataset with 89.5% classification accuracy as summarized in Figure 13. The findings from this study are consistent with the results from similar studies [28], [29] … performance in the experimental replacement of numerical values due to its unique ability to classify the missing parameters and assign cluster ratios for each type unlike other techniques that perform replacement in whole datasets based on the normalized computation of mean absolute errors and root mean square error. A study [31] observes that imputation based on computational and statistical models is recommended by scientists due to its unique ability to determine the missing values by averaging a summarized likelihood function of the entire dataset over a mathematically defined predictive distribution with considerably high precision.…”
Section: Results (supporting)
confidence: 88%
“…However, an inter-class overlap may dampen the quality of this local approach, as compared to the clustering-oriented technique such as SingleClus. The same problem is also witnessed for the task of imputing missing values, where clustering information can be exploited to improve the accuracy of estimates of those missing ones [37,38]. Nonetheless, the use of a single clustering seen with SingleClus may overlook patterns exhibited in data under examination.…”
Section: Results (mentioning)
confidence: 96%
“…The missing values are only computed from the instance subset that is a high correlation with the sample that contains the missing values [15]. The k-nearest neighbor imputation (KNNimpute) and local least square imputation (LLSimpute) are widely used existing imputation methods are among this approach category [8], [19][20]. For The KNNimpute, this method has performed k-nearest neighbor algorithms depending on k number of high sample correlation with gene contained missing value to compute missing data in the dataset.…”
Section: Table I Missing Data Imputation Algorithms Categorized Into ... (mentioning)
confidence: 99%
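
The neighbour-based idea described in the statement above can be illustrated with a minimal sketch. It is not the improved method of the cited paper, nor the exact procedure of any citing work: the function name `knn_impute`, the use of Euclidean distance over jointly observed columns, and the unweighted mean of the donors are all assumptions made for illustration.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs row by row from the k most similar rows.

    Assumed layout: rows are genes, columns are samples.
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)

    for i in np.where(missing.any(axis=1))[0]:
        observed = ~missing[i]
        candidates = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            # Compare only columns observed in both rows.
            shared = observed & ~missing[j]
            if not shared.any():
                continue
            diff = X[i, shared] - X[j, shared]
            candidates.append((float(np.sqrt(np.mean(diff ** 2))), j))
        candidates.sort()  # nearest rows first

        for col in np.where(missing[i])[0]:
            # Donors are the k nearest rows with this column observed.
            donors = [j for _, j in candidates if not missing[j, col]][:k]
            if donors:
                filled[i, col] = X[donors, col].mean()
    return filled
```

Entries for which no donor row exists are left as NaN rather than guessed. A distance-weighted mean or a correlation-based similarity could be swapped in at the marked steps.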
“…However, this method is not a suitable solution for some datasets which consist of many incomplete values. Although the machine learning algorithms were exploited in numerous estimation applications for the time ahead prediction and objective classification, various up-to-date imputation methods were also proposed to handle this problem effectively via using convenient machine learning algorithms such as the regression method [8], the k-nearest neighbor method [9], deep learning approach [10][11], the neural network-based method [12] with advanced statistics strategies [13], [14]. The most appropriate value estimation predicted by these imputations used incompatible algorithms.…”
Section: Introduction (mentioning)
confidence: 99%