2022
DOI: 10.1016/j.ipm.2022.102881

Estimation of missing values in astronomical survey data: An improved local approach using cluster directed neighbor selection

Cited by 11 publications (5 citation statements)
References 44 publications
“…However, an inter-class overlap may dampen the quality of this local approach, as compared to the clustering-oriented technique such as SingleClus. The same problem is also witnessed for the task of imputing missing values, where clustering information can be exploited to improve the accuracy of estimates of those missing ones [37,38]. Nonetheless, the use of a single clustering seen with SingleClus may overlook patterns exhibited in data under examination.…”
Section: Results (mentioning)
confidence: 96%
“…One of which is the conventional thread that needs a data transformation process to extract discriminative features or attributes from images in a training dataset. See (Keerin & Boongoen, 2022) for examples of physics-based features used to deliver the profile of each image representing a new bright source. Then, they are extensively used to develop a conventional classifier such as a random forest or a single decision tree, artificial neural networks, k-nearest neighbours, and support vector machines, to name just a few (Sarker, 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…In past years, several studies (Tabacolde et al, 2018a,b) have designed this quest as a binary classification problem, to which conventional machine learning models can be applied. The main difficulty to obtain an accurate prediction model is the curse of imbalance class, where the minority class can contribute as tiny as 0.5% of the whole data collection (Iam-On et al, 2016;Haixianga et al, 2017;Keerin & Boongoen, 2022). A small improvement is made through balancing cardinality of samples belonging to both classes, using either an oversampling like SMOTE (Chawla et al, 2002) or a clustering-based undersampling counterpart (Lin et al, 2018a).…”
Section: Introduction (mentioning)
confidence: 99%
“…Imputation entails using statistical or machine learning procedures to estimate missing values in a data set. While still not widely accepted in the proteomics community, imputation has been standard practice for decades for analysis of gene expression [3] and clinical and epidemiological data [4], and more recently astronomy [5,6] and single-cell transcriptomic data [7,8]. Imputation methods for proteomics data (Table 1) fall into three broad categories: "single-value replacement" methods, in which all missing values are filled in with a single replacement value; "local similarity" methods, which use statistical models to learn patterns of local similarity in the data, for example between subsets of similar peptides or runs; and "global similarity" methods, which learn broad patterns of similarity across all peptides and runs.…”
Section: Introduction (mentioning)
confidence: 99%