2021
DOI: 10.3390/e23070869

Qualitative Data Clustering to Detect Outliers

Abstract: Detecting outliers is a widely studied problem in many disciplines, including statistics, data mining, and machine learning. All anomaly detection activities are aimed at identifying cases of unusual behavior compared to most observations. There are many methods to deal with this issue, which are applicable depending on the size of the data set, the way it is stored, and the type of attributes and their values. Most of them focus on traditional datasets with a large number of quantitative attributes. The multi…

Cited by 6 publications (3 citation statements)
References 29 publications
“… 7 Like other centroid-based clustering algorithms, K-means is sensitive to outliers; the source datasets for this study were subjected to outlier imputation, defining observations in the top and bottom 1% of each continuous variable’s distribution as outliers, as previously described. 2, 3, 8 The optimal number of clusters was determined by calculating the within-cluster variance for a range of 1–9 clusters and identifying the inflection point at which a greater number of clusters (and attendant decrease in cluster sizes) would not substantially tighten the clusters (decrease the within-cluster sum of squares). Value of care, the primary outcome, was calculated as inverted observed-to-expected mortality ratios divided by median total costs and multiplied by a constant, as previously described.…”
Section: Methods (mentioning)
confidence: 99%
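
The procedure quoted above (treating the top and bottom 1% of each continuous variable as outliers, then scanning 1–9 clusters for the point where the within-cluster sum of squares stops dropping sharply) can be illustrated with a minimal sketch. It is not the citing authors' code: the statement does not say how outliers were imputed, so clipping values to the 1st and 99th percentiles is an assumption here, and the helper names `clip_extremes` and `kmeans_wcss`, the synthetic data, and the plain Lloyd's-algorithm K-means are all hypothetical choices made for illustration.

```python
import numpy as np


def clip_extremes(X, lower_pct=1.0, upper_pct=99.0):
    """Clip each column to its [lower_pct, upper_pct] percentile range.

    Assumption: clipping stands in for the unspecified "outlier imputation".
    """
    lo = np.percentile(X, lower_pct, axis=0)
    hi = np.percentile(X, upper_pct, axis=0)
    return np.clip(X, lo, hi)


def kmeans_wcss(X, k, n_iter=50, seed=0):
    """Run a basic Lloyd's K-means and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialise centers with k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).sum())


# Synthetic data (illustrative only): three Gaussian blobs in two dimensions.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(100, 2))
    for c in ((0, 0), (5, 5), (0, 5))
])

X_clipped = clip_extremes(X)
wcss = [kmeans_wcss(X_clipped, k) for k in range(1, 10)]

for k, value in enumerate(wcss, start=1):
    print(f"k={k}: WCSS={value:.1f}")
```

Reading the printed values from k=1 upward, the inflection point is the k after which further clusters reduce the sum of squares only marginally. In practice the choice is usually made by inspecting a plot of these values rather than by a fixed numeric rule.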
“…Data quality is very important, as it is affected by the number of variables and the amount of data acquired, which can lead to information sparsity, especially in cases where the quality of the data appears to be poor [21]. In addition, process analysis allows for the observation of unusual activities and behaviors, which can lead to the detection of "outliers", alarm objects, and calls for intervention [22]. Given the power of the method, LA can therefore be a major feedback tool for educators and instructional designers to improve the learning experience [23].…”
Section: Related Work (mentioning)
confidence: 99%
“…The presence of artifacts in the signals, as well as the estimation of parameters without a priori signal processing, can lead to the generation of false alarms or isolation of false outliers, which is becoming a relevant topic in database processing [16]. Recording by a capacitive electrode in clinical conditions, while the subjects were sitting still, was analyzed to determine the possibility of clinical application [17].…”
Section: Introduction (mentioning)
confidence: 99%