Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality

Shetta, Omar; Niranjan, Mahesan

doi:10.1098/rsos.190714

Cited by 15 publications

(14 citation statements)

References 27 publications


“…Outliers in data can occur due to the variability in measurements, experimental errors, or noise [1], and the existence of outliers in data makes the analysis of data misleading and degrades the performance of machine learning algorithms [2, 3]. Several techniques have been developed in the past to detect outliers in data [4][5][6]. The techniques for outlier detection can be broadly classified as methods based on: (i) Clustering [7], (ii) Classification [8], (iii) Neighbor based [9], (iv) Statistical [10], (v) Information-Theoretic [11], and (vi) Spectral methods [12]. The working of classification-based methods mostly relies on a confidence score, which is calculated by the classifier while making a prediction for the test observation.…”

mentioning

confidence: 99%



Unsupervised outlier detection in multidimensional data

Rehman

Belhaouari

2021

J Big Data


Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. To detect the anomalies in a dataset in an unsupervised manner, some novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods considering data compactness and other properties. The newly proposed ideas are found to be efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two of the techniques presented in this paper transform the data to a unidimensional distance space to detect the outliers, so irrespective of the data’s high dimensionality, the techniques remain computationally inexpensive and feasible. A comprehensive performance analysis of the proposed anomaly detection schemes is presented in the paper, and the newly proposed schemes are found to perform better than state-of-the-art methods when tested on several benchmark datasets.
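The abstract's central idea, collapsing each multidimensional observation into a single distance value so that the outlier decision is made on a one-dimensional vector whatever the number of dimensions, can be illustrated with a minimal sketch. The abstract does not say which distances or thresholds the authors use, so every choice below is an assumption: the score is the Euclidean distance to the coordinate-wise median, and the cut-off is a median/MAD rule with a hypothetical multiplier `c`. The function name `distance_outliers` is hypothetical, and this is a generic illustration, not the techniques proposed by Rehman and Belhaouari.

```python
import numpy as np


def distance_outliers(X: np.ndarray, c: float = 3.0) -> np.ndarray:
    """Flag outliers by mapping each row of X to one distance value.

    Hypothetical illustration (not the paper's method): distance to the
    coordinate-wise median, thresholded with a robust z-score built from
    the median and the median absolute deviation (MAD) of the distances.
    Returns a boolean mask of shape (n,).
    """
    X = np.asarray(X, dtype=float)
    center = np.median(X, axis=0)                # robust centre, shape (d,)
    dist = np.linalg.norm(X - center, axis=1)    # 1-D distance vector, shape (n,)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))          # median absolute deviation
    if mad == 0:
        # Degenerate case: at least half the distances are identical.
        return dist > med
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    robust_z = (dist - med) / (1.4826 * mad)
    return robust_z > c


# Usage: 200 points in 50 dimensions, with the first five shifted far away.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
X[:5] += 10.0
mask = distance_outliers(X)
print(np.flatnonzero(mask))
```

The median and MAD are used instead of the mean and standard deviation because they are not pulled toward the very points being detected. Computing the distances costs O(nd); after that, the threshold is applied to a length-n vector regardless of d.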



“…Some of the main projects and networks with which we have been involved are listed below. Networks, centers, services, and projects: ITaaU: http://www.itutility.ac.uk/; Dial-a-Molecule Grand Challenge Network: http://generic.wordpress.soton.ac.uk/dial-a-molecule/; Digital Economy Network: https://digitaleconomynetwork.com/; Leverhulme Research Centre for Functional Materials Design: https://www.liverpool.ac.uk/leverhulme-research-centre/; UK National Crystallography Service: http://www.ncs.ac.uk/; CombeChem: http://www.combechem.org/. We are distilling our main experiences in running networks and will present online resources in the near future (look for How to Train your Network). For the scientific research perspective, we were able to rely on our formative experience with the CombeChem science project [67] and our extensive combined background in machine learning, chemistry, philosophy, and working in interdisciplinary projects that spanned these areas.…”

Section: The Network+

mentioning

confidence: 99%

The AI for Scientific Discovery Network+

et al. 2021

Self Cite


“…Normal distributions are often used for representing real-valued random variables with unknown distributions [28] [29]. The joint probability density function of N independent and identically distributed normal random variables is given as: $f(x_1, \dots, x_N) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$ (12), where $\sigma$ is the standard deviation, modeled differently in (15) and (17), $\mu$ is the mean of the random variable, and N is the dimension of the data. Here, some functions based on the normal distribution to identify the outliers in a dataset are proposed.…”

Section: Joint Probability Density Estimation Using D-k-nn

mentioning

confidence: 99%
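For N i.i.d. normal coordinates, the joint density discussed in this excerpt is a product of N univariate normal densities, and that product can underflow to zero in double precision once N is in the hundreds or more. A practical implementation would therefore usually evaluate its logarithm. The minimal sketch below assumes a common mean `mu` and standard deviation `sigma` for every coordinate, both supplied by the caller; how the citing paper models σ in its equations (15) and (17) is not shown in the excerpt and is not reproduced here. The name `iid_normal_log_density` is hypothetical.

```python
import numpy as np


def iid_normal_log_density(x: np.ndarray, mu: float, sigma: float) -> float:
    """Log joint density of N i.i.d. N(mu, sigma^2) coordinates.

    log f(x) = -(N/2) * log(2*pi*sigma^2) - sum((x_i - mu)^2) / (2*sigma^2)

    Assumption: one shared mu and sigma for all coordinates. Working in log
    space avoids the underflow that the plain product suffers for large N.
    """
    x = np.asarray(x, dtype=float).ravel()
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = x.size
    return float(
        -0.5 * n * np.log(2.0 * np.pi * sigma**2)
        - np.sum((x - mu) ** 2) / (2.0 * sigma**2)
    )


# Usage: a 2000-dimensional point at the mean of a standard normal.
x = np.zeros(2000)
print(np.prod(np.exp(-(x**2) / 2) / np.sqrt(2 * np.pi)))  # 0.0: the product underflows
print(iid_normal_log_density(x, 0.0, 1.0))                  # finite, about -1837.88
```

Because the logarithm is monotonic, ranking observations or thresholding them by log-density gives the same ordering as the density itself, so nothing is lost by switching to log space.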

“…Several techniques have been developed in the past to detect outliers in data [4]-[6]. The techniques for outlier detection can be broadly classified as methods based on: (i) Clustering [7], (ii) Classification [8], (iii) Neighbor based [9], (iv) Statistical [10], (v) Information-Theoretic [11], and (vi) Spectral methods [12]. The working of classification-based methods mostly relies on a confidence score, which is calculated by the classifier while making a prediction for the test observation.…”

Section: Introduction

mentioning

confidence: 99%

Unsupervised Outlier Detection in Multidimensional Data

Rehman

Belhaouari

2021

Preprint


Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. To detect the anomalies in a dataset in an unsupervised manner, some novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods considering data compactness and other properties. The newly proposed ideas are found to be efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two of the techniques presented in this paper use only a one-dimensional distance vector to detect the outliers, so irrespective of the data’s high dimensionality, the techniques remain computationally inexpensive and feasible. A comprehensive performance analysis of the proposed anomaly detection schemes is presented in the paper, and the newly proposed schemes are found to perform better than state-of-the-art methods when tested on several benchmark datasets.

