2020
DOI: 10.1098/rsos.190714

Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality

Abstract: The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and unsupervised learning problems of clustering and variants of low-dimensional projections for visualization. A class of problems that have not gained much attention is detecting outliers in datasets, arising from reasons such as gross experimental, reporting or labelling errors. These could also be small parts of a dataset that are functionally distinct from the…

citations
Cited by 15 publications
(14 citation statements)
references
References 27 publications
“…Outliers in data can occur due to the variability in measurements, experimental errors, or noise [1], and the existence of outliers in data makes the analysis of data misleading and degrades the performance of machine learning algorithms [2, 3]. Several techniques have been developed in the past to detect outliers in data [4]-[6]. The techniques for outlier detection can be broadly classified as methods based on: (i) Clustering [7], (ii) Classification [8], (iii) Neighbor based [9], (iv) Statistical [10], (v) Information-Theoretic [11], and (vi) Spectral methods [12]. The working of classification-based methods mostly relies on a confidence score, which is calculated by the classifier while making a prediction for the test observation.…”
mentioning
confidence: 99%
“…Several techniques have been developed in the past to detect outliers in data [4]-[6]. The techniques for outlier detection can be broadly classified as methods based on: (i) Clustering [7], (ii) Classification [8], (iii) Neighbor based [9], (iv) Statistical [10], (v) Information-Theoretic [11], and (vi) Spectral methods [12]. The working of classification-based methods mostly relies on a confidence score, which is calculated by the classifier while making a prediction for the test observation.…”
mentioning
confidence: 99%
“…Some of the main projects and networks with which we have been involved are listed below. Networks, centers, services, and projects: ITaaU (http://www.itutility.ac.uk/); Dial-a-Molecule Grand Challenge Network (http://generic.wordpress.soton.ac.uk/dial-a-molecule/); Digital Economy Network (https://digitaleconomynetwork.com/); Leverhulme Research Centre for Functional Materials Design (https://www.liverpool.ac.uk/leverhulme-research-centre/); UK National Crystallography Service (http://www.ncs.ac.uk/); CombeChem (http://www.combechem.org/). We are distilling our main experiences in running networks and will present online resources in the near future (look for How to Train your Network). For the scientific research perspective, we were able to rely on our formative experience with the CombeChem science project [67] and our extensive combined background in machine learning, chemistry, philosophy, and working in interdisciplinary projects that spanned these areas.…”
Section: The Network
mentioning
confidence: 99%
“…Normal distributions are often used for representing real-valued random variables with unknown distributions [28], [29]. The joint probability density function of independent and identically distributed normal variables is given as: $f(x_1, \dots, x_N) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$ (12), where $\sigma$ is the standard deviation, modeled differently in (15) and (17), $\mu$ is the mean of the random variable, and $N$ is the dimension of the data. Here, some functions based on the normal distribution to identify the outliers in a dataset are proposed.…”
Section: Joint Probability Density Estimation Using D-k-nn
mentioning
confidence: 99%
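
The joint density of independent, identically distributed normal components mentioned in the statement above can be illustrated with a short sketch. It is not the citing paper's method: the function names, the fixed mean and standard deviation, and the log-density threshold are all assumptions chosen for illustration, and the paper's own choices for the standard deviation (its equations (15) and (17)) are not reproduced here.

```python
import math

def joint_normal_log_density(x, mu, sigma):
    """Log of the joint density of len(x) i.i.d. normal components.

    mu and sigma are assumed known here; this is an illustrative simplification.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(x)
    squared_dev = sum((xi - mu) ** 2 for xi in x)
    # log of prod_i (1 / (sigma * sqrt(2*pi))) * exp(-(x_i - mu)^2 / (2 * sigma^2))
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - squared_dev / (2.0 * sigma ** 2)

def flag_low_density(points, mu, sigma, threshold):
    """Return indices of points whose joint log density is below threshold.

    The thresholding rule is hypothetical, not taken from the cited work.
    """
    return [
        i for i, p in enumerate(points)
        if joint_normal_log_density(p, mu, sigma) < threshold
    ]

points = [[0.1, -0.2, 0.0], [0.3, 0.1, -0.1], [4.0, 5.0, -6.0]]
outliers = flag_low_density(points, mu=0.0, sigma=1.0, threshold=-10.0)
print(outliers)
```

Working in log space avoids numerical underflow when the dimension N is large, which is the usual reason to prefer it over multiplying the per-component densities directly.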
“…Several techniques have been developed in the past to detect outliers in data [4]-[6]. The techniques for outlier detection can be broadly classified as methods based on: (i) Clustering [7], (ii) Classification [8], (iii) Neighbor based [9], (iv) Statistical [10], (v) Information-Theoretic [11], and (vi) Spectral methods [12]. The working of classification-based methods mostly relies on a confidence score, which is calculated by the classifier while making a prediction for the test observation.…”
Section: Introduction
mentioning
confidence: 99%