2020
DOI: 10.1101/2020.02.28.970202
Preprint

Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data

Abstract: The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and re…

Cited by 9 publications (10 citation statements)
References 42 publications (65 reference statements)
“…The selection of the right model parameters, such as the kernel function for SVM, the distance metric for kNN and the number of splits for DT, appears to be a fundamental step in the classification process, since a poor choice can worsen algorithm performance. This result is in line with previous studies, which demonstrated that tuning of model parameters should always be conducted to avoid misclassifications [53, 54]. After selecting the right model parameters, by combining the results we can assess that l-SVM, q-SVM, f-kNN, m-DT and cx-DT meet all the criteria to be considered optimum classifiers.…”
Section: Discussion (supporting)
confidence: 92%
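
As an aside, the tuning step this statement describes maps directly onto a standard cross-validated grid search. The following is a minimal scikit-learn sketch, not the cited paper's code: the dataset, parameter grids and CV settings are illustrative assumptions, and the paper's specific variants (l-SVM, f-kNN, etc.) are not reproduced.

# Minimal sketch of per-model hyperparameter tuning via grid search.
# Dataset and grids are illustrative assumptions, not the cited paper's.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One grid per model family: kernel for SVM, distance metric for kNN,
# and a split-controlling parameter for the decision tree.
searches = {
    "SVM": GridSearchCV(SVC(), {"kernel": ["linear", "poly", "rbf"]}, cv=5),
    "kNN": GridSearchCV(KNeighborsClassifier(),
                        {"metric": ["euclidean", "manhattan", "cosine"],
                         "n_neighbors": [3, 5, 11]}, cv=5),
    "DT": GridSearchCV(DecisionTreeClassifier(random_state=0),
                       {"max_depth": [2, 4, 8, None]}, cv=5),
}

for name, search in searches.items():
    search.fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))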
“…The Cre-Het control and GABAB1R KO datasets were generated separately and, as such, exhibited stronger batch effects than the wild-type control and GABAB1R KO datasets. For integrating the Cre-Het control and GABAB1R KO datasets, DEWÄKSS was used (Tjärnberg et al, 2020). Following standard pre-processing (normalization and log1p transformation), the optimal number of principal components and k for the k-nearest neighbor graph were computed in DEWÄKSS over the BBKNN algorithm to integrate the conditions (Polański et al, 2020).…”
Section: Single Cell RNA Sequencing Dataset Integration (mentioning)
confidence: 99%
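
The workflow in this statement can be sketched with scanpy. This is a hedged illustration, not the DEWÄKSS implementation: the input file name and candidate grids are assumptions, and smoothing_mse is a simplified stand-in for DEWÄKSS's self-supervised denoising objective (see Tjärnberg et al, 2020 for the actual criterion). Only sc.pp.normalize_total, sc.pp.log1p, sc.pp.pca and sc.external.pp.bbknn are real scanpy/BBKNN calls.

import numpy as np
import scipy.sparse as sp
import scanpy as sc

# Assumed: a combined AnnData with a "batch" column in .obs.
adata = sc.read_h5ad("combined_conditions.h5ad")

# Standard pre-processing, as in the statement.
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.pca(adata, n_comps=100)

def smoothing_mse(adata):
    # Simplified stand-in for DEWAKSS's objective: predict each cell
    # from its graph neighbors and score the reconstruction error.
    X = adata.X.toarray() if sp.issparse(adata.X) else np.asarray(adata.X)
    conn = adata.obsp["connectivities"]
    totals = np.asarray(conn.sum(axis=1)).ravel() + 1e-12
    smoothed = (conn @ X) / totals[:, None]
    return float(np.mean((X - smoothed) ** 2))

# Grid search over (n_pcs, k); BBKNN builds the batch-balanced
# neighbor graph at each setting (Polanski et al, 2020).
best = None
for n_pcs in (20, 50, 100):
    for k in (3, 5, 10):
        sc.external.pp.bbknn(adata, batch_key="batch",
                             n_pcs=n_pcs, neighbors_within_batch=k)
        score = smoothing_mse(adata)
        if best is None or score < best[0]:
            best = (score, n_pcs, k)

print("selected (score, n_pcs, k):", best)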
“…Interestingly, we observed that even though SCALE correlation exceeded that of scBasset for baseline accessibility/expression, scBasset significantly outperforms SCALE when evaluated by differential accessibility/expression (p<7.25e-05). We hypothesize that SCALE’s reliance on cell-cell covariance encourages cells to be more similar to each other than they actually are and over-smooths (Tjärnberg et al, 2021; Ashuach et al, 2021). scBasset will be less prone to over-smoothing since each peak is considered only through its sequence.…”
Section: Results (mentioning)
confidence: 99%
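
The over-smoothing these authors hypothesize can be illustrated with a toy example: averaging each cell over a neighborhood larger than its true group pulls distinct groups toward a common profile. A minimal numpy sketch with made-up data (group sizes, means and k values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
# Two synthetic "cell" groups with distinct mean expression profiles.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 20))
group_b = rng.normal(loc=3.0, scale=1.0, size=(50, 20))
X = np.vstack([group_a, group_b])

def knn_average(X, k):
    # Replace each cell by the mean of its k nearest neighbors
    # (Euclidean distance, self included).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return X[idx].mean(axis=1)

for k in (5, 60):
    Xs = knn_average(X, k)
    gap = Xs[50:].mean() - Xs[:50].mean()
    # With k larger than the group size (60 > 50), neighborhoods cross
    # groups and the between-group gap shrinks: over-smoothing.
    print(f"k={k}: between-group gap = {gap:.2f}")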