2015
DOI: 10.1093/bioinformatics/btv176

KeBABS: an R package for kernel-based analysis of biological sequences

Abstract: KeBABS provides a powerful, flexible and easy to use framework for KErnel-Based Analysis of Biological Sequences in R. It includes efficient implementations of the most important sequence kernels, also including variants that allow for taking sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows for hyperparameter selection by cross validation, nested cross validation an…

Citations: cited by 48 publications (51 citation statements)
References: 8 publications
“…In order to investigate the composition of large numbers of sequences with the appropriate dimensionality, sequence kernels are increasingly used [28,29]. Sequence kernels are high-dimensional functions which measure the similarity of pairs of sequences, for example, by comparing the occurrence of specific subsequences (k-mers) in a high-dimensional space [30,31]. Supervised machine learning (e.g., support vector machine analysis) is an approach, which takes low and high-dimensional feature functions as input to find a classification rule that discriminates between two (or more) given classes on a single-clone level (e.g., public vs. private clones) [32].…”
Section: Introduction (mentioning)
confidence: 99%
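
As an illustration of the idea in the statement above (comparing k-mer occurrences of two sequences), here is a minimal sketch of a spectrum-style kernel in plain Python. It is not the KeBABS implementation; the function names `kmer_counts` and `spectrum_kernel` and the example sequences are hypothetical and chosen only for this sketch.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (the explicit feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int, normalize: bool = True) -> float:
    """Dot product of two k-mer count vectors, optionally cosine-normalized."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Only k-mers present in both sequences contribute to the dot product.
    dot = sum(n * cb[kmer] for kmer, n in ca.items())
    if not normalize:
        return float(dot)
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


# Two made-up amino acid sequences that share the 3-mers "CAS" and "ASS".
shared = spectrum_kernel("CASSLG", "CASSQG", k=3, normalize=False)
```

The feature space has one dimension per possible k-mer, which is why such kernels are described as high-dimensional; the sketch never materializes that space and only touches k-mers that actually occur.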
“…In contrast to using conventional low-dimensional features to analyze immune repertoires, the coupling of high-dimensional sequence kernels to support vector machine (SVM) analysis may lead to greater insight into the immunogenomic structure of repertoire diversity; specifically the difference between public and private repertoires. As opposed to previous approaches [33], a key advantage of sequence-kernel based SVM analysis is the prediction-profile-based identification of CDR3 subregions that are most predictive for a respective class (public or private class) [30,31]. This approach may lead to predictive immunological and mechanistic insight into the immunogenomic elements that shape repertoire diversity.…”
Section: Introduction (mentioning)
confidence: 99%
“…We report all analyses with k = 5, but classifier performance and generalization were similar for k = 4-7 (Supplementary note). More flexible models, such as the mismatch (Leslie et al 2002; Palme et al 2015) and gappy pair kernels (Mahrenholz et al 2011; Bodenhofer et al 2009), did not significantly increase the performance (Supplementary note).…”
Section: Spectrum Kernel SVM Classification (mentioning)
confidence: 97%
“…We used k-mer spectrum kernel to quantify sequence features for the SVM (Leslie et al 2002); reverse complements of k-mers are different k-mers in this study. Binary classification, evaluation, and calculation of feature weights were performed with the kebabs R package (Palme et al 2015). We report all analyses with k = 5, but classifier performance and generalization were similar for k = 4-7 (Supplementary note).…”
Section: Spectrum Kernel SVM Classification (mentioning)
confidence: 99%
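
The statement above notes that reverse complements are treated as different k-mers. The following sketch shows what that choice means for the feature counts, with an optional merged mode for contrast. It is a plain-Python illustration under assumptions, not the kebabs API; `spectrum_features` and its `merge_revcomp` flag are hypothetical names, and the DNA string is made up.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def spectrum_features(seq: str, k: int = 5, merge_revcomp: bool = False) -> Counter:
    """Count overlapping k-mers; by default a k-mer and its reverse
    complement are separate features, as in the quoted study."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if merge_revcomp:
            # Contrast case: collapse each pair onto its lexicographic minimum.
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


distinct = spectrum_features("AAAAATTTTT")                    # 6 separate 5-mers
merged = spectrum_features("AAAAATTTTT", merge_revcomp=True)  # 3 merged features
```

Keeping reverse complements distinct preserves strand information at the cost of a feature space roughly twice as large.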
“…For each repertoire, the occurrence of gapped k-mers was calculated across all CDR3 nucleotide sequences for parameters (k = 3, m ≤ 3, where k is the k-mer amino acid length and m is the number of amino acid gaps), as described by Palme et al., 2015. The gapped-kmers are counted for the core of the CDR3 only, excluding the first three and last two amino acids containing more conserved patterns.…”
Section: Supplementary Data (mentioning)
confidence: 99%
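
The counting procedure in the statement above can be sketched as follows. This is one reading of it, stated as an assumption: a gapped k-mer is taken to be a pair of k-mers separated by 0 to m arbitrary positions, and the input is an amino acid string. The helper names (`cdr3_core`, `gappy_pair_counts`, `repertoire_counts`), the "." wildcard notation, and the example CDR3 are hypothetical and not taken from the cited work.

```python
from collections import Counter


def cdr3_core(cdr3: str, trim_start: int = 3, trim_end: int = 2) -> str:
    """Drop the conserved flanks: first trim_start and last trim_end residues."""
    return cdr3[trim_start:len(cdr3) - trim_end]


def gappy_pair_counts(seq: str, k: int = 3, m: int = 3) -> Counter:
    """Count pairs of k-mers separated by 0..m wildcard positions.

    Assumed feature encoding: left k-mer, one "." per gap position, right k-mer.
    """
    counts = Counter()
    for gap in range(m + 1):
        span = 2 * k + gap
        for i in range(len(seq) - span + 1):
            left = seq[i:i + k]
            right = seq[i + k + gap:i + span]
            counts[left + "." * gap + right] += 1
    return counts


def repertoire_counts(cdr3s, k: int = 3, m: int = 3) -> Counter:
    """Sum gapped k-mer counts over the cores of all CDR3s in a repertoire."""
    total = Counter()
    for cdr3 in cdr3s:
        total.update(gappy_pair_counts(cdr3_core(cdr3), k, m))
    return total


core = cdr3_core("CASSLGQGAYEQYF")   # made-up CDR3; the core drops "CAS" and "YF"
features = gappy_pair_counts(core)
```

With k = 3, a core shorter than six residues yields no features under this reading, so very short CDR3s contribute nothing to the repertoire-level counts.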