Proceedings of the 2012 SIAM International Conference on Data Mining 2012
DOI: 10.1137/1.9781611972825.75

Generalized Similarity Kernels for Efficient Sequence Classification

Abstract: Kernel-based approaches for sequence classification have been successfully applied to a variety of domains, including text categorization, image classification, speech analysis, biological sequence analysis, time series and music classification, where they show some of the most accurate results. Typical kernel functions for sequences in these domains (e.g., bag-of-words, mismatch, or subsequence kernels) are restricted to discrete univariate (i.e., one-dimensional) string data, such as sequences of words in …

Cited by 26 publications (24 citation statements)
References 24 publications

“…Many previous studies proved the kernel-based methods for string representations to be useful in practical string data analysis. See, for example, [26, 43-52] for important recent studies on the kernel-based methods for string data. However, it is impossible to evaluate, applying traditional probability theory that has been constructed on spaces such as the Euclidean space R^p and Hilbert space L^2, the performance of the methods based on the string kernels, considering that a sample of observed strings is a part of a population generated according to a probability law.…”
Section: Discussion (mentioning)
confidence: 99%
“…The third baseline (BLsignature) generates compact signatures for each music track using a 15-dimensional MFCC feature set and compares these using bipartite graph matching [10]. The fourth baseline (BLmultivar) uses multivariate kernels [11] with the direct uniform quantization of the 13-dimensional MFCC features. The results for the latter three are taken from their publications, while the results for BLGMM baseline are reproduced using the implementation provided with the dataset.…”
Section: Baseline Methods (mentioning)
confidence: 99%
“…For the 13-dimensional MFCCs, we use MIRTOOLBOX [29] with 40 frequency bands, 25 ms window length and 50% overlap to extract features from 32kbps mp3 files provided in the dataset. This is because another baseline method [11] also used it to extract 13-MFCCs. We use 2000 randomly-selected frames from the middle area of each song to compute Baum-Welch statistics, assuming that this middle area of the song contains the most singing voice data.…”
Section: Resources (mentioning)
confidence: 99%
“…Typically, the kernel value is the inner-product between two feature vectors corresponding to the two graphs. This so-called kernel trick has been used successfully to evaluate pairwise of other structures such as images and sequences [4,10,12]. Several graph kernels based on sub-structural patterns have been proposed, such as the Shortest-Path [5] and Graphlet [17] kernels.…”
Section: Related Work (mentioning)
confidence: 99%
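
To illustrate the idea of a kernel value as "the inner-product between two feature vectors" quoted in the statement above, here is a minimal sketch for discrete sequences rather than graphs. It is a plain k-mer (spectrum-style) count kernel, not the generalized similarity kernel of the paper nor the method of any citing work; the function names `kmer_counts` and `spectrum_kernel` and the default `k=2` are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous length-k substring (k-mer) of seq.

    The resulting Counter plays the role of a sparse feature vector.
    Sequences shorter than k yield an empty Counter.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """Inner product of the k-mer count vectors of x and y (illustrative only)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for missing keys, so k-mers absent from y contribute nothing.
    return sum(count * cy[kmer] for kmer, count in cx.items())


if __name__ == "__main__":
    print(spectrum_kernel("abab", "abab"))
    print(spectrum_kernel("abab", "baba"))
    print(spectrum_kernel("abc", "xyz"))
```

Because the value is an explicit inner product of count vectors, the function is symmetric in its arguments, and two sequences that share no k-mers get a kernel value of zero.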