Generalized Similarity Kernels for Efficient Sequence Classification

Kuksa, Pavel P.; Khan, Imdadullah; Pavlović, Vladimir

doi:10.1137/1.9781611972825.75

Cited by 26 publications

(24 citation statements)

References 24 publications

“…Many previous studies proved the kernel-based methods for string representations to be useful in practical string data analysis. See, for example, [26, 43–52] for important recent studies on the kernel-based methods for string data. However, it is impossible to evaluate, applying traditional probability theory that has been constructed on spaces such as the Euclidean space R^p and Hilbert space L^2, the performance of the methods based on the string kernels, considering that a sample of observed strings is a part of a population generated according to a probability law.…”

Section: Discussion (mentioning)

confidence: 99%

Maximum margin classifier working in a set of strings

2016

Numbers and numerical vectors account for a large portion of data. However, recently, the amount of string data generated has increased dramatically. Consequently, classifying string data is a common problem in many fields. The most widely used approach to this problem is to convert strings into numerical vectors using string kernels and subsequently apply a support vector machine that works in a numerical vector space. However, this non-one-to-one conversion involves a loss of information and makes it impossible to evaluate, using probability theory, the generalization error of a learning machine, considering that the given data to train and test the machine are strings generated according to probability laws. In this study, we approach this classification problem by constructing a classifier that works in a set of strings. To evaluate the generalization error of such a classifier theoretically, probability theory for strings is required. Therefore, we first extend a limit theorem for a consensus sequence of strings demonstrated by one of the authors and co-workers in a previous study. Using the obtained result, we then demonstrate that our learning machine classifies strings in an asymptotically optimal manner. Furthermore, we demonstrate the usefulness of our machine in practical data analysis by applying it to predicting protein-protein interactions using amino acid sequences and classifying RNAs by the secondary structure using nucleotide sequences.

“…The third baseline (BLsignature) generates compact signatures for each music track using a 15-dimensional MFCC feature set and compares these using bipartite graph matching [10]. The fourth baseline (BLmultivar) uses multivariate kernels [11] with the direct uniform quantization of the 13-dimensional MFCC features. The results for the latter three are taken from their publications, while the results for BLGMM baseline are reproduced using the implementation provided with the dataset.…”

Section: Baseline Methods (mentioning)

confidence: 99%

“…For the 13-dimensional MFCCs, we use MIRTOOLBOX [29] with 40 frequency bands, 25 ms window length and 50% overlap to extract features from 32kbps mp3 files provided in the dataset. This is because another baseline method [11] also used it to extract 13-MFCCs. We use 2000 randomly-selected frames from the middle area of each song to compute Baum-Welch statistics, assuming that this middle area of the song contains the most singing voice data.…”

Section: Resources (mentioning)

confidence: 99%

Timbral modeling for music artist recognition using i-vectors

Eghbal-zadeh

Schedl

Widmer

2015

2015 23rd European Signal Processing Conference (EUSIPCO)

Music artist (i.e., singer) recognition is a challenging task in Music Information Retrieval (MIR). The presence of different musical instruments, the diversity of music genres and singing techniques make the retrieval of artist-relevant information from a song difficult. Many authors tried to address this problem by using complex features or hybrid systems. In this paper, we propose new song-level timbre-related features that are built from frame-level MFCCs via so-called i-vectors. We report artist recognition results with multiple classifiers such as K-nearest neighbor, Discriminant Analysis and Naive Bayes using these new features. Our approach yields considerable improvements and outperforms existing methods. We could achieve an 84.31% accuracy using MFCC features on a 20-classes artist recognition task.

“…Typically, the kernel value is the inner-product between two feature vectors corresponding to the two graphs. This so-called kernel trick has been used successfully to evaluate pairwise of other structures such as images and sequences [4,10,12]. Several graph kernels based on sub-structural patterns have been proposed, such as the Shortest-Path [5] and Graphlet [17] kernels.…”

Section: Related Work (mentioning)

confidence: 99%

Estimating Descriptors for Large Graphs

Hassan

Shabbir

Khan

et al.

2020

Advances in Knowledge Discovery and Data Mining

Self Cite

Embedding networks into a fixed dimensional feature space, while preserving its essential structural properties is a fundamental task in graph analytics. These feature vectors (graph descriptors) are used to measure the pairwise similarity between graphs. This enables applying data mining algorithms (e.g., classification, clustering, or anomaly detection) on graph-structured data which have numerous applications in multiple domains. State-of-the-art algorithms for computing descriptors require the entire graph to be in memory, entailing a huge memory footprint, and thus do not scale well to increasing sizes of real-world networks. In this work, we propose streaming algorithms to efficiently approximate descriptors by estimating counts of sub-graphs of order k ≤ 4, and thereby devise extensions of two existing graph comparison paradigms: the Graphlet Kernel and NetSimile. Our algorithms require a single scan over the edge stream, have space complexity that is a fraction of the input size, and approximate embeddings via a simple sampling scheme. Our design exploits the trade-off between available memory and estimation accuracy to provide a method that works well for limited memory requirements. We perform extensive experiments on real-world networks and demonstrate that our algorithms scale well to massive graphs.
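
The Related Work excerpt quoted for this entry describes a kernel value as the inner product between two feature vectors. As a rough illustration for sequences, the sketch below computes a plain k-mer (spectrum) kernel. This is a stand-in chosen for simplicity, not the generalized similarity kernel of the cited paper, and the names `kmer_counts` and `spectrum_kernel` are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=2):
    """Inner product of the k-mer count vectors of x and y.

    Assumption: an ordinary spectrum kernel, used here only to
    illustrate "kernel value = inner product of feature vectors".
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers that occur in both sequences contribute to the sum.
    return sum(count * cy[kmer] for kmer, count in cx.items() if kmer in cy)
```

The count vectors are never materialized over the full k-mer alphabet; summing over shared k-mers gives the same inner product while touching only the substrings that actually occur.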
