2017
DOI: 10.1007/978-3-319-71246-8_3

Efficient Sequence Regression by Learning Linear Models in All-Subsequence Space

Abstract: We present a new approach for learning a sequence regression function, i.e., a mapping from sequential observations to a numeric score. Our learning algorithm employs coordinate gradient descent and Gauss-Southwell optimization in the feature space of all subsequences. We give a tight upper bound for the coordinate-wise gradients of the squared error loss that enables efficient Gauss-Southwell selection. The proposed bound is built by separating the positive and the negative gradients of the loss function and expl…
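
To make the abstract's optimization concrete, here is a minimal sketch of Gauss-Southwell coordinate descent on a squared error objective. It assumes a hypothetical explicit feature matrix X; the paper's contribution is precisely to avoid materializing such a matrix by bounding the coordinate-wise gradients. Names and step rule are illustrative, not the authors' code.

```python
import numpy as np

def gauss_southwell_cd(X, y, n_iters=100):
    """Gauss-Southwell coordinate descent for L(beta) = 0.5 * ||X @ beta - y||^2.
    Illustrative only: the paper searches the all-subsequence feature space
    implicitly instead of building X."""
    n, d = X.shape
    beta = np.zeros(d)
    col_norms = (X ** 2).sum(axis=0) + 1e-12   # per-coordinate curvature
    for _ in range(n_iters):
        grad = X.T @ (X @ beta - y)            # coordinate-wise gradients
        j = np.argmax(np.abs(grad))            # Gauss-Southwell: steepest coordinate
        beta[j] -= grad[j] / col_norms[j]      # exact minimization along coordinate j
    return beta
```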

Cited by 3 publications (3 citation statements)
References 15 publications

“…Gsponer et al. used a different approach, where a linear model is trained on a set of features extracted from the sequences of the dataset [20]. Those features correspond to the search space of all possible subsequences.…”
Section: Heuristic Methods (mentioning)
confidence: 99%
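
For intuition about the size of that search space, a brute-force, purely illustrative enumeration of every contiguous subsequence of a single string could look as follows; the feature count grows quadratically per sequence, which is why the learner never materializes this space.

```python
from collections import Counter

def all_substring_features(seq):
    """Enumerate all contiguous subsequences of one string with their counts.
    Illustrative only: for a length-n string there are O(n^2) such features."""
    feats = Counter()
    for i in range(len(seq)):
        for j in range(i + 1, len(seq) + 1):
            feats[seq[i:j]] += 1
    return feats

print(all_substring_features("abca"))
# Counter({'a': 2, 'ab': 1, 'abc': 1, 'abca': 1, 'b': 1, ...})
```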
“…In this work we adopt SEQL, a linear sequence classification algorithm, as we want to learn a model that is interpretable but still achieves high accuracy. The main idea behind SEQL is to use greedy coordinate gradient descent with the Gauss-Southwell rule [13], which makes it possible to avoid the explicit generation of the feature vectors [5]. A key step of this approach is the efficient search for the current best k-mer, in the sense of maximum absolute gradient value, followed by an update of the corresponding weight value β.…”
Section: SEQL (mentioning)
confidence: 99%
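
A minimal sketch of the step described above, with a hypothetical explicit candidate list standing in for SEQL's pruned search over k-mers; squared error loss is used for simplicity, and all names are illustrative.

```python
def seql_style_step(seqs, y, preds, beta, candidates, step=0.1):
    """One Gauss-Southwell step: pick the k-mer with maximum absolute
    gradient and update its weight beta_s. For squared error loss,
    dL/dbeta_s = sum_i (pred_i - y_i) * count(s, seq_i)."""
    def count(s, seq):
        return sum(seq[i:i + len(s)] == s for i in range(len(seq) - len(s) + 1))

    grads = {s: sum((p - t) * count(s, x) for x, t, p in zip(seqs, y, preds))
             for s in candidates}
    best = max(grads, key=lambda s: abs(grads[s]))   # Gauss-Southwell rule
    beta[best] = beta.get(best, 0.0) - step * grads[best]
    return best, beta
```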
“…The main problem with using the k-mer frequency feature is dealing with a large-dimensional feature space when aiming for high accuracy [14]. To solve this problem, Kusuma [15] introduced spaced k-mers, inspired by PatternHunter [16], to reduce the feature-space dimension and improve accuracy.…”
Section: Introduction (mentioning)
confidence: 99%
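
As a sketch of the spaced k-mer idea mentioned above: a binary mask slides over the sequence and only the characters at '1' positions are kept, so '0' positions act as wildcards. The mask '1101' is a made-up example, not one taken from [15] or [16].

```python
def spaced_kmers(seq, mask="1101"):
    """Extract spaced k-mers: for each window of len(mask), keep only the
    characters at the mask's '1' positions ('0' positions are wildcards)."""
    w = len(mask)
    keep = [i for i, m in enumerate(mask) if m == "1"]
    return ["".join(seq[i + k] for k in keep) for i in range(len(seq) - w + 1)]

print(spaced_kmers("ACGTACGT"))   # ['ACT', 'CGA', 'GTC', 'TAG', 'ACT']
```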