The time series classification literature has expanded rapidly over the last decade, with many new classification approaches published each year. Prior research has mostly focused on improving the accuracy and efficiency of classifiers, with interpretability being somewhat neglected. This aspect of classifiers has become critical for many application domains, and the introduction of the EU GDPR legislation in 2018 is likely to further emphasize the importance of interpretable learning algorithms. Currently, state-of-the-art classification accuracy is achieved with very complex models based on large ensembles (COTE) or deep neural networks (FCN). These approaches are not efficient with regard to either time or space, are difficult to interpret, and cannot be applied to variable-length time series without first pre-processing the original series to a fixed length. In this paper we propose new time series classification algorithms to address these gaps. Our approach is based on symbolic representations of time series, efficient sequence mining algorithms and linear classification models. Our linear models are as accurate as deep learning models but are more efficient regarding running time and memory, can work with variable-length time series and can be interpreted by highlighting the discriminative symbolic features on the original time series. 
We advance the state-of-the-art in time series classification by proposing new algorithms built using the following three key ideas: (1) Multiple resolutions of symbolic representations: we combine symbolic representations obtained using different parameters, rather than one fixed representation (e.g., multiple SAX representations); (2) Multiple domain representations: we combine symbolic representations in time (e.g., SAX) and frequency (e.g., SFA) domains, to be more robust across problem types; (3) Efficient navigation in a huge symbolic-words space: we extend a symbolic sequence classifier (SEQL) to work with multiple symbolic representations and use its greedy feature selection strategy to effectively filter the best features for each representation. We show that our multi-resolution multi-domain linear classifier (mtSS-SEQL+LR) achieves a similar accuracy to the state-of-the-art COTE ensemble, and to recent deep learning methods (FCN, ResNet), but uses a fraction of the time and memory required by either COTE or deep models. To further analyse the interpretability of our classifier, we present a case study on a human motion dataset collected by the authors. We discuss the accuracy, efficiency and interpretability of our proposed algorithms and release all the results, source code and data to encourage reproducibility.
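The first key idea, combining symbolic representations obtained with different parameters, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sax_word` and `multi_resolution_sax` are hypothetical, the breakpoints are the usual equiprobable cut points of a standard normal distribution, and for brevity the whole series is mapped to a single word instead of a sequence of sliding-window words.

```python
import numpy as np

# Cut points that split a standard normal into equiprobable regions
# (assumed values, rounded to two decimals).
BREAKPOINTS = {
    3: [-0.43, 0.43],
    4: [-0.67, 0.0, 0.67],
}


def sax_word(series, word_length, alphabet_size):
    """Map a numeric series to one SAX-style word (simplified sketch)."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalise; a constant series is only centred to avoid dividing by zero
    x = (x - x.mean()) / std if std > 0 else x - x.mean()
    # Piecewise aggregate approximation: mean of roughly equal segments
    paa = np.array([seg.mean() for seg in np.array_split(x, word_length)])
    # Discretise each segment mean into a symbol index, then into a letter
    symbols = np.searchsorted(BREAKPOINTS[alphabet_size], paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)


def multi_resolution_sax(series, configs):
    """Return one word per (word_length, alphabet_size) configuration."""
    return {cfg: sax_word(series, *cfg) for cfg in configs}


if __name__ == "__main__":
    ramp = list(range(8))
    print(multi_resolution_sax(ramp, [(4, 4), (2, 3)]))
```

Because the segmentation adapts to the input length, the same function accepts series of different lengths, which is one reason symbolic representations suit variable-length data.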
Existing approaches to time series classification can be grouped into shape-based (numeric) and structure-based (symbolic). Shape-based techniques use the raw numeric time series with Euclidean or Dynamic Time Warping distance and a 1-Nearest Neighbor classifier. They are accurate, but computationally intensive. Structure-based methods discretize the raw data into symbolic representations, then extract features for classifiers. Recent symbolic methods have outperformed numeric ones regarding both accuracy and efficiency. Most approaches employ a bag-of-symbolic-words representation, but typically the word-length is fixed across all time series, an issue identified as a major weakness in the literature. Also, there are no prior attempts to use efficient sequence learning techniques to go beyond single words, to features based on variable-length sequences of words or symbols. We study an efficient linear classification approach, SEQL, originally designed for classification of symbolic sequences. SEQL learns discriminative subsequences from training data by exploiting the all-subsequence space using greedy gradient descent. We explore different discretization approaches, from none at all to increasing smoothing of the original data, and study the effect of these transformations on the accuracy of SEQL classifiers. We propose two adaptations of SEQL for time series data: SAX-VSEQL, which can deal with X-axis offsets by learning variable-length symbolic words, and SAX-VFSEQL, which can deal with X-axis and Y-axis offsets by learning fuzzy variable-length symbolic words. Our models are linear classifiers in rich feature spaces. Their predictions are based on the most discriminative subsequences learned during training, and can be investigated for interpreting the classification decision.
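To make the interpretability claim concrete, the sketch below shows how a linear model over variable-length subsequence features can be scored and explained. It is an illustration only: the weights are invented for the example, the helper names `contiguous_subsequences` and `explain_score` are hypothetical, and no learning takes place here (SEQL itself selects the features during training).

```python
def contiguous_subsequences(sequence, max_len):
    """All contiguous subsequences of `sequence` with length 1..max_len."""
    return {
        sequence[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(sequence) - n + 1)
    }


def explain_score(sequence, weights, bias=0.0):
    """Score a symbolic sequence with a linear model over subsequence features.

    `weights` maps a learned subsequence to its coefficient (hypothetical
    values in this sketch). Returns the score and the per-feature
    contributions, which is the part a user would inspect.
    """
    max_len = max(len(feature) for feature in weights)
    present = contiguous_subsequences(sequence, max_len)
    contributions = {f: w for f, w in weights.items() if f in present}
    return bias + sum(contributions.values()), contributions


if __name__ == "__main__":
    # Invented weights: features of different lengths coexist in one model.
    weights = {"abba": 1.5, "cc": -2.0, "ab": 0.5}
    score, why = explain_score("cabbad", weights)
    print(score, why)
```

The features that fire can be mapped back to the positions in the original series where they occur, which is how a prediction can be inspected.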
Introduction: COVID-19 vaccines significantly reduce SARS-CoV-2 (SCoV2)-related hospitalization and mortality in randomized controlled clinical trials, and show real-world effectiveness against different circulating SCoV2 lineages. However, some vaccine recipients show breakthrough infection, and it remains unknown which host and viral factors contribute to this risk and how many of these infections result in severe outcomes. Our aim was to identify demographic and clinical risk factors for SCoV2 breakthrough infections and severe disease in fully vaccinated individuals and to compare patient characteristics in breakthrough infections caused by the SCoV2 Alpha or Delta variant. Methods: We conducted an exploratory retrospective case-control study from 28th of December 2020 to 25th of October 2021, a period dominated by the Delta SCoV2 variant. All cases of infection had to be reported by law to the local health authorities. Vaccine recipient data was available in anonymized form from the national Vaccination Monitoring Data Lake and the main local vaccine center. We compared anonymized patient characteristics of breakthrough infections (n=492) to two overlapping control groups including all vaccine recipients from the Canton of Basel-City (group 1 n=126586 and group 2 n=109382). We also compared patients with breakthrough infection caused by the Alpha variant to those with infection caused by the Delta variant. We used different multivariate generalized linear models (GLM). Results: We found only 492/126586 (0.39%) vaccine recipients with a breakthrough infection after vaccination during the 10-month observational period. Most cases were asymptomatic or mild (478/492, 97.2%) and only very few required hospitalization (14/492, 2.8%). The time to a positive SCoV2 test shows that most breakthrough infections occurred between a few days and about 170 days after full vaccination, with a median of 78 days (interquartile range, IQR 47-124 days). 
Factors associated with lower odds of breakthrough infection were: age (OR 0.987, 95%CI 0.983-0.992), previous COVID-19 infection prior to vaccination (OR 0.296, 95%CI 0.117-0.606), and (self-declared) serious side-effects from previous vaccines (OR 0.289, 95%CI 0.033-1.035). Factors associated with higher odds of breakthrough infection were: vaccination with the Pfizer/BioNTech vaccine (OR 1.459, 95%CI 1.238-1.612), chronic disease as vaccine indication (OR 2.109, 95%CI 1.692-2.620), and being a healthcare worker (OR 1.404, 95%CI 1.042-1.860). We did not observe a significantly increased risk for immunosuppressed patients (OR 1.248, 95%CI 0.806-1.849). Conclusions: Our study shows that breakthrough infections are rare and mostly mild, but that they occur early after vaccination, with more than 50% of cases within 70 to 80 days post-full vaccination. This implies that booster vaccination should be initiated much earlier than the currently communicated 180-day threshold. This has important implications especially for risk groups associated with more frequent breakthrough infections, such as healthcare workers and people in high-risk care facilities. Due to changes in the epidemiological dynamic with new variants emerging, continuous monitoring of breakthrough infections is helpful to provide evidence on booster vaccines and patient groups at risk for potential complications.
We present a new approach for learning a sequence regression function, i.e., a mapping from sequential observations to a numeric score. Our learning algorithm employs coordinate gradient descent and Gauss-Southwell optimization in the feature space of all subsequences. We give a tight upper bound for the coordinate-wise gradients of squared error loss that enables efficient Gauss-Southwell selection. The proposed bound is built by separating the positive and the negative gradients of the loss function and exploits the structure of the feature space. Extensive experiments on simulated as well as real-world sequence regression benchmarks show that the bound is effective and our proposed learning algorithm is efficient and accurate. The resulting linear regression model provides the user with a list of the most predictive features selected during the learning stage, adding to the interpretability of the method.
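The optimization scheme named above can be sketched on a small, explicit feature matrix. This is a simplification and an assumption-laden illustration, not the proposed algorithm: the function name `gauss_southwell_cd` is hypothetical, the features are given as a dense matrix, and the full gradient is computed at every step, whereas the abstract describes a bound that avoids enumerating the space of all subsequences.

```python
import numpy as np


def gauss_southwell_cd(X, y, n_iters=100, tol=1e-12):
    """Coordinate descent for squared error loss with Gauss-Southwell selection.

    At each step the coordinate with the largest absolute gradient is
    updated by exact minimisation along that coordinate.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)  # squared norm of each feature column
    for _ in range(n_iters):
        residual = X @ w - y
        grad = X.T @ residual  # gradient of 0.5 * ||Xw - y||^2
        j = int(np.argmax(np.abs(grad)))  # Gauss-Southwell rule
        if abs(grad[j]) < tol:
            break  # all coordinate gradients are (numerically) zero
        w[j] -= grad[j] / col_sq[j]  # exact line search on coordinate j
    return w


if __name__ == "__main__":
    # Toy binary feature matrix: rows are sequences, columns are features.
    X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
    y = [2.0, 0.0, -1.0, 2.0]
    print(gauss_southwell_cd(X, y))
```

Only coordinates that were selected at least once end up non-zero, so the learned weight vector doubles as a short list of predictive features.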