The exponential rise in the availability of clinical data, and especially of physiological recordings made using wearables, creates a real need for highly accurate and fully automated analysis techniques. An automated method for detecting ventricular beats in the ECG is proposed, which extends a recently published switching Kalman filter (skf) approach. The latter technique automatically selects the most likely mode (beat type) and makes novelty detection possible by incorporating a mode for unknown morphologies (X-factor). The previously published technique is semi-supervised and relies on manual annotation of the different clusters (or modes), which makes it less readily applicable in Big Data scenarios. Here we propose to extend the switching Kalman filter technique by automating the labelling of the modes. Each heartbeat in a mode was classified individually using a feature-based approach, and the cluster was assigned a beat type by majority voting. Two different feature-based classifiers were tested. The first, ecgkit, a state-of-the-art toolkit recently made available online, provides heartbeat classification based on clustering and Linear Discriminant Analysis. The second was a Support Vector Machine (svm) using the same features as ecgkit. Two automated switching Kalman filter techniques were therefore tested, ecgkit-skf and svm-skf, which differed only in the way the modes were classified. Both approaches were assessed on an independent subset of the MIT-BIH arrhythmia database (22 individual subjects, 30-minute recordings) and were compared to the semi-supervised switching Kalman filter approach (skf), as well as to the stand-alone classification techniques ecgkit and svm. The F1 score was 81.2% for ecgkit, 85.4% for svm, 91.8% for ecgkit-skf, 92.3% for svm-skf, and 98.6% for skf. The proposed combined techniques improved automatic beat classification compared to the state-of-the-art fully automated technique (ecgkit).
Performance was, however, still lower than that achieved with the semi-supervised technique (skf), highlighting that some clusters were mislabeled.
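The mode-labelling step amounts to a majority vote over per-beat predictions within each mode, after which every beat inherits its mode's label. The Python sketch below illustrates this, together with an F1 score computed as the harmonic mean of sensitivity (Se) and positive predictivity (PPV) for a single beat class. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`label_modes`, `relabel_beats`, `f1_for_class`), the integer mode indices, the single-letter labels `"N"`/`"V"`, and the choice of the ventricular class as the positive class are all hypothetical. The per-beat labels stand in for the output of a classifier such as ecgkit or an svm.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def label_modes(modes: Sequence[Hashable], beat_labels: Sequence[str]) -> Dict[Hashable, str]:
    """Assign each mode (cluster) the most frequent per-beat label among its beats."""
    if len(modes) != len(beat_labels):
        raise ValueError("modes and beat_labels must have the same length")
    votes: Dict[Hashable, Counter] = {}
    for mode, label in zip(modes, beat_labels):
        votes.setdefault(mode, Counter())[label] += 1
    # Counter.most_common(1) returns [(label, count)]; ties go to the label seen first.
    return {mode: counts.most_common(1)[0][0] for mode, counts in votes.items()}


def relabel_beats(modes: Sequence[Hashable], mode_labels: Dict[Hashable, str]) -> List[str]:
    """Give every beat the label assigned to its mode."""
    return [mode_labels[mode] for mode in modes]


def f1_for_class(reference: Sequence[str], predicted: Sequence[str], positive: str = "V") -> float:
    """F1 = 2 * Se * PPV / (Se + PPV) for one beat class against reference annotations."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    if tp == 0:
        return 0.0  # Se and PPV are both 0 (or undefined): report 0
    se = tp / (tp + fn)   # sensitivity (recall)
    ppv = tp / (tp + fp)  # positive predictivity (precision)
    return 2 * se * ppv / (se + ppv)


if __name__ == "__main__":
    modes = [0, 0, 0, 1, 1, 2]                  # mode selected for each beat (hypothetical)
    per_beat = ["N", "V", "N", "V", "V", "N"]   # hypothetical per-beat classifier output
    reference = ["N", "N", "N", "V", "V", "N"]  # hypothetical reference annotations

    mode_labels = label_modes(modes, per_beat)
    voted = relabel_beats(modes, mode_labels)
    print(mode_labels)                                  # {0: 'N', 1: 'V', 2: 'N'}
    print(round(f1_for_class(reference, per_beat), 3))  # 0.8
    print(round(f1_for_class(reference, voted), 3))     # 1.0
```

In this toy example the per-beat classifier mislabels one normal beat as ventricular; because that beat shares a mode with two beats classified as normal, the vote corrects it and F1 rises from 0.8 to 1.0. Conversely, if the majority within a mode is wrong, every beat in that mode inherits the error, which is one way a cluster can end up mislabeled.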
Introduction
With the exponential rise in the acquisition of physiological data, often for phenotyping purposes, it is increasingly important to extract meaningful information from this vast quantity of data. Cardiac applications are no exception, and it is widely accepted that big data analytics in cardiology will lead to improved patient outcomes in cardiovascular disease. Several applications will require the development of robust and fully automated data analysis techniques, among them: (i) telemedicine, and in particular mHealth applications, as a tool for reaching a wider population and predicting the onset of serious pathologies [1]; and (ii) the prevention, diagnosis and treatment of a wide range of serious and life-threatening illnesses. These applications are supported by large databases such as Physionet [2] and longitudinal studies such as the UK Biobank [3]. When it comes t...