Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are deploying digital signal processing (DSP) and machine learning approaches to better understand the virus and to mitigate and suppress its spread. This study presents an alignment-free approach to classifying SARS-CoV-2 using complementary DNA (cDNA), that is, DNA synthesized from the genome of this single-stranded RNA virus. A total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity using DSP techniques and ranked them with a filter-based feature selection method. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The classifiers' performance on the training dataset was assessed by accuracy and F-measure via 10-fold cross-validation, and Kappa scores were estimated to check the influence of unbalanced data. Further, a 10×10 cross-validation paired t-test was used to identify the best model, which was then evaluated on unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4%, a sensitivity of 96.2%, and a specificity of 98.2% when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 seconds to compute the genome biomarkers and outperforming previous studies.
BACKGROUND: The quantitative features of a capnogram signal are important clinical metrics for assessing pulmonary function. However, these features should be quantified from the regular (artefact-free) segments of the capnogram waveform. OBJECTIVE: This paper presents a machine learning-based approach for the automatic classification of regular and irregular capnogram segments. METHODS: We proposed four time-domain and two frequency-domain features and evaluated them with a support vector machine classifier using ten-fold cross-validation. MATLAB simulations were conducted on 100 regular and 100 irregular 15 s capnogram segments. Analysis of variance was performed to investigate the significance of the proposed features, and Pearson’s correlation was used to select the most substantial ones, namely the variance and the area under the normalized magnitude spectrum. Classification performance using these two features was compared against two feature sets that used only time-domain or only frequency-domain features. RESULTS: The selected features achieved a classification accuracy of 86.5%, outperforming the other feature sets by an average of 5.5%. The achieved specificity, sensitivity, and precision were 84%, 89%, and 86.51%, respectively. The average execution time for feature extraction and classification is only 36 ms per segment. CONCLUSION: The proposed approach can be integrated with capnography devices for real-time capnogram-based respiratory assessment. However, further research is recommended to enhance the classification performance.
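The two features retained in the capnogram study, the variance and the area under the normalized magnitude spectrum, can be sketched as follows. The abstract does not say how the spectrum is normalized or integrated, so removing the mean before the DFT, normalizing the one-sided magnitude spectrum by its maximum, integrating with the trapezoidal rule over frequency in Hz, and using the population variance are all assumptions, as is the function name `capnogram_features`.

```python
import cmath
import math
import statistics
from typing import Sequence, Tuple


def capnogram_features(segment: Sequence[float], fs: float) -> Tuple[float, float]:
    """Return (variance, area under normalized magnitude spectrum).

    Assumptions (not from the paper): population variance, mean removal before
    the DFT, max-normalization of the one-sided magnitude spectrum, and
    trapezoidal integration over frequency in Hz.
    """
    n_len = len(segment)
    if n_len < 2 or fs <= 0:
        raise ValueError("need at least 2 samples and a positive sampling rate")
    variance = statistics.pvariance(segment)
    mean = statistics.fmean(segment)
    centered = [x - mean for x in segment]
    # Naive one-sided DFT over bins k = 0..N//2 (O(N^2), fine for short segments).
    mags = []
    for k in range(n_len // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_len)
                  for n, x in enumerate(centered))
        mags.append(abs(acc))
    peak = max(mags)
    if peak == 0:
        return variance, 0.0  # constant segment: the spectrum is all zeros
    norm = [m / peak for m in mags]
    df = fs / n_len  # bin spacing in Hz
    area = sum((a + b) / 2 * df for a, b in zip(norm, norm[1:]))
    return variance, area
```

For a synthetic 15 s segment sampled at a hypothetical 20 Hz, `40 + 5*sin(2*pi*5*n/300)` (about 20 breaths per minute), the function returns a variance of about 12.5 and an area of about 20/300 ≈ 0.067, since the normalized spectrum is a single unit peak. The resulting pair could then be passed as a feature vector to an SVM classifier; the classifier itself is not sketched here.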