Background and objectives The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, first we present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX which is an ensemble learning model for COVID-19 diagnosis from routine blood tests. Method The proposed model uses three well-known diverse classifiers, extra trees, random forest and logistic regression, which have different architectures and learning characteristics at the first level, and then combines their predictions by using a second level extreme gradient boosting (XGBoost) classifier to achieve a better performance. For data preparation, the proposed methodology employs a KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and a synthetic minority oversampling technique (SMOTE) to balance data distribution. For model interpretability, features importance are reported by using the SHapley Additive exPlanations (SHAP) technique. Results The proposed model was trained and evaluated by using a publicly available data set from Albert Einstein Hospital in Brazil, which consisted of 5,644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance with an overall accuracy of 99.88% [95% CI: 99.6 - 100], AUC of 99.38% [95% CI: 97.5 - 100], a sensitivity of 98.72% [95% CI: 94.6 - 100] and a specificity of 99.99% [95% CI: 99.99- 100]. Discussion The proposed model revealed better performance when compared against existing state-of-the-art studies [ 3 , 22 , 56 , 71 ] for the same set of features employed by them. As compared to the best performing Bayes Net model [ 22 ] average accuracy of 95.159%, ERLX achieved an average accuracy of 99.94%. In comparison with AUC of 85% reported by the SVM model [ 56 ], ERLX obtained AUC of 99.77% in addition to improvements in sensitivity, and specificity. As compared with ER-COV model [ 71 ] average sensitivity of 70.25% and specificity of 85.98%, ERLX model achieved sensitivity of 99.47% and specificity of 99.99%. The ERLX model obtained considerable higher score as compared with ANN model [ 3 ] in all performance metrics. Therefore, the model presented is robust and can be deployed for reliable early and rapid screening of COVID-19 patients.
The Coronavirus Disease 2019 (COVID-19) global pandemic has threatened the lives of people worldwide and posed considerable challenges. Early and accurate screening of infected people is vital for combating the disease. To help with the limited quantity of swab tests, we propose a machine learning prediction model to accurately diagnose COVID-19 from clinical and/or routine laboratory data. The model exploits a new ensemble-based method called the deep forest (DF), where multiple classifiers in multiple layers are used to encourage diversity and improve performance. The cascade level employs the layer-by-layer processing and is constructed from three different classifiers: extra trees, XGBoost, and LightGBM. The prediction model was trained and evaluated on two publicly available datasets. Experimental results show that the proposed DF model has an accuracy of 99.5%, sensitivity of 95.28%, and specificity of 99.96%. These performance metrics are comparable to other well-established machine learning techniques, and hence DF model can serve as a fast screening tool for COVID-19 patients at places where testing is scarce.
The Sine Cosine Algorithm (SCA) has experienced wide spread use in solving optimization problems in many disciplines mainly due to its simplicity and efficiency. However, like many other metaheuristics, SCA requires considerable amount of compute time when solving large size optimization problems. Therefore, in order to tackle such challenging problems efficiently, this work proposes Spark-SCA, a scalable and parallel implementation of SCA algorithm on Apache Spark distributed framework. Spark-SCA exploits Spark platform native support for iterative algorithms through in-memory computing to speed-up computations when handling large optimization problems. Both the design and implementation details of Spark-SCA are presented herein. The performance of Spark-SCA was compared to standard SCA on different benchmark functions with up to 1,000-dimension as well as three practical engineering design problems. Simulation experiments conducted on Amazon Web Services (AWS) public cloud demonstrated how Spark-SCA outperforms the standard version in terms of solution quality and run time as well as it competitiveness in exploring solution space of complex optimization problems.
The evolution of technologies has unleashed a wealth of challenges by generating massive amount of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that involves data parallelism and fault tolerance. In this article, we proposed a Spark-based algorithm to accelerate DNA short reads alignment problem, and it is called Spark-DNAligning. Spark-DNAligning exploits Apache Spark ’s performance optimizations such as broadcast variable, join after partitioning, caching, and in-memory computations. Spark-DNAligning is evaluated in term of performance by comparing it with SparkBWA tool and a MapReduce based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to DNA short reads alignment problem.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.