Background and objectives: The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, we first present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests.
Method: At the first level, the proposed model uses three well-known, diverse classifiers (extra trees, random forest, and logistic regression) that have different architectures and learning characteristics. It then combines their predictions with a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs a KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importances are reported using the SHapley Additive exPlanations (SHAP) technique.
Results: The proposed model was trained and evaluated using a publicly available data set from Albert Einstein Hospital in Brazil, which consisted of 5,644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance with an overall accuracy of 99.88% [95% CI: 99.6 - 100], an AUC of 99.38% [95% CI: 97.5 - 100], a sensitivity of 98.72% [95% CI: 94.6 - 100], and a specificity of 99.99% [95% CI: 99.99 - 100].
Discussion: The proposed model outperformed existing state-of-the-art studies [3, 22, 56, 71] when evaluated on the same sets of features those studies employed. Compared with the average accuracy of 95.159% of the best-performing Bayes Net model [22], ERLX achieved an average accuracy of 99.94%. Compared with the AUC of 85% reported for the SVM model [56], ERLX obtained an AUC of 99.77%, in addition to improvements in sensitivity and specificity. Compared with the average sensitivity of 70.25% and specificity of 85.98% of the ER-COV model [71], the ERLX model achieved a sensitivity of 99.47% and a specificity of 99.99%. The ERLX model also obtained considerably higher scores than the ANN model [3] on all performance metrics. Therefore, the presented model is robust and can be deployed for reliable, early, and rapid screening of COVID-19 patients.
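The class-balancing step mentioned in the abstract can be illustrated with a minimal, standard-library sketch of SMOTE-style interpolation. This is not the authors' code and not a library implementation: the function name `smote_like_oversample`, the neighbour count `k`, and the sample values are all assumptions made for illustration. The idea is that each synthetic minority sample lies on the line segment between a real minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (placeholder values, not study data).
minority_samples = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_like_oversample(minority_samples, n_new=6)
```

Because every synthetic point is a convex combination of two real samples, it stays inside the bounding box of the minority class. In practice, oversampling of this kind is usually applied to the training split only, so that evaluation data contains no synthetic samples.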
The Coronavirus Disease 2019 (COVID-19) global pandemic has threatened the lives of people worldwide and posed considerable challenges. Early and accurate screening of infected people is vital for combating the disease. To help with the limited quantity of swab tests, we propose a machine learning prediction model to accurately diagnose COVID-19 from clinical and/or routine laboratory data. The model exploits a new ensemble-based method called the deep forest (DF), where multiple classifiers in multiple layers are used to encourage diversity and improve performance. The cascade level employs layer-by-layer processing and is constructed from three different classifiers: extra trees, XGBoost, and LightGBM. The prediction model was trained and evaluated on two publicly available datasets. Experimental results show that the proposed DF model has an accuracy of 99.5%, a sensitivity of 95.28%, and a specificity of 99.96%. These performance metrics are comparable to those of other well-established machine learning techniques, and hence the DF model can serve as a fast screening tool for COVID-19 patients in places where testing is scarce.
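For readers less familiar with the reported metrics, the following sketch shows how accuracy, sensitivity, and specificity follow from the four confusion-matrix counts of a binary screening test. The function name `screening_metrics` and the label vectors are hypothetical and chosen only for illustration; they are not data from the study.

```python
def screening_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for binary labels,
    where 1 means COVID-19 positive and 0 means negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        # Sensitivity: share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of actual negatives that were cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical labels (placeholder values, not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(y_true, y_pred)
```

On imbalanced screening data, where negatives far outnumber positives, accuracy can stay high even when sensitivity is modest, which is one reason the three metrics are reported together.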
The Sine Cosine Algorithm (SCA) has seen widespread use in solving optimization problems in many disciplines, mainly due to its simplicity and efficiency. However, like many other metaheuristics, SCA requires a considerable amount of compute time when solving large-scale optimization problems. To tackle such challenging problems efficiently, this work proposes Spark-SCA, a scalable, parallel implementation of SCA on the Apache Spark distributed framework. Spark-SCA exploits Spark's native support for iterative algorithms through in-memory computing to speed up computations when handling large optimization problems. Both the design and implementation details of Spark-SCA are presented herein. The performance of Spark-SCA was compared to that of standard SCA on different benchmark functions with up to 1,000 dimensions as well as on three practical engineering design problems. Simulation experiments conducted on the Amazon Web Services (AWS) public cloud demonstrated that Spark-SCA outperforms the standard version in terms of solution quality and run time, and that it is competitive in exploring the solution space of complex optimization problems.
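As background, the sketch below shows the standard sequential SCA position update in plain Python. It is not Spark-SCA and does not reflect the paper's implementation; the sphere objective, the population size, the iteration count, and the clamping of positions to the bounds are assumptions made for illustration. Each coordinate moves toward or around the best solution found so far along a sine or cosine wave whose amplitude `r1` shrinks linearly over the iterations.

```python
import math
import random


def sphere(x):
    """Simple benchmark objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def sine_cosine_algorithm(objective, dim, lower, upper,
                          n_agents=20, n_iter=200, a=2.0, seed=0):
    rng = random.Random(seed)
    agents = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_fit = objective(best)
    for t in range(n_iter):
        r1 = a - t * (a / n_iter)  # amplitude shrinks linearly from a toward 0
        for agent in agents:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)  # phase of the wave
                r3 = rng.uniform(0.0, 2.0)            # weight on the destination
                r4 = rng.random()                     # picks sine or cosine
                step = abs(r3 * best[j] - agent[j])
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                # Clamp to the search bounds (an assumption of this sketch).
                agent[j] = min(upper, max(lower, agent[j] + r1 * wave * step))
        # Update the destination once all agents have moved.
        candidate = min(agents, key=objective)
        candidate_fit = objective(candidate)
        if candidate_fit < best_fit:
            best, best_fit = candidate[:], candidate_fit
    return best, best_fit


best_x, best_f = sine_cosine_algorithm(sphere, dim=5, lower=-10.0, upper=10.0)
```

The inner loops over agents and coordinates are independent given the current best solution, which is the property a distributed implementation can exploit.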
The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose Spark-DNAligning, a Spark-based algorithm to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computations. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
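To make the alignment problem concrete, here is a toy seed-and-verify sketch in plain Python. It is not Spark-DNAligning, SparkBWA, or CloudBurst: the function names, the k-mer length, the mismatch threshold, and the sequences are all hypothetical. The reference is indexed by k-mer, each read is seeded by its first k-mer, and candidate positions are kept if the full read matches within a mismatch budget. In a Spark setting, a read-only index like this is the kind of structure one might share through a broadcast variable, though that is an assumption rather than a description of the paper's design.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify by counting mismatches."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # the read would run past the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Hypothetical reference and reads (placeholder sequences, not genome data).
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
index = build_kmer_index(reference, k=4)
reads = ["TAGCCGAT", "GATAGCTT", "TTTTTTTT"]
alignments = {read: align_read(read, reference, index, k=4) for read in reads}
```

Each read is aligned independently of the others, so the work divides naturally across partitions of the read set.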
The arithmetic optimization algorithm (AOA) is a recent population-based metaheuristic widely used for solving optimization problems. However, emerging large-scale optimization problems pose a great challenge for AOA because of the prohibitive computational cost of traversing the huge solution space effectively. This article proposes Spark-AOA, a parallel implementation of AOA written in Scala on the Apache Spark computing platform. Spark-AOA leverages the intrinsic parallel nature of the population-based AOA and Spark's native support for iterative in-memory computation through resilient distributed datasets (RDDs) to accelerate the optimization process. Spark-AOA divides the solution population into several subpopulations that are distributed across multiple RDD partitions and manipulated concurrently. Simulation experiments on different benchmark functions with up to 1,000 dimensions and on three engineering design problems demonstrate that Spark-AOA considerably outperforms standard AOA and Spark-based implementations of two recent metaheuristics in terms of both run time and solution quality.
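The subpopulation idea described above can be sketched in plain Python. This is not Spark-AOA and uses neither Spark nor the AOA operators: the per-partition update is a simple perturb-and-keep-if-better stand-in, threads stand in for RDD partitions, and every name and parameter here is an assumption made for illustration. The sketch shows only the structure: split the population, improve each part independently, then merge.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sphere(x):
    """Simple benchmark objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def evolve_partition(job):
    """Improve one subpopulation independently of the others.
    The greedy random perturbation is a stand-in for the AOA update rules."""
    subpop, seed, n_steps = job
    rng = random.Random(seed)  # each partition gets its own random stream
    subpop = [sol[:] for sol in subpop]
    for _ in range(n_steps):
        for i, sol in enumerate(subpop):
            trial = [v + rng.gauss(0.0, 0.5) for v in sol]
            if sphere(trial) < sphere(sol):
                subpop[i] = trial
    return subpop


def split(population, n_parts):
    """Deal solutions round-robin into n_parts subpopulations."""
    return [population[i::n_parts] for i in range(n_parts)]


rng = random.Random(42)
population = [[rng.uniform(-10.0, 10.0) for _ in range(4)] for _ in range(24)]
initial_best = min(sphere(s) for s in population)

partitions = split(population, n_parts=4)
jobs = [(part, seed, 50) for seed, part in enumerate(partitions)]
# Threads only illustrate the structure; they do not give real CPU parallelism
# for pure-Python work.
with ThreadPoolExecutor(max_workers=4) as pool:
    evolved = list(pool.map(evolve_partition, jobs))

merged = [sol for part in evolved for sol in part]
final_best = min(sphere(s) for s in merged)
```

Because each partition keeps a solution only when it improves, the merged population is never worse than the initial one on this objective.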
The novel coronavirus disease (COVID-19) was declared a global pandemic by the World Health Organization (WHO) on March 11, 2020. In response, the State of Kuwait applied a series of three lockdown measures in 2020. Previous research highlighted the positive impact of lockdown measures on environmental health and safety through reduced air pollution levels. While this prior work demonstrated the effectiveness of lockdown measures in reducing pollution levels in different geographical locations, there is limited evidence on whether the lockdowns implemented in Kuwait were effective in reducing air pollution. Thus, the main goal of this study was to investigate the impact of the COVID-19 lockdown measures taken in Kuwait on the concentrations of the following pollutants: Particulate Matter 2.5 (PM2.5), Particulate Matter 10 (PM10), Nitrogen Dioxide (NO2), and Ozone (O3). Data from two air monitoring stations (Aljahra and Alahmadi) was used to compare pollution levels across three lockdown intervals: two partial lockdowns and one total lockdown. A sequential approach was used in which air quality data from each of the three lockdown periods was compared with air quality data from the pre-lockdown period. The main findings indicated that, at the Aljahra station, NO2 concentrations decreased by 48%, 63%, and 48% after the first partial, total, and second partial lockdowns, respectively. Meanwhile, ozone concentrations increased by 30–100% across all lockdown periods at both stations. Finally, PM10 and PM2.5 concentrations did not decrease after the total lockdown. This research urges public policy experts to consider immediate measures to mitigate the environmental, health, and safety risks posed by air pollution.
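Percentage changes such as those reported above are computed relative to the pre-lockdown baseline. The sketch below shows that arithmetic under simple assumptions: the function name `percent_change` is hypothetical, and the concentrations are placeholder values, not measurements from the study.

```python
def percent_change(baseline, observed):
    """Relative change of `observed` versus `baseline`, in percent.
    Negative values indicate a decrease."""
    if baseline == 0:
        raise ValueError("baseline concentration must be non-zero")
    return (observed - baseline) * 100.0 / baseline


# Placeholder mean concentrations (hypothetical, not study measurements).
pre_lockdown_mean = 40.0
lockdown_mean = 30.0
change = percent_change(pre_lockdown_mean, lockdown_mean)  # -25.0
```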