Maryam AlJame scite author profile

Ensemble learning model for diagnosing COVID-19 from routine blood tests

Background and objectives: The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society, with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, we first present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests.

Method: The proposed model uses three well-known, diverse classifiers at the first level (extra trees, random forest, and logistic regression), which have different architectures and learning characteristics. It then combines their predictions using a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs the KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importances are reported using the SHapley Additive exPlanations (SHAP) technique.

Results: The proposed model was trained and evaluated using a publicly available dataset from Albert Einstein Hospital in Brazil, which consisted of 5,644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved an overall accuracy of 99.88% [95% CI: 99.6 - 100], an AUC of 99.38% [95% CI: 97.5 - 100], a sensitivity of 98.72% [95% CI: 94.6 - 100], and a specificity of 99.99% [95% CI: 99.99 - 100].

Discussion: The proposed model performed better than existing state-of-the-art studies [3, 22, 56, 71] when evaluated on the same sets of features they employed. Compared with the average accuracy of 95.159% of the best-performing Bayes Net model [22], ERLX achieved an average accuracy of 99.94%. Compared with the AUC of 85% reported for the SVM model [56], ERLX obtained an AUC of 99.77%, in addition to improvements in sensitivity and specificity. Compared with the ER-COV model [71], which had an average sensitivity of 70.25% and specificity of 85.98%, ERLX achieved a sensitivity of 99.47% and a specificity of 99.99%. ERLX also obtained considerably higher scores than the ANN model [3] on all performance metrics. Therefore, the model presented is robust and can be deployed for reliable, early, and rapid screening of COVID-19 patients.
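
The accuracy, sensitivity, and specificity figures quoted in the abstract above are all derived from a confusion matrix. The following is a minimal sketch of that arithmetic, not code from the paper. The function name `screening_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not results from the ERLX study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: infected patients classified correctly/incorrectly.
    tn/fp: non-infected patients classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of infected patients that the model detects.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-infected patients that the model clears.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts (NOT taken from the paper), for illustration only.
metrics = screening_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On an imbalanced dataset such as the one described (559 positives out of 5,644 samples), accuracy alone can look high even when many positives are missed, which is one reason sensitivity and specificity are reported separately.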

Deep forest model for diagnosing COVID-19 from routine blood tests

AlJame, Imtiaz, Ahmad, et al. 2021. Sci Rep

The Coronavirus Disease 2019 (COVID-19) global pandemic has threatened the lives of people worldwide and posed considerable challenges. Early and accurate screening of infected people is vital for combating the disease. To help with the limited quantity of swab tests, we propose a machine learning prediction model to accurately diagnose COVID-19 from clinical and/or routine laboratory data. The model exploits a new ensemble-based method called the deep forest (DF), where multiple classifiers in multiple layers are used to encourage diversity and improve performance. The cascade level employs layer-by-layer processing and is constructed from three different classifiers: extra trees, XGBoost, and LightGBM. The prediction model was trained and evaluated on two publicly available datasets. Experimental results show that the proposed DF model has an accuracy of 99.5%, a sensitivity of 95.28%, and a specificity of 99.96%. These performance metrics are comparable to those of other well-established machine learning techniques, and hence the DF model can serve as a fast screening tool for COVID-19 patients in places where testing is scarce.

Apache Spark Implementation of Whale Optimization Algorithm

2020

Parallel and Distributed Implementation of Sine Cosine Algorithm on Apache Spark Platform

2021

The Sine Cosine Algorithm (SCA) has seen widespread use in solving optimization problems in many disciplines, mainly due to its simplicity and efficiency. However, like many other metaheuristics, SCA requires a considerable amount of compute time when solving large-scale optimization problems. To tackle such challenging problems efficiently, this work proposes Spark-SCA, a scalable and parallel implementation of the SCA on the Apache Spark distributed framework. Spark-SCA exploits the Spark platform's native support for iterative algorithms through in-memory computing to speed up computations when handling large optimization problems. Both the design and implementation details of Spark-SCA are presented herein. The performance of Spark-SCA was compared to standard SCA on different benchmark functions with up to 1,000 dimensions, as well as on three practical engineering design problems. Simulation experiments conducted on the Amazon Web Services (AWS) public cloud demonstrated that Spark-SCA outperforms the standard version in terms of solution quality and run time, and showed its competitiveness in exploring the solution space of complex optimization problems.
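
For readers unfamiliar with the standard SCA that Spark-SCA parallelizes, the following is a minimal sequential sketch in plain Python. It is not the Spark-SCA implementation and uses no Spark APIs. The function names, the sphere benchmark, and all parameter defaults (`n_agents`, `n_iters`, `a`, `seed`) are assumptions chosen for illustration.

```python
import math
import random


def sphere(x):
    """Benchmark objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


def sine_cosine_algorithm(objective, dim, bounds, n_agents=20,
                          n_iters=200, a=2.0, seed=0):
    """Sequential sketch of the standard SCA (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial population of candidate solutions.
    agents = [[rng.uniform(lo, hi) for _ in range(dim)]
              for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_fit = objective(best)

    for t in range(n_iters):
        # r1 shrinks linearly from a to 0, shifting the search
        # from exploration towards exploitation.
        r1 = a - t * (a / n_iters)
        for agent in agents:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                # Switch between the sine and cosine update with equal odds.
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best[j] - agent[j])
                # Clamp the new position to the search bounds.
                agent[j] = min(hi, max(lo, agent[j] + step))
        # After all agents have moved, update the best solution found so far.
        for agent in agents:
            fit = objective(agent)
            if fit < best_fit:
                best, best_fit = agent[:], fit
    return best, best_fit


best, best_fit = sine_cosine_algorithm(sphere, dim=5, bounds=(-10.0, 10.0))
print("best fitness found:", best_fit)
```

The objective evaluations and position updates are independent across agents within an iteration, which is the kind of work a distributed framework can spread across workers. How Spark-SCA actually partitions that work is described in the paper, not here.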

DNA short read alignment on Apache Spark

AlJame, Ahmad. 2020. ACI

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose Spark-DNAligning, a Spark-based algorithm that accelerates DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computation. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools, providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
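
To make the short read alignment problem concrete, here is a minimal single-machine sketch of a generic seed-and-verify approach in plain Python. It is an illustration only and is not the Spark-DNAligning algorithm, whose internals the abstract does not specify. The function names, the toy reference string, the seed length `k`, and the mismatch threshold are all assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k, max_mismatches=1):
    """Return sorted reference positions where the read aligns with at
    most max_mismatches substitutions (no insertions or deletions)."""
    hits = set()
    # Use non-overlapping k-mers of the read as seeds.
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, ()):
            start = pos - offset
            # Skip candidates that would run off either end of the reference.
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy data, hypothetical and far smaller than a real genome.
reference = "TTGACCGATAGGCTAACGTTAGC"
k = 4
index = build_kmer_index(reference, k)
for read in ["CCGATAGG", "CCGATAGC", "GGGGGGGG"]:
    print(read, align_read(read, reference, index, k))
```

A read-only lookup structure like `index` is the sort of data that Spark programs commonly share with workers through a broadcast variable, and each read can be aligned independently of the others. Whether Spark-DNAligning uses its broadcast variables in this way is not stated in the abstract.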
