Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue. However, such solutions are undesirable when the number of samples in the small class is limited. Moreover, the use of inadequate performance metrics, such as accuracy, leads to poor generalization because classifiers tend to predict the largest class. One good approach to this issue is to optimize performance metrics that are designed to handle data imbalance. The Matthews Correlation Coefficient (MCC) is widely used in bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on the Fréchet derivative. We show that the proposed algorithm has the desirable theoretical property of consistency. Using simulated data, we verify the correctness of our optimality result by searching the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets covering a wide range of data imbalance. We compare both classification performance and CPU efficiency for three classifiers: 1) the proposed algorithm (MCC-classifier), 2) the Bayes classifier with a default threshold (MCC-base), and 3) imbalanced SVM (SVM-imba). The experimental evaluation shows that MCC-classifier performs close to SVM-imba while being simpler and more efficient.
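To make the MCC metric concrete, the following is a minimal sketch and not the algorithm derived in the paper. It computes MCC from confusion-matrix counts and picks, by a plain grid search, the probability threshold that maximizes MCC on a small set of scores. The function names, the grid, the zero-denominator convention, and the toy data are all assumptions made for illustration.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(y_true, scores, grid=None):
    """Grid-search the score threshold that maximizes MCC (illustrative only)."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # hypothetical grid 0.01 .. 0.99
    best_t, best_m = None, -2.0  # MCC lies in [-1, 1], so -2.0 is a safe floor
    for t in grid:
        y_pred = [1 if s >= t else 0 for s in scores]
        m = mcc(*confusion(y_true, y_pred))
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m

# Toy imbalanced data (made up): 8 negatives, 2 positives with modest scores.
y_true = [0] * 8 + [1] * 2
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.45]

# A default threshold of 0.5 predicts the majority class for every sample.
default_pred = [1 if s >= 0.5 else 0 for s in scores]
default_mcc = mcc(*confusion(y_true, default_pred))

best_t, best_m = best_threshold(y_true, scores)
print(f"default MCC = {default_mcc:.2f}, best threshold = {best_t:.2f}, best MCC = {best_m:.2f}")
```

On this toy data the default threshold never predicts the minority class, whereas a lower threshold separates the two classes. This is the kind of failure that accuracy hides and MCC exposes.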
Table of contents
O1 Regulation of genes by telomere length over long distances Jerry W. Shay
O2 The microtubule destabilizer KIF2A regulates the postnatal establishment of neuronal circuits in addition to prenatal cell survival, cell migration, and axon elongation, and its loss leading to malformation of cortical development and severe epilepsy Noriko Homma, Ruyun Zhou, Muhammad Imran Naseer, Adeel G. Chaudhary, Mohammed Al-Qahtani, Nobutaka Hirokawa
O3 Integration of metagenomics and metabolomics in gut microbiome research Maryam Goudarzi, Albert J. Fornace Jr.
O4 A unique integrated system to discern pathogenesis of central nervous system tumors Saleh Baeesa, Deema Hussain, Mohammed Bangash, Fahad Alghamdi, Hans-Juergen Schulten, Angel Carracedo, Ishaq Khan, Hanadi Qashqari, Nawal Madkhali, Mohamad Saka, Kulvinder S. Saini, Awatif Jamal, Jaudah Al-Maghrabi, Adel Abuzenadah, Adeel Chaudhary, Mohammed Al Qahtani, Ghazi Damanhouri
O5 RPL27A is a target of miR-595 and deficiency contributes to ribosomal dysgenesis Heba Alkhatabi
O6 Next generation DNA sequencing panels for haemostatic and platelet disorders and for Fanconi anaemia in routine diagnostic service Anne Goodeve, Laura Crookes, Nikolas Niksic, Nicholas Beauchamp
O7 Targeted sequencing panels and their utilization in personalized medicine Adel M. Abuzenadah
O8 International biobanking in the era of precision medicine Jim Vaught
O9 Biobank and biodata for clinical and forensic applications Bruce Budowle, Mourad Assidi, Abdelbaset Buhmeida
O10 Tissue microarray technique: a powerful adjunct tool for molecular profiling of solid tumors Jaudah Al-Maghrabi
O11 The CEGMR biobanking unit: achievements, challenges and future plans Abdelbaset Buhmeida, Mourad Assidi, Leena Merdad
O12 Phylomedicine of tumors Sudhir Kumar, Sayaka Miura, Karen Gomez
O13 Clinical implementation of pharmacogenomics for colorectal cancer treatment Angel Carracedo, Mahmood Rasool
O14 From association to causality: translation of GWAS findings for genomic medicine Ahmed Rebai
O15 E-GRASP: an interactive database and web application for efficient analysis of disease-associated genetic information Sajjad Karim, Hend F Nour Eldin, Heba Abusamra, Elham M Alhathli, Nada Salem, Mohammed H Al-Qahtani, Sudhir Kumar
O16 The supercomputer facility “AZIZ” at KAU: utility and future prospects Hossam Faheem
O17 New research into the causes of male infertility Ashok Agarwa
O18 The Klinefelter syndrome: recent progress in pathophysiology and management Eberhard Nieschlag, Joachim Wistuba, Oliver S. Damm, Mohd A. Beg, Taha A. Abdel-Meguid, Hisham A. Mosli, Osama S. Bajouh, Adel M. Abuzenadah, Mohammed H. Al-Q...
Vitamin D inadequacy appears to be on the rise globally, and it has been linked to an increased risk of osteoporosis, as well as metabolic, cardiovascular, and autoimmune diseases. Vitamin D concentrations are partially determined by genetic factors. Specific single nucleotide polymorphisms (SNPs) in genes involved in vitamin D transport, metabolism, or binding have been found to be associated with its serum concentration, and these SNPs differ among ethnicities. Vitamin D has also been suggested to be a regulator of the gut microbiota, and vitamin D deficiency a possible cause of gut microbial dysbiosis and inflammation. This pilot study aims to fill the gap in our understanding of the prevalence, causes, and implications of vitamin D inadequacy in a pediatric population residing in Qatar. Blood and fecal samples were collected from healthy subjects aged 4–14 years. Blood was used to measure the serum vitamin D metabolite 25-hydroxycholecalciferol (25(OH)D). To evaluate the composition of the gut microbiota, fecal samples were subjected to 16S rRNA gene sequencing. High levels of vitamin D deficiency/insufficiency were observed in our cohort, with 97% of the subjects falling into the inadequate category (serum 25(OH)D < 75 nmol/L). The CT genotype in rs12512631, an SNP in the GC gene, was associated with low serum levels of vitamin D (ANOVA, p = 0.0356) and was more abundant in deficient than in non-deficient subjects. Overall gut microbial community structure was significantly different between the deficient (D) and non-deficient (ND) groups (Bray-Curtis dissimilarity, p = 0.049), with deficient subjects also displaying reduced gut microbial diversity. Significant differences were observed in the two major gut phyla, Firmicutes (F) and Bacteroidetes (B), where deficient subjects displayed a higher B/F ratio (p = 0.0097) than ND subjects. Vitamin D deficient children also demonstrated gut enterotypes dominated by the genus Prevotella as opposed to Bacteroides. 
Our findings suggest that pediatric vitamin D inadequacy significantly impacts the gut microbiota. We also highlight the importance of considering host genetics and baseline gut microbiome composition in interpreting the clinical outcomes related to vitamin D deficiency as well as designing better personalized strategies for therapeutic interventions.
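The two quantities named in this abstract, Bray-Curtis dissimilarity and the B/F ratio, can be illustrated with a small sketch. The counts below are invented for illustration and are not data from the study, and the function names are hypothetical. The sketch shows only the definitions, not the statistical test behind the reported p-values.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    Defined as sum(|u_i - v_i|) / sum(u_i + v_i); 0 means identical
    composition, 1 means no shared taxa.
    """
    if len(u) != len(v):
        raise ValueError("abundance vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # assumed convention for two empty samples
    return sum(abs(a - b) for a, b in zip(u, v)) / total

def bf_ratio(bacteroidetes, firmicutes):
    """Ratio of Bacteroidetes to Firmicutes abundance in one sample."""
    return bacteroidetes / firmicutes

# Invented phylum-level counts: [Bacteroidetes, Firmicutes, other]
sample_a = [60, 30, 10]
sample_b = [30, 60, 10]

print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
print("B/F ratio, sample A:", bf_ratio(sample_a[0], sample_a[1]))
print("B/F ratio, sample B:", bf_ratio(sample_b[0], sample_b[1]))
```

A higher B/F ratio, as in the invented sample A, is the direction the abstract reports for vitamin D deficient subjects.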
Background: Many studies have linked dysbiosis of the gut microbiome to the development of cardiovascular diseases (CVD). However, studies assessing the association between the salivary microbiome and CVD risk in a large cohort remain sparse. This study aims to identify whether a predictive salivary microbiome signature is associated with a high risk of developing CVD in the Qatari population. Methods: Saliva samples from 2,974 Qatar Genome Project (QGP) participants were collected from Qatar Biobank (QBB). Based on the CVD score, subjects were classified into low-risk (LR < 10) (n = 2491), moderate-risk (MR = 10–20) (n = 320) and high-risk (HR > 30) (n = 163). To assess the salivary microbiome (SM) composition, 16S-rDNA libraries were sequenced and analyzed using the QIIME pipeline. Machine Learning (ML) strategies were used to identify SM-based predictors of CVD risk. Results: Firmicutes and Bacteroidetes were the predominant phyla among all the subjects included. Linear Discriminant Analysis Effect Size (LEfSe) analysis revealed that Clostridiaceae and Capnocytophaga were the most significantly abundant genera in the LR group, while Lactobacillus and Rothia were significantly abundant in the HR group. ML-based prediction models revealed that Desulfobulbus, Prevotella, and Tissierellaceae were the common predictors of increased risk of CVD. Conclusion: This study identified significant differences in the SM composition between HR and LR CVD subjects. This is the first study to apply ML-based prediction modeling using the SM to predict CVD in an Arab population. More studies are required to better understand the mechanisms by which those microbes contribute to CVD.
Background: The genetic architecture underlying Familial Hypercholesterolemia (FH) in Middle Eastern Arabs is yet to be fully described, and approaches to assess this from population-wide biobanks are important for public health planning and personalized medicine. Methods: We evaluate the pilot phase cohort (n = 6,140 adults) of the Qatar Biobank (QBB) for FH using the Dutch Lipid Clinic Network (DLCN) criteria, followed by an in-depth characterization of all genetic alleles in known dominant (LDLR, APOB, and PCSK9) and recessive (LDLRAP1, ABCG5, ABCG8, and LIPA) FH-causing genes derived from whole-genome sequencing (WGS). We also investigate the utility of a globally established 12-SNP polygenic risk score to predict FH individuals in this cohort with Arab ancestry. Results: Using DLCN criteria, we identify eight (0.1%) ‘definite’, 41 (0.7%) ‘probable’ and 334 (5.4%) ‘possible’ FH individuals, estimating a prevalence of ‘definite or probable’ FH in the Qatari cohort of ~1:125. We identify ten previously known pathogenic single-nucleotide variants (SNVs) and 14 putatively novel SNVs, as well as one novel copy number variant in PCSK9. Further, despite the modest sample size, we identify one homozygote for a known pathogenic variant (ABCG8, p.Gly574Arg, global MAF = 4.49E-05) associated with Sitosterolemia 2. Finally, calculation of polygenic risk scores found that individuals with ‘definite or probable’ FH have a significantly higher LDL-C SNP score than ‘unlikely’ individuals (p = 0.0003), demonstrating its utility in Arab populations. Conclusion: We design and implement a standardized approach to phenotyping a population biobank for FH risk followed by systematically identifying known variants and assessing putative novel variants contributing to FH burden in Qatar. 
Our results motivate similar studies in population-level biobanks – especially those with globally under-represented ancestries – and highlight the importance of genetic screening programs for early detection and management of individuals with high FH risk in health systems.
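The prevalence figures quoted in this abstract follow directly from the reported counts. As a quick arithmetic check, the sketch below uses only numbers stated in the abstract (the variable names are illustrative) to recompute the percentages and the "1 in N" estimate for ‘definite or probable’ FH.

```python
cohort = 6140  # adults in the pilot phase cohort, as stated in the abstract
counts = {"definite": 8, "probable": 41, "possible": 334}

# Share of the cohort in each DLCN category, in percent, to one decimal place.
pct = {label: round(100 * n / cohort, 1) for label, n in counts.items()}

# 'Definite or probable' FH expressed as "1 in N" individuals.
definite_or_probable = counts["definite"] + counts["probable"]
one_in = cohort / definite_or_probable

print(pct)
print(f"definite or probable: {definite_or_probable} of {cohort}, about 1:{round(one_in)}")
```

The recomputed values agree with the 0.1%, 0.7%, 5.4% and ~1:125 figures given in the abstract.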