Bryan A. Dawkins scite author profile

Bryan A. Dawkins

5 Publications

78 Citation Statements Received

147 Citation Statements Given

Affiliations

Oklahoma Medical Research Foundation, University of Tulsa, University of Central Oklahoma

Publications

Nearest-neighbor Projected-Distance Regression (NPDR) for detecting network interactions with adjustments for multiple tests and confounding

Dawkins

McKinney

2020

Summary: Machine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL and structural/functional neuroimage studies for case–control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR’s utility for GWAS and continuous outcomes.

Availability and implementation: Available at: https://insilico.github.io/npdr/.

Supplementary information: Supplementary data are available at Bioinformatics online.
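To make the core idea in the summary concrete, the following is a minimal sketch for a continuous outcome, not the authors' implementation and not the API of the npdr package. It assumes a Manhattan metric for finding neighbors, takes the outcome difference between neighbor pairs as the response and the attribute difference as the predictor, and scores each attribute by the t-statistic of the slope. The function names, the choice of k, and the direction of the regression are all assumptions made for illustration; covariate adjustment, dichotomous outcomes, and multiple-testing correction are omitted.

```python
import math
import random

def manhattan(u, v):
    """L1 distance between two instances over all attributes."""
    return sum(abs(a - b) for a, b in zip(u, v))

def neighbor_pairs(X, k):
    """Return (i, j) pairs where j is one of the k nearest neighbors of i."""
    pairs = []
    for i, row in enumerate(X):
        others = sorted((manhattan(row, X[j]), j) for j in range(len(X)) if j != i)
        pairs.extend((i, j) for _, j in others[:k])
    return pairs

def slope_t_statistic(x, y):
    """t-statistic of the slope in the simple linear regression y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)

def npdr_like_scores(X, y, k=5):
    """Hypothetical helper: one score per attribute (larger = more important)."""
    pairs = neighbor_pairs(X, k)
    # Outcome difference for every neighbor pair.
    dy = [abs(y[i] - y[j]) for i, j in pairs]
    scores = []
    for a in range(len(X[0])):
        # Pair distance projected onto the single attribute a.
        dx = [abs(X[i][a] - X[j][a]) for i, j in pairs]
        scores.append(slope_t_statistic(dx, dy))
    return scores

# Toy usage: only attribute 0 drives the outcome.
rng = random.Random(0)
X_demo = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(120)]
y_demo = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X_demo]
print(npdr_like_scores(X_demo, y_demo, k=5))
```

The sketch treats neighbor pairs as independent observations, which is a simplification; the scores are only meant to show how projecting pair distances onto one attribute turns feature scoring into a regression problem.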

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS

et al. 2018

Background: ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding.

Methods: We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: (1) genotype mismatch (GM), which is the ReliefF standard, (2) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF, and (3) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity.

Results: We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias.

Conclusions: Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, lead to ReliefF attribute scores with improved pathway enrichment.

Electronic supplementary material: The online version of this article (10.1186/s13040-018-0186-4) contains supplementary material, which is available to authorized users.
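The difference between the genotype mismatch (GM) and allele mismatch (AM) metrics can be illustrated with a short sketch. It assumes genotypes are coded as minor-allele counts 0, 1, 2, that GM is a binary mismatch, and that AM is half the absolute difference in allele counts, so a heterozygous difference counts as 0.5. These codings and the function names are assumptions for illustration and may not match the paper's exact definitions; the transition/transversion metric and GRM are not covered.

```python
def genotype_mismatch(g1, g2):
    """GM: 1 if the two genotypes differ at all, otherwise 0."""
    return 0.0 if g1 == g2 else 1.0

def allele_mismatch(g1, g2):
    """AM: fraction of alleles that differ, for genotypes coded 0, 1, 2."""
    return abs(g1 - g2) / 2.0

def manhattan_distance(a, b, diff):
    """Sum a per-attribute difference function over all variants."""
    return sum(diff(x, y) for x, y in zip(a, b))

# Two hypothetical individuals genotyped at four variants.
person_a = [0, 1, 2, 2]
person_b = [0, 2, 0, 1]
print(manhattan_distance(person_a, person_b, genotype_mismatch))
print(manhattan_distance(person_a, person_b, allele_mismatch))
```

Under these assumptions GM treats a one-allele difference and a two-allele difference identically, while AM separates them, which changes which instances end up as nearest neighbors.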

Theoretical properties of distance distributions and novel metrics for nearest-neighbor feature selection

Dawkins

McKinney

2021

PLoS ONE

The performance of nearest-neighbor feature selection and prediction methods depends on the metric for computing neighborhoods and the distribution properties of the underlying data. Recent work to improve nearest-neighbor feature selection algorithms has focused on new neighborhood estimation methods and distance metrics. However, little attention has been given to the distributional properties of pairwise distances as a function of the metric or data type. Thus, we derive general analytical expressions for the mean and variance of pairwise distances for Lq metrics for normal and uniform random data with p attributes and m instances. The distribution moment formulas and detailed derivations provide a resource for understanding the distance properties for metrics and data types commonly used with nearest-neighbor methods, and the derivations provide the starting point for the following novel results. We use extreme value theory to derive the mean and variance for metrics that are normalized by the range of each attribute (difference of max and min). We derive analytical formulas for a new metric for genetic variants, which are categorical variables that occur in genome-wide association studies (GWAS). The genetic distance distributions account for minor allele frequency and the transition/transversion ratio. We introduce a new metric for resting-state functional MRI data (rs-fMRI) and derive its distance distribution properties. This metric is applicable to correlation-based predictors derived from time-series data. The analytical means and variances are in strong agreement with simulation results. We also use simulations to explore the sensitivity of the expected means and variances in the presence of correlation and interactions in the data. These analytical results and new metrics can be used to inform the optimization of nearest neighbor methods for a broad range of studies, including gene expression, GWAS, and fMRI data.
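As a small worked instance of the kind of result the abstract describes, the sketch below checks the simplest case by simulation: the Manhattan (L1) distance between two instances with p independent standard normal attributes. The closed forms used here are the standard half-normal moments (each attribute difference is N(0, 2), so its absolute value has mean 2/sqrt(pi) and variance 2(pi - 2)/pi), not formulas copied from the paper, and the function names are hypothetical. The paper's general Lq, range-normalized, GWAS, and rs-fMRI results are not reproduced.

```python
import math
import random
import statistics

def l1_moments_standard_normal(p):
    """Analytical mean and variance of the L1 distance between two
    instances with p independent standard normal attributes."""
    mean = 2.0 * p / math.sqrt(math.pi)
    var = 2.0 * p * (math.pi - 2.0) / math.pi
    return mean, var

def simulate_l1_distances(p, n_pairs, seed=0):
    """Draw n_pairs independent pairs of instances and return their L1 distances."""
    rng = random.Random(seed)
    dists = []
    for _ in range(n_pairs):
        u = [rng.gauss(0, 1) for _ in range(p)]
        v = [rng.gauss(0, 1) for _ in range(p)]
        dists.append(sum(abs(a - b) for a, b in zip(u, v)))
    return dists

# Compare analytical and simulated moments for p = 10 attributes.
theory_mean, theory_var = l1_moments_standard_normal(10)
sample = simulate_l1_distances(10, 20000)
print(theory_mean, statistics.mean(sample))
print(theory_var, statistics.variance(sample))
```

Both moments grow linearly in p under the independence assumption, which is why the number of attributes matters when choosing neighborhood sizes or distance thresholds.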

Novel HLA associations with outcomes of Mycobacterium tuberculosis exposure and sarcoidosis in individuals of African ancestry using nearest‐neighbor feature selection

Dawkins

Garman

Cejda

et al. 2022

Genetic Epidemiology

Tuberculosis and sarcoidosis are inflammatory diseases characterized by granulomas that may occur in any organ but are often found in the lung. The panoply of classical human leukocyte antigen (HLA) alleles associated with occurrence and/or severity of both diseases varies considerably across studies. This heterogeneity of results, due to variation in factors like ancestry and disease subphenotype, as well as the use of simple modeling strategies to elucidate likely complex relationships, has made conclusions about underlying commonalities difficult. Here we perform HLA association analyses in individuals of African ancestry, using a greater resolution to include subphenotypes of disease and employing more comprehensive analytical techniques. Using a novel application of nearest‐neighbor feature selection to score allelic importance, we investigated HLA allele association with Mycobacterium tuberculosis exposure outcomes in the first analysis of both latent Mycobacterium tuberculosis infection and active disease compared with those who, despite long‐term exposure to active index cases, have neither positive diagnostic tests nor display clinical symptoms. We also compared persistent to resolved sarcoidosis. This led to the identification of novel HLA associations and evidence of main effects and interaction effects. We found strikingly similar main effects and interaction effects at HLA‐DRB1, ‐DQB1, and ‐DPB1 in those resistant to tuberculosis (either latent or active) and persistent sarcoidosis.

Inclusivity in Research Matters: Variants in PVT1 Specific to Persons of African Descent Are Associated with Pulmonary Fibrosis

Garman

Pezant

Dawkins

et al. 2024

Am J Respir Crit Care Med
