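Bethany Percha scite author profile

Machine Learning to Predict Mortality and Critical Events in a Cohort of Patients With COVID-19 in New York City: Model Development and Validation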

Background: COVID-19 has infected millions of people worldwide and is responsible for several hundred thousand fatalities. The COVID-19 pandemic has necessitated thoughtful resource allocation and early identification of high-risk patients. However, effective methods to meet these needs are lacking.

Objective: The aims of this study were to analyze the electronic health records (EHRs) of patients who tested positive for COVID-19 and were admitted to hospitals in the Mount Sinai Health System in New York City; to develop machine learning models for making predictions about the hospital course of the patients over clinically meaningful time horizons based on patient characteristics at admission; and to assess the performance of these models at multiple hospitals and time points.

Methods: We used Extreme Gradient Boosting (XGBoost) and baseline comparator models to predict in-hospital mortality and critical events at time windows of 3, 5, 7, and 10 days from admission. Our study population included harmonized EHR data from five hospitals in New York City for 4098 COVID-19–positive patients admitted from March 15 to May 22, 2020. The models were first trained on patients from a single hospital (n=1514) before or on May 1, externally validated on patients from four other hospitals (n=2201) before or on May 1, and prospectively validated on all patients after May 1 (n=383). Finally, we established model interpretability to identify and rank variables that drive model predictions.

Results: Upon cross-validation, the XGBoost classifier outperformed baseline models, with an area under the receiver operating characteristic curve (AUC-ROC) for mortality of 0.89 at 3 days, 0.85 at 5 and 7 days, and 0.84 at 10 days. XGBoost also performed well for critical event prediction, with an AUC-ROC of 0.80 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. In external validation, XGBoost achieved an AUC-ROC of 0.88 at 3 days, 0.86 at 5 days, 0.86 at 7 days, and 0.84 at 10 days for mortality prediction. Similarly, the unimputed XGBoost model achieved an AUC-ROC of 0.78 at 3 days, 0.79 at 5 days, 0.80 at 7 days, and 0.81 at 10 days. Trends in performance on prospective validation sets were similar. At 7 days, acute kidney injury on admission, elevated LDH, tachypnea, and hyperglycemia were the strongest drivers of critical event prediction, while higher age, anion gap, and C-reactive protein were the strongest drivers of mortality prediction.

Conclusions: We externally and prospectively trained and validated machine learning models for mortality and critical events for patients with COVID-19 at different time horizons. These models identified at-risk patients and uncovered underlying relationships that predicted outcomes.
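The AUC-ROC values reported above can be read as the probability that a model scores a randomly chosen patient who had the outcome higher than a randomly chosen patient who did not. The following is a minimal Python sketch of that rank-based definition, not the study's evaluation code; the function name `auc_roc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def auc_roc(labels, scores):
    """Rank-based AUC-ROC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = outcome occurred within
# the time window, 0 = it did not; scores are predicted risks.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc_roc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the same metric can be compared across hospitals and time windows.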


Informatics confronts drug–drug interactions

Percha, Altman. 2013. Trends in Pharmacological Sciences



Drug–drug interactions (DDIs) are an emerging threat to public health. Recent estimates indicate that DDIs cause nearly 74 000 emergency room visits and 195 000 hospitalizations each year in the USA. Current approaches to DDI discovery, which include Phase IV clinical trials and post-marketing surveillance, are insufficient for detecting many DDIs and do not alert the public to potentially dangerous DDIs before a drug enters the market. Recent work has applied state-of-the-art computational and statistical methods to the problem of DDIs. Here we review recent developments that encompass a range of informatics approaches in this domain, from the construction of databases for efficient searching of known DDIs to the prediction of novel DDIs based on data from electronic medical records, adverse event reports, scientific abstracts, and other sources. We also explore why DDIs are so difficult to detect and what the future holds for informatics-based approaches to DDI discovery.


Deep learning predicts hip fracture using confounding patient and healthcare variables

et al. 2019


Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked “priority” (AUC = 0.79). Fracture was predicted moderately well from the image (AUC = 0.78) and better when combining image features with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or patient data plus hospital process features (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); and on a test set with fracture risk balanced across patient and hospital process variables, the model performed randomly (AUC = 0.52, 95% CI 0.46–0.58), indicating that these variables were the main source of the model’s fracture predictions. A single model that directly combines image features, patient, and hospital process data outperforms a Naive Bayes ensemble of an image-only model prediction, patient, and hospital process data. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can effectively cooperate.


Discovery and Explanation of Drug-Drug Interactions via Text Mining

Percha, Garten, Altman. 2011


Drug-drug interactions (DDIs) can occur when two drugs interact with the same gene product. Most available information about gene-drug relationships is contained within the scientific literature, but is dispersed over a large number of publications, with thousands of new publications added each month. In this setting, automated text mining is an attractive solution for identifying gene-drug relationships and aggregating them to predict novel DDIs. In previous work, we have shown that gene-drug interactions can be extracted from Medline abstracts with high fidelity - we extract not only the genes and drugs, but also the type of relationship expressed in individual sentences (e.g. metabolize, inhibit, activate and many others). We normalize these relationships and map them to a standardized ontology. In this work, we hypothesize that we can combine these normalized gene-drug relationships, drawn from a very broad and diverse literature, to infer DDIs. Using a training set of established DDIs, we have trained a random forest classifier to score potential DDIs based on the features of the normalized assertions extracted from the literature that relate two drugs to a gene product. The classifier recognizes the combinations of relationships, drugs and genes that are most associated with the gold standard DDIs, correctly identifying 79.8% of assertions relating interacting drug pairs and 78.9% of assertions relating noninteracting drug pairs. Most significantly, because our text processing method captures the semantics of individual gene-drug relationships, we can construct mechanistic pharmacological explanations for the newly-proposed DDIs. We show how our classifier can be used to explain known DDIs and to uncover new DDIs that have not yet been reported.


Transition from local to global phase synchrony in small world neural network and its possible implications for epilepsy

et al. 2005


Temporal correlations in the brain are thought to play two sharply contrasting roles. On one hand, they are ubiquitously present in the healthy brain and are thought to underlie feature binding during information processing. On the other hand, large-scale synchronization is an underlying mechanism of epileptic seizures. In this paper we show a possible mechanism of transition to the pathological coherence underlying seizure generation. We show that the properties of phase synchronization in a 2-D lattice of non-identical coupled Hindmarsh-Rose neurons change radically depending on the connectivity structure. We modify the connectivity using the small-world network paradigm and measure properties of phase synchronization using a previously developed measure based on assessment of the distributions of relative interspike intervals [1]. We show that phase synchronization undergoes a dramatic change as a function of the locality of network connections, from local coherence strongly dependent on the distance between two neurons to global coherence exhibiting stronger phase locking and spanning the whole network.

Epilepsy is one of the most common neurological disorders, with underlying seizures generated by indiscriminate, synchronized bursting of multiple cells in the brain [2], leading to an increased level of coherence in the recorded signal between individual neurons as well as whole networks [3,4]. There is a wide range of molecular and cellular mechanisms underlying seizure generation; however, they are often linked to increased excitatory transmission mediated by NMDA, AMPA or metabotropic glutamate receptors, and a decrease in inhibitory (GABAergic) transmission, causing an imbalance between excitation and inhibition in the system [5]. One of the mechanisms generating the changes in excitatory transmission under pathological conditions is axonal sprouting [6,7]. This mechanism involves excessive growth of excitatory processes within an area that was exposed to ischemia or physical trauma, causing (in time) the generation of seizures. We hypothesize that hyperexcitability induced by sprouting could be only one of the causes of seizures and show that alteration of network structure through the introduction of random long-range connectivity produces a relatively abrupt transition in phase coherence in the 2-D small-world network (SWN) lattice of non-identical Hindmarsh-Rose models of thalamocortical neurons [8].

The emergence of the concept of small-world networks [9] has allowed for rigorous study of the properties of networks with intermediate structure, in which the connectivities are neither entirely regular nor entirely random. Networks exhibiting such structure have been identified in social as well as biological systems [9,10]. Most studies have concentrated on their static properties [11,12,13]. However, recent work has also focused on the dynamic properties of SWN, including synchronization. It has been shown that the linear stability of the synchronous state is linked to the algebraic condition of the Laplacian matrix defini...
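The idea of adding random long-range connections to a locally connected lattice can be sketched as follows. This is a minimal Python illustration that assumes a Watts-Strogatz-style rewiring rule on a periodic 2-D grid; it models only the connectivity, not the Hindmarsh-Rose dynamics, and the function names, grid size, and rewiring probability are hypothetical choices rather than the authors' procedure.

```python
import random


def lattice_edges(n):
    """Nearest-neighbour edges of an n x n grid with periodic boundaries (n >= 3)."""
    edges = set()
    for r in range(n):
        for c in range(n):
            node = r * n + c
            right = r * n + (c + 1) % n
            down = ((r + 1) % n) * n + c
            edges.add(frozenset((node, right)))
            edges.add(frozenset((node, down)))
    return edges


def rewire(edges, num_nodes, p, rng):
    """With probability p, replace each edge by one from the same node to a
    uniformly chosen node it is not yet connected to. Edge count is preserved."""
    result = set(edges)
    for edge in sorted(edges, key=sorted):  # fixed order for reproducibility
        if rng.random() >= p:
            continue  # keep this local connection
        a = min(edge)
        candidates = [b for b in range(num_nodes)
                      if b != a and frozenset((a, b)) not in result]
        if not candidates:
            continue
        result.remove(edge)
        result.add(frozenset((a, rng.choice(candidates))))
    return result


# Hypothetical parameters: a 5 x 5 grid with 30% of edges rewired.
n = 5
local = lattice_edges(n)
small_world = rewire(local, n * n, 0.3, random.Random(1))
```

With `p = 0` the lattice is left purely local, and raising `p` replaces a growing share of local links with links to arbitrary nodes, which is the regime the abstract associates with the shift from local to global coherence.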


Measures of Sexual Partnerships: Lengths, Gaps, Overlaps, and Sexually Transmitted Infection

Foxman, Newman, Percha, et al. 2006


A global network of biomedical relationships derived from text

Percha, Altman. 2018



Motivation: The biomedical community’s collective understanding of how chemicals, genes and phenotypes interact is distributed across the text of over 24 million research articles. These interactions offer insights into the mechanisms behind higher order biochemical phenomena, such as drug-drug interactions and variations in drug response across individuals. To assist their curation at scale, we must understand what relationship types are possible and map unstructured natural language descriptions onto these structured classes. We used NCBI’s PubTator annotations to identify instances of chemical, gene and disease names in Medline abstracts and applied the Stanford dependency parser to find connecting dependency paths between pairs of entities in single sentences. We combined a published ensemble biclustering algorithm (EBC) with hierarchical clustering to group the dependency paths into semantically-related categories, which we annotated with labels, or ‘themes’ (‘inhibition’ and ‘activation’, for example). We evaluated our theme assignments against six human-curated databases: DrugBank, Reactome, SIDER, the Therapeutic Target Database, OMIM and PharmGKB.

Results: Clustering revealed 10 broad themes for chemical-gene relationships, 7 for chemical-disease, 10 for gene-disease and 9 for gene–gene. In most cases, enriched themes corresponded directly to known database relationships. Our final dataset, represented as a network, contained 37 491 thematically-labeled chemical-gene edges, 2 021 192 chemical-disease edges, 136 206 gene-disease edges and 41 418 gene–gene edges, each representing a single-sentence description of an interaction from somewhere in the literature.

Availability and implementation: The complete network is available on Zenodo (https://zenodo.org/record/1035500). We have also provided the full set of dependency paths connecting biomedical entities in Medline abstracts, with associated sentences, for future use by the biomedical research community.

Supplementary information: Supplementary data are available at Bioinformatics online.


Individuals with Down syndrome hospitalized with COVID-19 have more severe disease

Malle, Gao, Hur, et al. 2021. Genetics in Medicine

