Nigam H. Shah scite author profile

The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration

The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.

In the search for what is biologically and clinically significant in the swarms of data being generated by today's high-throughput technologies, a common strategy involves the creation and analysis of 'annotations' linking primary data to expressions in controlled, structured vocabularies, thereby making the data available to search and to algorithmic processing 1. The most successful such endeavor, measured both by numbers of users and by reach across species and granularities, is the Gene Ontology (GO) 2. There exist over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO 3, of which half a million have been manually verified by specialist curators in different model-organism communities on the basis of the analysis of experimental results reported in 52,000 scientific journal articles (http://www.ebi.ac.uk/GOA/). Data related to some 180,000 genes have been manually annotated in this way, an endeavor now being refined and systematized within the Reference Genome Project (US National Institutes of Health National Human Genome Research Institute grant 2P41HG002273-07), which will provide comprehensive GO annotations for both the human genome and a representative set of model-organism genomes in support of research on the primary molecular systems affecting human health.

From retrospective mapping to prospective standardization

The domain of molecular biology is marked by the availability of large amounts of well defined data that can be used without restriction as inputs to algorithmic processing. In the clinical domain, by contrast, only limited amounts of data are available for research purposes, and these still consist overwhelmingly of natural language text. Even where more systematic clinical data are available, the use of local coding schemes means that these data do not cumulate in ways useful to research 4. One approach to solving this problem is the Unified Medical Language System (UMLS) 5, a compendium of some 100 source vocabularies combined through a process of...

Scalable and accurate deep learning with electronic health records

Rajkomar et al. 2018

Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart.

Rates of Co-infection Between SARS-CoV-2 and Other Respiratory Pathogens

Kim, Quinn, Pinsky et al. 2020, JAMA

689

700

Background: Since December 2019, coronavirus disease 2019 (COVID-19) has infected more than 12,322,000 people and killed over 556,000 people worldwide. However, differential diagnosis remains difficult for suspected cases of COVID-19 and needs to be improved to reduce misdiagnosis. Methods: Sixty-eight cases of suspected COVID-19 treated in Wenzhou Central Hospital from January 21 to February 20, 2020 were divided into confirmed and COVID-19-negative groups based on the results of real-time reverse transcriptase polymerase chain reaction (RT-PCR) nucleic acid testing of the novel coronavirus in throat swab specimens, and the clinical symptoms and laboratory and imaging results were compared between the groups. Results: Among the suspected patients, 17 were assigned to the COVID-19-positive group and 51 to the COVID-19-negative group. Reduced white blood cell (WBC) count was more common in the COVID-19-positive group than in the COVID-19-negative group (29.4% vs 3.9%, P = 0.003). Correlation analysis indicated a significant inverse correlation between WBC count and temperature in the COVID-19-positive patients (r = -0.587, P = 0.003) but not in the COVID-19-negative group. The rate of reduced lymphocyte count did not differ significantly between the two groups (47.1% vs 25.5%, P = 0.096). On high-resolution computed tomography (HRCT), ground-glass opacities (GGOs), multiple patchy shadows, and consolidation with bilateral involvement were more common in the confirmed COVID-19 cases than in the COVID-19-negative group (82.4% vs 31.4%, P = 0.0002; 41.2% vs 17.6%, P = 0.048; 76.5% vs 43.1%, P = 0.017; respectively). The rate of clustered infection was higher in the COVID-19-positive group than in the COVID-19-negative group (64.7% vs 7.8%, P = 0.001). Through multiplex PCR nucleic acid testing, 2 cases of influenza A, 3 cases of influenza B, 2 cases of adenovirus, 2 cases of Chlamydia pneumoniae, and 7 cases of Mycoplasma pneumoniae were diagnosed in the COVID-19-negative group. Conclusions: A WBC count inversely correlated with the severity of fever, GGOs, multiple patchy shadows, and consolidation on chest HRCT, and clustered infection are common but not specific features in the confirmed COVID-19 group. Multiplex PCR nucleic acid testing helped differential diagnosis for suspected COVID-19 cases.

Background

Since December 2019, the epidemic of pneumonia caused by the novel coronavirus in China has continued to progress [1], having now infected more than 12,322,000 people and killed over 556,000 people worldwide [2]. On February 11, 2020, the International Committee on Taxonomy of Viruses officially named this virus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the World Health Organization (WHO) named the disease corona...

Implementing Machine Learning in Health Care — Addressing Ethical Challenges

Shah 2018

The BioPAX community standard for pathway data sharing

Demir, Cary, Paley et al. 2010, Nat Biotechnol

624

524

BioPAX (Biological Pathway Exchange) is a standard language to represent biological pathways at the molecular and cellular level. Its major use is to facilitate the exchange of pathway data (http://www.biopax.org). Pathway data captures our understanding of biological processes, but its rapid growth necessitates development of databases and computational tools to aid interpretation. However, the current fragmentation of pathway information across many databases with incompatible formats presents barriers to its effective use. BioPAX solves this problem by making pathway data substantially easier to collect, index, interpret and share. BioPAX can represent metabolic and signaling pathways, molecular and genetic interactions and gene regulation networks. BioPAX was created through a community process. Through BioPAX, millions of interactions organized into thousands of pathways across many organisms, from a growing number of sources, are available. Thus, large amounts of pathway data are available in a computable form to support visualization, analysis and biological discovery.

Defining the features and duration of antibody responses to SARS-CoV-2 infection associated with disease severity and outcome

et al. 2020

SARS-CoV-2-specific antibodies, particularly those preventing viral spike receptor binding domain (RBD) interaction with host angiotensin-converting enzyme 2 (ACE2) receptor, can neutralize the virus. It is, however, unknown which features of the serological response may affect clinical outcomes of COVID-19 patients. We analyzed 983 longitudinal plasma samples from 79 hospitalized COVID-19 patients and 175 SARS-CoV-2-infected outpatients and asymptomatic individuals. Within this cohort, 25 patients died of their illness. Higher ratios of IgG antibodies targeting S1 or RBD domains of spike compared to nucleocapsid antigen were seen in outpatients who had mild illness versus severely ill patients. Plasma antibody increases correlated with decreases in viral RNAemia, but antibody responses in acute illness were insufficient to predict inpatient outcomes. Pseudovirus neutralization assays and a scalable ELISA measuring antibodies blocking RBD-ACE2 interaction were well correlated with patient IgG titers to RBD. Outpatient and asymptomatic individuals’ SARS-CoV-2 antibodies, including IgG, progressively decreased during observation up to five months post-infection.

BioPortal: ontologies and integrated data resources at the click of a mouse

Noy, Shah, Whetzel et al. 2009, Nucleic Acids Research

653

478

Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers ‘one-stop shopping’ to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.

BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications

Whetzel, Noy, Shah et al. 2011, Nucleic Acids Research

591

472

The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.
