Emily Alsentzer scite author profile

Publicly Available Clinical …

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months. However, these models have been minimally explored on specialty corpora, such as clinical text; moreover, in the clinical domain, no publicly available pre-trained BERT models yet exist. In this work, we address this need by exploring and releasing BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically. We demonstrate that using a domain-specific model yields performance improvements on three common clinical NLP tasks as compared to nonspecific embeddings. These domain-specific models are not as performant on two clinical de-identification tasks, and we argue that this is a natural consequence of the differences between de-identified source text and synthetically non-de-identified task text.


The effect of microbial colonization on the host proteome varies by gastrointestinal location

Lichtman, Alsentzer, Jaffe, et al. 2015


Endogenous intestinal microbiota have wide-ranging and largely uncharacterized effects on host physiology. Here, we used reverse-phase liquid chromatography-coupled tandem mass spectrometry to define the mouse intestinal proteome in the stomach, jejunum, ileum, cecum and proximal colon under three colonization states: germ-free (GF), monocolonized with Bacteroides thetaiotaomicron, and conventionally raised (CR). Our analysis revealed distinct proteomic abundance profiles along the gastrointestinal (GI) tract. Unsupervised clustering showed that host protein abundance depended primarily on GI location rather than on colonization state, and random forest classification identified the specific proteins and functions that defined these locations. K-means clustering of protein abundance across locations revealed substantial differences in host protein production in CR mice relative to GF and monocolonized mice. Finally, comparison with fecal proteomic data sets suggested that the identities of stool proteins are not biased toward any region of the GI tract but are substantially affected by the microbiota in the distal colon.


Simulation of undiagnosed patients with novel genetic conditions

Alsentzer, Finlayson, et al. 2022

Preprint


Rare Mendelian disorders pose a major diagnostic challenge and collectively affect 300-400 million patients worldwide. Many automated tools aim to uncover causal genes in patients with suspected genetic disorders, but evaluation of these tools is limited due to the lack of comprehensive benchmark datasets that include previously unpublished conditions. In this chapter, we present a computational pipeline that simulates realistic clinical datasets to address this deficit. Our framework jointly simulates complex phenotypes and challenging candidate genes and produces patients with novel genetic conditions. We demonstrate the similarity of our simulated patients to real patients from the Undiagnosed Diseases Network and evaluate common gene prioritization methods on the simulated cohort. These prioritization methods recover known gene-disease associations but perform poorly on diagnosing patients with novel genetic disorders. Our publicly-available dataset and codebase can be utilized by medical genetics researchers to evaluate, compare, and improve tools that aid in the diagnostic process.


What’s in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization

Adams, Alsentzer, Ketenci, et al. 2021


Summarization of clinical narratives is a longstanding research problem. Here, we introduce the task of hospital-course summarization: given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient's admission. We construct an English text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored "Brief Hospital Course" (BHC) paragraph written as part of a discharge note. Exploratory analyses reveal that the BHC paragraphs are highly abstractive with some long extracted fragments; are concise yet comprehensive; differ in style and content organization from the source notes; exhibit minimal lexical cohesion; and represent silver-standard references. Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.


Investigating inequities in hospital care among lesbian, gay, bisexual, and transgender (LGBT) individuals using social media

Hswen, Sewalk, Alsentzer, et al. 2018

Social Science & Medicine


Intimate Partner Violence and Injury Prediction From Radiology Reports

Chen, Alsentzer, Park, et al. 2020


An algorithm developed using the Brighton Collaboration case definitions is more efficient for determining diagnostic certainty

Joshi, Alsentzer, Edwards, et al. 2014

Vaccine


Extractive Summarization of EHR Discharge Notes

Alsentzer, Kim 2018

Preprint


Patient summarization is essential for clinicians to coordinate care and communicate effectively. Automated summarization has the potential to save time, standardize notes, aid clinical decision making, and reduce medical errors. Here we provide an upper bound on extractive summarization of discharge notes and develop an LSTM model to sequentially label topics in history of present illness (HPI) notes. We achieve an F1 score of 0.876, which indicates that this model can be used to create a dataset for evaluating extractive summarization methods.


