2023
DOI: 10.1101/2023.03.02.23286522
Preprint

Approach to Machine Learning for Extraction of Real-World Data Variables from Electronic Health Records

Abstract: Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniqu…

Cited by 8 publications (8 citation statements). References 29 publications.
“…Each of the 18 variables has been extracted through NLP of clinical notes, followed by an advanced ML or deep learning model, including LSTM and XGBoost, after undergoing a rigorous development, validation, and testing process that aligns with the data and the model’s objectives. Model details, such as how they were developed, have been previously described [ 17 ]. Briefly, models are trained on the data labeled by expert abstraction to recognize, interpret, and curate free text into structured variable values in order to mimic the abstraction process.…”
Section: Methods (mentioning; confidence: 99%)
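The quoted methods describe models trained on expert-abstracted labels to turn free-text notes into structured variable values. The sketch below illustrates only that general supervised idea. It deliberately swaps in a multinomial naive Bayes classifier (standard library only) in place of the LSTM and XGBoost models named in the quote. The class `NaiveBayesExtractor`, the example notes, and the labels for a smoking-history variable are all hypothetical and are not taken from the cited work.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the note and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesExtractor:
    """Hypothetical stand-in for an LSTM/XGBoost extractor: multinomial
    naive Bayes with Laplace (add-one) smoothing over note tokens."""

    def fit(self, notes, labels):
        self.class_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()
        for note, label in zip(notes, labels):
            tokens = tokenize(note)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, note):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(note):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical notes labeled by an expert abstractor.
notes = [
    "patient is a current smoker, one pack per day",
    "smokes cigarettes daily, counseled on cessation",
    "never smoker, denies tobacco use",
    "no history of smoking, never used tobacco",
]
labels = ["history of smoking", "history of smoking",
          "no history of smoking", "no history of smoking"]

model = NaiveBayesExtractor().fit(notes, labels)
print(model.predict("current smoker, smokes half a pack"))  # -> history of smoking
print(model.predict("never smoked, denies tobacco"))        # -> no history of smoking
```

A production pipeline would, per the quote, go through development, validation, and testing against held-out abstracted data; this toy example shows only the fit/predict shape of the task.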
“…to output a set of structured variables required for real-world data analysis. The authors concluded that NLP enabled the extraction of retrospective clinical data from the EMR faster and more efficiently [27]. The authors did not however directly compare their model to human data extractors - their purpose was to show success in building an AI model that extracts accurate information from the EMR.…”
Section: Discussion (mentioning; confidence: 99%)
“…To our knowledge, there are no other studies that directly compared free-text data extraction from the EMR between AI chatbots and human data extractors. Adamson et al [27] applied NLP to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports, etc.) to output a set of structured variables required for real-world data analysis.…”
Section: Discussion (mentioning; confidence: 99%)
“…Retrospective longitudinal clinical data were derived from electronic health record data, comprising patient-level structured and unstructured data, curated via technology-enabled abstraction, and were linked to genomic data derived from FMI CGP tests in the FH-FMI CGDB by deidentified, deterministic matching. Patient smoking status was extracted by natural language processing of electronic health record documents. In this cohort, overall survival (OS) with routine clinical treatment was calculated from start of treatment in the metastatic setting to death from any cause, and patients without a record of mortality were right censored at the date of their last clinic visit or structured activity.…”
Section: Methods (mentioning; confidence: 99%)
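The OS definition in the last quote pairs each patient with a duration and an event flag, the usual input to survival analysis. The sketch below is one way to express that rule, assuming dates are available per patient. The `Patient` fields and the `overall_survival` function are hypothetical names chosen for illustration, not the cited study's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    treatment_start: date       # start of treatment in the metastatic setting
    death_date: Optional[date]  # None when there is no record of mortality
    last_activity: date         # last clinic visit or structured activity


def overall_survival(p: Patient) -> tuple[int, bool]:
    """Return (days, event): event=True means death was observed,
    event=False means the patient is right-censored at last activity."""
    if p.death_date is not None:
        end, event = p.death_date, True
    else:
        end, event = p.last_activity, False
    days = (end - p.treatment_start).days
    if days < 0:
        raise ValueError("end date precedes treatment start")
    return days, event


# Death recorded: time runs to the date of death (2020 is a leap year).
print(overall_survival(Patient(date(2020, 1, 1), date(2021, 1, 1), date(2020, 12, 1))))  # (366, True)
# No mortality record: censored at the last clinic visit or structured activity.
print(overall_survival(Patient(date(2020, 1, 1), None, date(2020, 3, 1))))  # (60, False)
```

The (duration, event) pairs can then be passed to a Kaplan-Meier or Cox model; keeping censoring explicit avoids treating patients lost to follow-up as deaths.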