2019
DOI: 10.1186/s13326-019-0214-4

Natural language processing for disease phenotyping in UK primary care records for research: a pilot study in myocardial infarction and death

Abstract: Background: Free text in electronic health records (EHR) may contain additional phenotypic information beyond structured (coded) information. For major health events – heart attack and death – there is a lack of studies evaluating the extent to which free text in the primary care record might add information. Our objectives were to describe the contribution of free text in primary care to the recording of information about myocardial infarction (MI), including subtype, left ventricular function, laboratory resul…
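One way to make the stated objective concrete (the contribution of free text beyond structured, coded information) is to tabulate, per patient, whether MI is found in coded entries, in free text, or in both. The sketch below is a minimal illustration under assumed inputs: the `PatientEvidence` record, its fields and the `free_text_yield` helper are hypothetical and are not the study's method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientEvidence:
    """Hypothetical per-patient summary of where MI evidence was found."""
    patient_id: str
    coded_mi: bool      # MI recorded via a structured (coded) entry
    free_text_mi: bool  # MI mentioned in free text (after negation checks)


def free_text_yield(patients: list[PatientEvidence]) -> dict[str, int]:
    """Count patients by source of MI evidence.

    The 'free_text_only' count is the extra information that free text
    adds beyond the coded record.
    """
    counts = {"coded_only": 0, "free_text_only": 0, "both": 0, "neither": 0}
    for p in patients:
        if p.coded_mi and p.free_text_mi:
            counts["both"] += 1
        elif p.coded_mi:
            counts["coded_only"] += 1
        elif p.free_text_mi:
            counts["free_text_only"] += 1
        else:
            counts["neither"] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up records for illustration only.
    cohort = [
        PatientEvidence("p1", coded_mi=True, free_text_mi=True),
        PatientEvidence("p2", coded_mi=True, free_text_mi=False),
        PatientEvidence("p3", coded_mi=False, free_text_mi=True),
        PatientEvidence("p4", coded_mi=False, free_text_mi=False),
    ]
    print(free_text_yield(cohort))
```

Keeping the four categories mutually exclusive means the counts always sum to the cohort size, which makes the free-text-only share easy to report as a proportion.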

Cited by 17 publications (25 citation statements) · References 23 publications

“…The simplest form of information extraction was manual review of free text (40–45) or a keyword search for relevant information, which was then modified by an algorithm (46) or manual review (47–49) to check for negation and uncertainty. Shah et al described a bespoke algorithm for converting general practice text data into categorical data for analysis (50). Papers drawn from the SLAM CRIS database used a bespoke set of data extraction techniques, supplied in the CRIS system using the Generalized Architecture for Text Engineering (GATE) (51) which allows integration of a range of NLP algorithms for specific purposes, such as identification of medications or Mini-Mental State Examination (MMSE) scores for dementia.…”
Section: Results (mentioning; confidence: 99%)
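The keyword-search-then-negation-check approach described in this statement can be sketched as follows. This is a deliberately simplified, NegEx-style rule (look for a trigger word in a fixed window of preceding tokens within the same sentence), not the algorithm used in any of the cited studies; the keyword and trigger lists, the window size and the `classify_mentions` function are illustrative assumptions.

```python
import re

# Illustrative (hypothetical) lexicons; real systems use curated, validated lists.
KEYWORDS = ["myocardial infarction", "heart attack", "mi"]
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "negative for"]
UNCERTAINTY_TRIGGERS = ["possible", "probable", "query", "suspected", "?"]
WINDOW = 5  # number of preceding tokens inspected for a trigger


def _tokens(text: str) -> list[str]:
    # Lower-case and split into words, keeping "?" (as in "?MI") as its own token.
    return re.findall(r"[a-z]+|\?", text.lower())


def _contains_phrase(tokens: list[str], phrase: str) -> bool:
    # True if the token sequence for `phrase` occurs anywhere in `tokens`.
    target = _tokens(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def classify_mentions(text: str) -> list[tuple[str, str]]:
    """Return (keyword, status) pairs; status is affirmed, negated or uncertain."""
    results = []
    # Split on full stops, semicolons and newlines so triggers do not leak
    # across sentences ("?" is not a splitter because GP notes use "?MI").
    for sentence in re.split(r"[.;\n]", text):
        tokens = _tokens(sentence)
        for kw in KEYWORDS:
            kw_tokens = _tokens(kw)
            n = len(kw_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != kw_tokens:
                    continue
                window = tokens[max(0, i - WINDOW):i]
                if any(_contains_phrase(window, t) for t in NEGATION_TRIGGERS):
                    status = "negated"
                elif any(_contains_phrase(window, t) for t in UNCERTAINTY_TRIGGERS):
                    status = "uncertain"
                else:
                    status = "affirmed"
                results.append((kw, status))
    return results


if __name__ == "__main__":
    note = "No evidence of myocardial infarction. ?MI last year. Heart attack 2015."
    print(classify_mentions(note))
```

Checking negation before uncertainty is a design choice: a phrase such as "no possible MI" is treated as negated. Post-keyword triggers (for example "MI ruled out") are not handled by this preceding-window rule.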
“…Medication information was often extracted from free text, particularly in studies using the SLaM BRC CRIS case register (57, 60, 63, 64, 66–73). Also extracted from free text were disease symptoms and drug reactions (46, 47, 50, 60, 67, 72, 74–76); test scores, such as for the MMSE (58, 59, 61, 63, 64, 77, 78), and angiogram results (50); treatments such as cognitive behavioral therapy (CBT) (60, 79); substance use behaviors such as cannabis (49, 72, 80), alcohol (43, 44, 49) or smoking status (81); housing status (45); and information on symptom severity and functional status (61, 73).…”
Section: Results (mentioning; confidence: 99%)
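Extracting a test score such as the MMSE from free text, as mentioned in this statement, is often done with a pattern match plus a range check. The sketch below is an illustrative regular-expression approach, not the GATE-based extractor used in CRIS; the pattern, the accepted phrasings ("MMSE 24/30", "MMSE score: 27", "MMSE was 22") and the `extract_mmse_scores` function are assumptions.

```python
import re

# Hypothetical pattern: "MMSE", optional "score", optional linking word or
# punctuation, then a 1-2 digit number (not followed by another digit),
# optionally written as "/30".
MMSE_PATTERN = re.compile(
    r"\bMMSE\b(?:\s+score)?\s*(?:of|was|is|[:=])?\s*(\d{1,2})(?!\d)(?:\s*/\s*30)?",
    re.IGNORECASE,
)


def extract_mmse_scores(text: str) -> list[int]:
    """Return plausible MMSE scores (0 to 30) mentioned in free text."""
    scores = []
    for match in MMSE_PATTERN.finditer(text):
        value = int(match.group(1))
        if 0 <= value <= 30:  # the MMSE is scored out of 30
            scores.append(value)
    return scores


if __name__ == "__main__":
    note = "MMSE 24/30 today; previous MMSE score: 27. Repeat MMSE in 6 months."
    print(extract_mmse_scores(note))
```

A pattern this simple would misread a date written directly after the abbreviation (for example "MMSE 12/03/2020"), so in practice a rule like this would need validation against manually reviewed notes.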
“…Such clinical scenarios include classification tasks related to medical imaging [18] or the natural language processing of free-text health records [19]. Such research should be reported, transparently and according to consistent reporting standards, such as those that build on the TRIPOD guidelines for prognostic studies [20]. Our machine learning models would select patients for discharge with around a 1 in 26 chance of subsequently deteriorating.…”
Section: Discussion (mentioning; confidence: 99%)
“…Lifestyle information, in particular, can often be missing from these records, but completeness has been improving over recent years. Performance of models could be improved by including information from the unstructured free text within the EHR [29] but access to this is increasingly difficult for researchers in the UK due to information governance restrictions. The prediction models have been derived using retrospective data and are limited in their application at the individual level to identify those at high risk.…”
Section: Discussion (mentioning; confidence: 99%)