2019
DOI: 10.1186/s13326-019-0216-2

Combining string and phonetic similarity matching to identify misspelt names of drugs in medical records written in Portuguese

Abstract: Background: There is an increasing amount of unstructured medical data that can be analysed for different purposes. However, information extraction from free text data may be particularly inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, coupled with a supporting dictionary. However, they are not rich enough to encode both typing and phonetic misspellings. Results: Experimental results showed a joint string and language-depende…
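
The idea of pairing a string similarity with a phonetic similarity against a supporting dictionary can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the rewrite rules in `TOY_PHONETIC_RULES`, the equal weighting, the 0.8 threshold, and the example drug names are all assumptions chosen only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance normalised to [0, 1]; 1.0 means identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical, deliberately tiny rule set (NOT the paper's phonetic rules):
# each pair rewrites a spelling to a rough sound-alike form.
TOY_PHONETIC_RULES = [("ph", "f"), ("ss", "s"), ("ç", "s"),
                      ("z", "s"), ("y", "i"), ("h", "")]


def phonetic_key(word: str) -> str:
    """Lower-case the word and apply the toy rewrite rules in order."""
    key = word.lower()
    for old, new in TOY_PHONETIC_RULES:
        key = key.replace(old, new)
    return key


def combined_similarity(a: str, b: str, weight: float = 0.5) -> float:
    """Weighted mean of the string and phonetic-key similarities."""
    s = string_similarity(a.lower(), b.lower())
    p = string_similarity(phonetic_key(a), phonetic_key(b))
    return weight * s + (1.0 - weight) * p


def best_match(token: str, dictionary: list[str], threshold: float = 0.8):
    """Return the best-scoring dictionary entry, or None below the threshold."""
    if not dictionary:
        return None
    score, entry = max((combined_similarity(token, e), e) for e in dictionary)
    return entry if score >= threshold else None


# Illustrative dictionary; the names are examples, not data from the paper.
drugs = ["dipirona", "paracetamol", "amoxicilina"]
print(best_match("dipyrona", drugs))
```

In this sketch the misspelling "dipyrona" differs from "dipirona" by one character, but both map to the same phonetic key, so the phonetic term lifts the combined score above what edit distance alone would give.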

citations
Cited by 10 publications
(7 citation statements)
references
References 16 publications
“…In this work, we focus on clinical datasets 1 obtained from InfoSaude (InfoHealth) [5], an EHR system. An overview of each dataset is presented below and statistics are depicted in Tables I and II -in both datasets, all relations have the domain patient as head type.…”
Section: Methods (mentioning)
confidence: 99%
“…Moreover, Florianópolis has been using electronic medical records over the last 20 years, from which over 80% of the health network data being stored in digital format since 2008. InfoSaude [16], [17] is an Electronic Health Record (EHR) system created to manage and track medical records used to meet the needs of Florianópolis' 75 public health centers, integrating patient EHRs with multiple information structures, such as distinct types of care, pregnancies, procedures performed on each patient, applied vaccines and drug prescriptions.…”
Section: A Motivation (mentioning)
confidence: 99%
“…Very simple regular expressions were able to extract the most common facts as well as the language signals for eventual negations. Finally, a hybrid phonetic similarity algorithm [17], [21] was used to find and merge misspellings, positively identified and manually checked for about 1.5% of the mentions in each considered sets of substances and infections.…”
Section: B Dataset (mentioning)
confidence: 99%
“…There has been interest in using NLP to develop computable algorithms from free text trial descriptions [33]- [36], and an 'eligibility criteria representation language' has been proposed [37]. However, unless EHR data sources are standardised, it is a major task to enable complex queries to run on disparate data sources [38]. Representation of time constraints also needs to be taken into account [39].…”
Section: A Improving Efficiency Of Clinical Trials (mentioning)
confidence: 99%