2021
DOI: 10.2196/25530

Similarity-Based Unsupervised Spelling Correction Using BioWordVec: Development and Usability Study of Bacterial Culture and Antimicrobial Susceptibility Reports

Abstract: Background Existing bacterial culture test results for infectious diseases are written in unrefined free text, which leads to many problems, including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling correction algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spel…
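To make the correction strategy concrete, the sketch below combines the two ideas from the abstract: when a misspelled token has no dictionary entry, candidate corrections are drawn from its nearest neighbours in an embedding space (BioWordVec is fastText-based, so out-of-vocabulary tokens still receive a vector from subword n-grams) and then re-ranked by edit distance. This is a minimal illustration, not the authors' published pipeline; the model file name, the thresholds, and the example token are placeholders.

# Minimal sketch of similarity-based spelling correction (not the authors'
# exact method): embedding nearest neighbours filtered by edit distance.
# MODEL_PATH is a placeholder for a locally downloaded fastText-format
# BioWordVec binary.
from gensim.models.fasttext import load_facebook_vectors

MODEL_PATH = "biowordvec_fasttext.bin"  # placeholder path, adjust locally

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(token: str, kv, topn: int = 50, max_dist: int = 2) -> str:
    """Pick the embedding neighbour closest to `token` in edit distance."""
    neighbours = [w for w, _ in kv.most_similar(token, topn=topn)]
    ranked = sorted((levenshtein(token, w), w) for w in neighbours)
    return ranked[0][1] if ranked and ranked[0][0] <= max_dist else token

kv = load_facebook_vectors(MODEL_PATH)   # subword vectors handle OOV tokens
print(correct("stapylococcus", kv))      # illustrative misspelled input

If only the word2vec-format BioWordVec vectors are available, gensim's KeyedVectors.load_word2vec_format could be substituted, but out-of-vocabulary misspellings would then need candidates generated by edit distance over the vocabulary rather than by most_similar.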


Cited by 3 publications (1 citation statement) | References 11 publications
“…To overcome these limitations, researchers addressed several alternative solutions this year: fine-tuning of existing models [58][59][60][61][62][63], domain adaptation [43,64], transfer learning [48,50,60], and self-training [43]. Going further in these directions, new trends were also observed in 2021, such as the reuse of older architectures based on fastText [65] and word2vec [66] enriched with basic language information: orthographic and lexical features [67], syntactic-semantic classes [68], medical knowledge [46], subword embeddings [14,69], and vector retrofitting [67,68]. Note that multi-task systems were also proposed and can serve several NLP tasks adapted to the medical domain [41].…”
Section: Language Models
confidence: 99%