2021
DOI: 10.2196/25530

Similarity-Based Unsupervised Spelling Correction Using BioWordVec: Development and Usability Study of Bacterial Culture and Antimicrobial Susceptibility Reports

Abstract: Background Existing bacterial culture test results for infectious diseases are written in unrefined free text, which leads to many problems, including typographical errors and stop words. Effective spelling correction processes are needed to ensure the accuracy and reliability of data for the study of infectious diseases, including medical terminology extraction. If a dictionary is established, spelling correction algorithms using edit distance are efficient. However, in the absence of a dictionary, traditional spel…
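To make the correction strategy concrete, the sketch below combines the two ideas from the abstract: when a misspelled token has no dictionary entry, candidate corrections are drawn from its nearest neighbours in an embedding space (BioWordVec is fastText-based, so out-of-vocabulary tokens still receive a vector from subword n-grams) and then re-ranked by edit distance. This is a minimal illustration, not the authors' published pipeline; the model file name, the thresholds, and the example token are placeholders.

# Minimal sketch of similarity-based spelling correction (not the authors'
# exact method): embedding nearest neighbours filtered by edit distance.
# MODEL_PATH is a placeholder for a locally downloaded fastText-format
# BioWordVec binary.
from gensim.models.fasttext import load_facebook_vectors

MODEL_PATH = "biowordvec_fasttext.bin"  # placeholder path, adjust locally

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(token: str, kv, topn: int = 50, max_dist: int = 2) -> str:
    """Pick the embedding neighbour closest to `token` in edit distance."""
    neighbours = [w for w, _ in kv.most_similar(token, topn=topn)]
    ranked = sorted((levenshtein(token, w), w) for w in neighbours)
    return ranked[0][1] if ranked and ranked[0][0] <= max_dist else token

kv = load_facebook_vectors(MODEL_PATH)   # subword vectors handle OOV tokens
print(correct("stapylococcus", kv))      # illustrative misspelled input

If only the word2vec-format BioWordVec vectors are available, gensim's KeyedVectors.load_word2vec_format could be substituted, but out-of-vocabulary misspellings would then need candidates generated by edit distance over the vocabulary rather than by most_similar.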


Cited by 3 publications (1 citation statement) | References 11 publications
“…To overcome these limitations, researchers addressed several alternative solutions this year: fine-tuning of existing models [58][59][60][61][62][63], domain adaptation [43,64], transfer learning [48,50,60], and self-training [43]. Going further in these directions, new trends were also observed in 2021, such as the reuse of older architectures based on fastText [65] and word2vec [66] enriched with basic language information: orthographic and lexical features [67], syntactic-semantic classes [68], medical knowledge [46], subword embeddings [14,69], and vector retrofitting [67,68]. Note that multi-task systems were also proposed and can serve several NLP tasks adapted to the medical domain [41].…”
Section: Language Models
confidence: 99%