2021
DOI: 10.1016/j.jbi.2021.103683
Exploration of text matching methods in Chinese disease Q&A systems: A method using ensemble based on BERT and boosted tree models

Cited by 12 publications (9 citation statements) · References 8 publications
“…The algorithm computes the edit distance when a part of the string is moved to another position within the string, instead of detecting only single-character changes. Previous studies have confirmed its effectiveness [22], [23], [26]. The n-gram method uses a sequence of strings and measures the similarity of sub-sequences of n words from a given text.…”
Section: Lexical-Similarity Methods (mentioning)
Confidence: 96%
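The excerpt above names two lexical-similarity techniques. As a minimal sketch of the n-gram approach it describes, here is word-level n-gram similarity scored with Jaccard overlap; the function names and the Jaccard choice are illustrative assumptions, not the cited studies' exact formulation:

```python
# Minimal sketch: word-level n-gram similarity via Jaccard overlap.
# (Illustrative; the cited studies may use a different n-gram scoring variant.)
def ngrams(tokens, n):
    """Set of n-word sub-sequences of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_similarity(text_a, text_b, n=2):
    """Jaccard similarity between the word n-gram sets of two texts."""
    a, b = ngrams(text_a.split(), n), ngrams(text_b.split(), n)
    if not a and not b:
        return 1.0  # texts too short to form any n-gram count as identical
    return len(a & b) / len(a | b)

# Example: two near-paraphrase questions share half of their bigrams.
print(ngram_similarity("what causes high blood pressure",
                       "what usually causes high blood pressure"))
```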
“…Many evaluation methods are presented in the literature, but the majority evaluate a method in terms of robustness, accuracy, and time. The most common performance metrics are matching accuracy, precision, recall, granularity, and F1-measure [8], [3], [23], [24]. In contrast to accuracy, several studies analyze performance with respect to error metrics, employing root mean square error (RMSE) and failed detection ratio (FDR) [11], [25].…”
Section: Research Issues (mentioning)
Confidence: 99%
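As a minimal sketch of the accuracy- and error-oriented metrics named in this excerpt, assuming binary match labels (granularity and FDR are omitted because their definitions vary across the cited studies):

```python
import math

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary match/no-match labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def rmse(y_true, y_score):
    """Root mean square error between gold labels and predicted scores."""
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(y_true, y_score))
                     / len(y_true))

p, r, f1 = precision_recall_f1([1, 0, 1, 1], [1, 0, 0, 1])
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print(f"RMSE={rmse([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.8]):.3f}")
```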
“…We can observe that, in 2021, researchers mainly concentrated on English-language data. Indeed, compared to previous years, fewer languages were covered: Chinese [3][4][5][6][7][8][9][10], Dutch [11], French [12,13], Italian [14][15][16], Japanese [17], Korean [18,19], Norwegian [20], and Spanish. Moreover, except for Chinese, very few works addressed the other languages represented in the publications.…”
Section: Languages Addressed (mentioning)
Confidence: 99%
“…Studies such as [112] used the BERT network to evaluate different methods for a Q&A system trained on Chinese medical data. SCI-BERT [10], which leveraged unsupervised pre-training on a large multi-domain corpus of scientific publications, was also introduced, and BioBERT, which was pre-trained on biomedical domain corpora (e.g., PubMed abstracts and PubMed Central full-text articles), was proposed in [54].…”
Section: Introduction (mentioning)
Confidence: 99%
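The indexed paper ensembles BERT with boosted tree models for Chinese disease Q&A matching. A minimal sketch of the BERT sentence-pair side of such a system, assuming the Hugging Face transformers API and the bert-base-chinese checkpoint (neither is named in the excerpt, and the head below is untrained, so it would need fine-tuning on labeled question pairs before its scores are meaningful):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint; the paper's exact model and matching head may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # labels: no-match / match

question = "高血压的症状有哪些？"   # "What are the symptoms of hypertension?"
candidate = "高血压有什么表现？"    # "What are the manifestations of hypertension?"

# BERT encodes the pair jointly as [CLS] question [SEP] candidate [SEP].
inputs = tokenizer(question, candidate, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
match_prob = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"match probability: {match_prob:.3f}")
```

In an ensemble of the kind the title describes, this model's match probability (or its pooled features) would be combined with boosted-tree scores, for example by averaging or stacking; that combination scheme is an assumption here, not the paper's stated method.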