2022
DOI: 10.1186/s13326-022-00269-1

SemClinBr - a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks

Abstract: Background: The high volume of research focusing on extracting patient information from electronic health records (EHRs) has led to an increase in the demand for annotated corpora, which are a precious resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multipurpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP f…

Cited by 12 publications (11 citation statements)
References 52 publications
“…We first analyzed the F1 score and accuracy calculated by the test set of the Mac-Morpho corpus to verify how accurate the model performed in texts from the same corpus of the training. Also, we evaluated the trained models on a set of clinical notes taken from SemClinBr [15], a corpus containing clinical narratives from Brazilian hospitals. We randomly selected 50 sentences containing between 6 and 15 tokens, which were manually POS-annotated by a human linguist, referred to in this paper as human annotation.…”
Section: Discussion (mentioning)
confidence: 99%
“…The checkpoints (intermediate saved versions of a pre-trained language model during the training process) involved the BERT-based models available for Portuguese, both generic domain and specialized in the clinical area. For each pre-trained model, we fine-tuned them to the NER task with two corpora in the clinical domain, TempClinBr [11], and SemClinBr [12].…”
Section: Methods (mentioning)
confidence: 99%
“…Despite the advancement of transfer learning for negation detection [76,79,80], rule-based [27] and supervised machine learning approaches [76,77,[81][82][83] for LoE continue to be researched and employed. One paper presented a corpus-free approach, which is an attractive prospect in a scenario where there is no annotated data [84].…”
Section: Recent Advances In Negation Resolution For LoE (mentioning)
confidence: 99%