2020
DOI: 10.3390/info11100484

Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese

Abstract: Two sentences can be related in many different ways, and distinct tasks in natural language processing aim to identify different semantic relations between them. We developed several models for natural language inference and semantic textual similarity for the Portuguese language. We took advantage of pre-trained models (BERT) and additionally studied the role of lexical features. We tested our models on several datasets (ASSIN, SICK-BR and ASSIN2), and the best results were usually achieved with ptBERT-Large, …
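The abstract describes scoring Portuguese sentence pairs for semantic textual similarity with a fine-tuned BERT model. The sketch below shows that setup in outline, assuming the Hugging Face transformers API and the publicly available BERTimbau checkpoint as a stand-in for the ptBERT-Large model the paper mentions; the regression head here is untrained, so meaningful scores would only appear after fine-tuning on a dataset such as ASSIN or ASSIN2.

```python
# Minimal sketch: scoring a Portuguese sentence pair for STS with a BERT cross-encoder.
# Assumption: neuralmind/bert-base-portuguese-cased (BERTimbau) is only a placeholder
# for the ptBERT-Large model named in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "neuralmind/bert-base-portuguese-cased"  # assumed checkpoint, not the paper's
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 gives a single-output regression head, the usual setup for STS scores
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

sentence_a = "Um homem está tocando violão."
sentence_b = "Uma pessoa toca um instrumento musical."

# BERT-style cross-encoders read the pair jointly, separated by a [SEP] token
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt", truncation=True)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()

# The head is randomly initialised here; real scores require fine-tuning on STS data
print(f"Predicted similarity (illustrative only): {score:.3f}")
```

The same pattern covers natural language inference by setting num_labels to the number of entailment classes and training with a classification loss instead of regression.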

Cited by 6 publications (13 citation statements)
References 23 publications
“…For this purpose, TextBenDS proposes a tweet-based data model and two types of workloads, namely Top-K keywords and Top-K documents operations. Other purely textual benchmarks focus on language analysis tasks, e.g., Chinese [25] and Portuguese [5] text recognition, respectively.…”
Section: Textual Benchmarks (mentioning)
confidence: 99%
“…Thus, we provide instead a script that extracts a user-defined number of documents. This script and a usage guide are available online for reuse⁵. Amongst all available documents in HAL, we restrict ourselves to scientific articles whose length is homogeneous, which amounts to 50,000 documents.…”
Section: Data Extraction (mentioning)
confidence: 99%
“…The models developed during ASSIN 2 used more recent NLP approaches, including contextual embeddings such as BERT [7]. Works addressing these datasets have since been proposed continuously, with the state of the art being a BERT model pre-trained in Portuguese and fine-tuned for STS [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…The goal of this work is to evaluate contextual embeddings generated by SBERT models for STS in Portuguese, which we investigate in two stages. First, we compare the performance of pre-trained SBERT models with the state-of-the-art BERT models for the ASSIN datasets [8]. In addition, we include other baseline models, such as the best-performing works assessed in the workshops and other multilingual contextual embeddings.…”
Section: Introduction (mentioning)
confidence: 99%
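The SBERT approach this statement refers to is a bi-encoder: each sentence is embedded independently and similarity is the cosine of the two vectors, in contrast to the cross-encoder setup sketched above. A minimal sketch follows, assuming the sentence-transformers library; the multilingual checkpoint named below is an illustrative placeholder, not necessarily one evaluated in the citing work.

```python
# Hedged sketch of SBERT-style STS: embed each sentence independently, compare by cosine.
# Assumption: paraphrase-multilingual-MiniLM-L12-v2 is only a placeholder checkpoint.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

pairs = [
    ("O gato está dormindo no sofá.", "Um gato dorme em cima do sofá."),
    ("Ela comprou um carro novo.", "O tempo está chuvoso hoje."),
]

for sent_a, sent_b in pairs:
    # encode() returns one embedding per sentence; convert_to_tensor keeps them as tensors
    emb_a, emb_b = model.encode([sent_a, sent_b], convert_to_tensor=True)
    cosine = util.cos_sim(emb_a, emb_b).item()
    print(f"{cosine:.3f}  {sent_a!r} <-> {sent_b!r}")
```

Because the two sentences are encoded separately, embeddings can be precomputed and cached, which is what makes the bi-encoder setup attractive for large-scale comparisons against cross-encoder baselines.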
“…Results from intrinsic evaluations of word embeddings indicate that the vectors can be related in different ways, sometimes by topic ("mouse" and "teclado", i.e., keyboard), sometimes by proximity of use ("leite" and "condensado", as in condensed milk), or show no apparent semantic relation, which leaves open the question of whether the models that generate these representations reflect any kind of semantic relation in a systematic way. Results from intrinsic and extrinsic evaluations also indicate that there are significant differences between the results obtained from different datasets, as well as from different vector-generation models (ANTONIAK; MIMNO, 2018; SINOARA; ROSSI; REZENDE, 2016; SCHNABEL et al., 2015; FIALHO; COHEUR; QUARESMA, 2020).…”
Section: Contexto e Motivação (unclassified)