2021
DOI: 10.33039/ami.2021.04.002

Abstractive text summarization for Hungarian

Abstract: In our research we have created a text summarization software tool for Hungarian using multilingual and Hungarian BERT-based models. Two types of text summarization methods exist: abstractive and extractive. Abstractive summarization is more similar to human-generated summaries: target summaries may include phrases that the original text does not necessarily contain, since this method generates the summarized text by applying keywords extracted from the original text. The extractive method summarizes…
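As a minimal illustration of what such an abstractive summarizer looks like at inference time, the sketch below uses the Hugging Face transformers API with a multilingual seq2seq checkpoint. The model name and the input text are placeholder assumptions, not the authors' actual tool.

```python
# Minimal inference sketch of abstractive summarization with a
# multilingual seq2seq checkpoint via Hugging Face transformers.
# The model name and input are placeholders, not the authors' tool.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "google/mt5-small"  # assumed stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

article = "Hosszú magyar újságcikk szövege ..."  # placeholder input
inputs = tokenizer(article, return_tensors="pt",
                   truncation=True, max_length=512)
# Beam-search decoding can emit phrases absent from the source text,
# which is what makes the summary abstractive rather than extractive.
ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```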

Cited by 6 publications (3 citation statements)
References 11 publications
“…In addition, we propose a cross-lingual-transfer-based approach to improve our results. Using pretrained multilingual BERT, we fine-tuned multilingual BERT for abstractive Hungarian text summarization using the HVG corpus (Yang et al., 2021), where the articles and corresponding leads were taken from a daily online newspaper. We further fine-tuned this model for abstractive Arabic text summarization using our own corpus.…”
Section: Methods (mentioning)
confidence: 99%
“…• Arabic 3BART: Following the cross-lingual approach we used in our previous research [10], the 3BART model was first fine-tuned on a multilingual summarization corpus containing a mixture of English and Hungarian segments, and then further fine-tuned on the AraSum corpus. The English segments were taken from the CNN/Daily Mail corpus [18], while the Hungarian segments were taken from the H+I corpus [25]. Hyperparameters: batch: 4/GPU, 8 GTX/RTX 11 GB GPUs, warmup: 5000, 80 epochs, max.…”
Section: Fine-tuning (mentioning)
confidence: 99%
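The quoted hyperparameters map naturally onto standard Hugging Face training arguments; the sketch below is one plausible rendering. The output path is a placeholder, and 3BART is not a public checkpoint, so this shows only the configuration, not the model.

```python
# One plausible mapping of the quoted hyperparameters (batch 4 per GPU,
# 8 GPUs, warmup 5000, 80 epochs) onto Hugging Face training arguments.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="arabic_3bart_ft",     # placeholder path
    per_device_train_batch_size=4,    # "batch: 4/GPU"
    warmup_steps=5000,                # "warmup: 5000"
    num_train_epochs=80,              # "80 epochs"
)
# Launched with e.g. `torchrun --nproc_per_node=8 train.py`, this gives
# the 8-GPU data-parallel setup (global batch 32) quoted above.
```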
“…For the summarization task, we used the H+I corpus that Yang et al. used in their research [36], the NOL corpus (Népszabadság online; nol.hu online articles (art) and their leads (lead) from 1999 to 2016), and the MARCELL corpus [32] (law documents (doc) and their one-line descriptions (desc) from 1991 to 2019). Table 2 shows the characteristics of the fine-tuning corpora.…”
Section: Corpora (mentioning)
confidence: 99%
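A hedged sketch of how source/summary pairs from several such corpora might be pooled into a single fine-tuning set, as the cited work does with H+I, NOL, and MARCELL. The file names and the two-column TSV layout are assumptions for illustration.

```python
# Pool source/summary pairs from several corpora (standing in for H+I,
# NOL, and MARCELL) into one fine-tuning set. File names and the
# two-column TSV layout are assumptions.
import csv

def load_pairs(path):
    # Assumed layout: one pair per row, "source<TAB>summary".
    with open(path, encoding="utf-8") as f:
        return [(src, tgt) for src, tgt in csv.reader(f, delimiter="\t")]

corpora = ["hplusi.tsv", "nol.tsv", "marcell.tsv"]  # hypothetical files
pairs = [p for path in corpora for p in load_pairs(path)]
print(f"{len(pairs)} pairs pooled from {len(corpora)} corpora")
```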