2021
DOI: 10.33039/ami.2021.04.002

Abstractive text summarization for Hungarian

Abstract: In our research we have created a text summarization software tool for Hungarian using multilingual and Hungarian BERT-based models. Two types of text summarization methods exist: abstractive and extractive. Abstractive summarization is more similar to human-generated summaries: target summaries may include phrases that the original text does not necessarily contain, since this method generates the summarized text by applying keywords extracted from the original text. The extractive method summarizes…
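As a minimal illustration of what such an abstractive summarizer looks like at inference time, the sketch below uses the Hugging Face transformers API with a multilingual seq2seq checkpoint. The model name and the input text are placeholder assumptions, not the authors' actual tool.

```python
# Minimal inference sketch of abstractive summarization with a
# multilingual seq2seq checkpoint via Hugging Face transformers.
# The model name and input are placeholders, not the authors' tool.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "google/mt5-small"  # assumed stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL)

article = "Hosszú magyar újságcikk szövege ..."  # placeholder input
inputs = tokenizer(article, return_tensors="pt",
                   truncation=True, max_length=512)
# Beam-search decoding can emit phrases absent from the source text,
# which is what makes the summary abstractive rather than extractive.
ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```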

Cited by 6 publications (3 citation statements)
References 11 publications
“…In addition, we propose a cross-lingual-transfer-based approach to improve our results. Using pretrained multilingual BERT, we fine-tuned multilingual BERT for abstractive Hungarian text summarization using the HVG corpus (Yang et al., 2021), where the articles and corresponding leads were taken from a daily online newspaper. We further fine-tuned this model for abstractive Arabic text summarization using our own corpus.…”
Section: Methods (mentioning)
confidence: 99%
“…• Arabic 3BART: Following the cross-lingual approach we used in our previous research [10], the 3BART model was first fine-tuned on a multilingual summarization corpus containing a mixture of English and Hungarian segments, and then further fine-tuned on the AraSum corpus. The English segments were taken from the CNN/Daily Mail corpus [18], while the Hungarian segments were taken from the H+I corpus [25]. Hyperparameters: batch: 4/GPU, 8 GTX/RTX 11 GB GPUs, warmup: 5000, 80 epochs, max.…”
Section: Fine-tuning (mentioning)
confidence: 99%
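The quoted hyperparameters map naturally onto standard Hugging Face training arguments; the sketch below is one plausible rendering. The output path is a placeholder, and 3BART is not a public checkpoint, so this shows only the configuration, not the model.

```python
# One plausible mapping of the quoted hyperparameters (batch 4 per GPU,
# 8 GPUs, warmup 5000, 80 epochs) onto Hugging Face training arguments.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="arabic_3bart_ft",     # placeholder path
    per_device_train_batch_size=4,    # "batch: 4/GPU"
    warmup_steps=5000,                # "warmup: 5000"
    num_train_epochs=80,              # "80 epochs"
)
# Launched with e.g. `torchrun --nproc_per_node=8 train.py`, this gives
# the 8-GPU data-parallel setup (global batch 32) quoted above.
```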
“…For the summarization task, we used the H+I corpus that Yang et al. used in their research [36], the NOL corpus (Népszabadság online; nol.hu online articles (art) and their leads (lead) from 1999 to 2016), and the MARCELL corpus [32] (law documents (doc) and their one-line descriptions (desc) from 1991 to 2019). Table 2 shows the characteristics of the fine-tuning corpora.…”
Section: Corpora (mentioning)
confidence: 99%
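A hedged sketch of how source/summary pairs from several such corpora might be pooled into a single fine-tuning set, as the cited work does with H+I, NOL, and MARCELL. The file names and the two-column TSV layout are assumptions for illustration.

```python
# Pool source/summary pairs from several corpora (standing in for H+I,
# NOL, and MARCELL) into one fine-tuning set. File names and the
# two-column TSV layout are assumptions.
import csv

def load_pairs(path):
    # Assumed layout: one pair per row, "source<TAB>summary".
    with open(path, encoding="utf-8") as f:
        return [(src, tgt) for src, tgt in csv.reader(f, delimiter="\t")]

corpora = ["hplusi.tsv", "nol.tsv", "marcell.tsv"]  # hypothetical files
pairs = [p for path in corpora for p in load_pairs(path)]
print(f"{len(pairs)} pairs pooled from {len(corpora)} corpora")
```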