Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.44
KM-BART: Knowledge Enhanced Multimodal BART for Visual Commonsense Generation

Abstract: We present Knowledge Enhanced Multimodal BART (KM-BART), which is a Transformer-based sequence-to-sequence model capable of reasoning about commonsense knowledge from multimodal inputs of images and texts. We adapt the generative BART architecture (Lewis et al., 2020) to a multimodal model with visual and textual inputs. We further develop novel pretraining tasks to improve the model performance on the Visual Commonsense Generation (VCG) task. In particular, our pretraining task of Knowledge-based Commonsense Generation (KCG) boosts model performance on the VCG task.
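The abstract describes adapting BART's sequence-to-sequence architecture to accept visual as well as textual inputs. As a rough illustration of that general pattern (not KM-BART's actual implementation), one common approach is to project detected image-region features into the model's embedding space and prepend them to the token embeddings; the class name, the 2048-dimensional Faster R-CNN feature size, and the linear projection below are illustrative assumptions:

```python
import torch
import torch.nn as nn
from transformers import BartModel

class MultimodalBartSketch(nn.Module):
    """Hypothetical sketch: prepend projected visual region features
    to BART's token embeddings. Not the actual KM-BART code."""

    def __init__(self, visual_dim=2048, model_name="facebook/bart-base"):
        super().__init__()
        self.bart = BartModel.from_pretrained(model_name)
        # Map detector region features (assumed 2048-d) to BART's hidden size.
        self.visual_proj = nn.Linear(visual_dim, self.bart.config.d_model)

    def forward(self, region_feats, input_ids, decoder_input_ids):
        # region_feats: (batch, n_regions, visual_dim), e.g. Faster R-CNN output
        tok_emb = self.bart.get_input_embeddings()(input_ids)
        vis_emb = self.visual_proj(region_feats)
        # Concatenate visual "tokens" in front of the text tokens.
        enc_inputs = torch.cat([vis_emb, tok_emb], dim=1)
        return self.bart(inputs_embeds=enc_inputs,
                         decoder_input_ids=decoder_input_ids)
```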

Cited by 25 publications (26 citation statements). References 23 publications.
“…Their proposed approach has a better understanding of noise and can handle complex queries. The authors of [79] performed visual commonsense generation with a model called Knowledge Enhanced Multimodal BART. The authors of [80] evaluated BART on knowledge-grounded conversation tasks and achieved good results.…”
Section: Types of Classification Algorithms
Confidence: 99%
“…Correspondingly, many general pre-training tasks have been proposed, such as Masked Language Modeling (MLM), Masked Region Modeling (MRM), and Image-Text Matching (ITM) (Yu et al., 2021). Besides, to make pre-trained models better understand downstream tasks, researchers have also designed task-specific pre-training models for different downstream tasks (Hao et al., 2020; Xing et al., 2021). In our work, apart from the popular general pre-training tasks, we also design three kinds of task-specific pre-training tasks for the MABSA task.…”
Section: Related Work
Confidence: 99%
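For readers unfamiliar with the general pre-training tasks named in this excerpt, below is a minimal sketch of the standard BERT-style Masked Language Modeling corruption; the 80/10/10 replacement split is the common convention from the literature, not a detail taken from the cited papers, and `mask_token_id` / `vocab_size` are assumed to come from the tokenizer:

```python
import torch

def mlm_corrupt(input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Sketch of BERT-style MLM: select ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged."""
    labels = input_ids.clone()
    selected = torch.rand(input_ids.shape) < mask_prob
    labels[~selected] = -100  # cross-entropy ignores these positions

    ids = input_ids.clone()
    rnd = torch.rand(input_ids.shape)
    ids[selected & (rnd < 0.8)] = mask_token_id          # 80% -> [MASK]
    random_tok = torch.randint(vocab_size, input_ids.shape)
    replace = selected & (rnd >= 0.8) & (rnd < 0.9)
    ids[replace] = random_tok[replace]                   # 10% -> random token
    # remaining 10% of selected positions keep the original token
    return ids, labels
```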
“…Masked Region Modeling (MRM). Following Xing et al. (2021), our MRM task aims to predict the semantic class distribution of the masked regions. As shown in Figure 1, for the encoder input, we randomly mask image regions with a probability of 15% and replace them with zero vectors.…”
Section: Visual Pre-training
Confidence: 99%
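The MRM recipe quoted above (mask 15% of regions, replace them with zero vectors, predict the semantic class distribution) can be sketched as below. The KL-divergence objective against the object detector's class distribution is a common choice in the multimodal pre-training literature and is an assumption here, not a detail from the excerpt:

```python
import torch
import torch.nn.functional as F

def mask_regions(region_feats, mask_prob=0.15):
    """Zero out each image region with probability 15%, per the excerpt."""
    # region_feats: (batch, n_regions, dim)
    mask = torch.rand(region_feats.shape[:2],
                      device=region_feats.device) < mask_prob
    masked = region_feats.clone()
    masked[mask] = 0.0  # masked regions replaced with zero vectors
    return masked, mask

def mrm_kl_loss(pred_logits, detector_probs, mask):
    """Assumed objective: KL divergence between the model's predicted
    class distribution and the detector's, on masked regions only."""
    log_p = F.log_softmax(pred_logits[mask], dim=-1)
    return F.kl_div(log_p, detector_probs[mask], reduction="batchmean")
```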
“…This again suggests that the model is incapable of understanding complex relationships between vulnerable communities and ideas. An interesting future research avenue would be to explore methods that incorporate relevant knowledge bases into transformer models, similar to recent work on commonsense generation (Xing et al., 2021), to address these errors.…”
Section: False Positives and False Negatives
Confidence: 99%