X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

Ma, Yiwei; Zhang, Xiaioqing; Sun, Xingming; Ji, Jiayi; Wang, Haowei; Jiang, Guannan; Zhuang, Weilin; Ji, Rongrong

doi:10.48550/arxiv.2303.15764

Cited by 1 publication

(1 citation statement)

References 46 publications

“…The Panoptic Narrative Grounding (PNG) task is rapidly gaining prominence as a critical area of research in the multimodal domain [11,36,37,52,58,59]. This task aims to generate a pixel-level mask for each noun present in a given long sentence, providing a more fine-grained understanding compared to other cross-modal tasks, such as image captioning [6,35,42,51,62], visual question answering [23,47,57,73], and referring expression comprehension/segmentation [5,19,28,29,30,33].…”

Section: Introduction

mentioning

confidence: 99%

Semi-Supervised Panoptic Narrative Grounding

Yang, Ji, Sun et al. 2023

Proceedings of the 31st ACM International Conference on Multimedia

Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) remains hindered by costly annotations. In this paper, we introduce a novel Semi-Supervised Panoptic Narrative Grounding (SS-PNG) learning scheme, capitalizing on a smaller set of labeled image-text pairs and a larger set of unlabeled pairs to achieve competitive performance. Unlike visual segmentation tasks, PNG involves one pixel belonging to multiple open-ended nouns. As a result, existing multi-class based semi-supervised segmentation frameworks cannot be directly applied to this task. To address this challenge, we first develop a novel SS-PNG Network (SS-PNG-NW) tailored to the SS-PNG setting. We thoroughly investigate strategies such as Burn-In and data augmentation to determine the optimal generic configuration for the SS-PNG-NW. Additionally, to tackle the issue of imbalanced pseudo-label quality,
