2023
DOI: 10.48550/arxiv.2303.15764
Preprint

X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance

Abstract: Text-driven 3D stylization is a complex and crucial task in computer vision (CV) and computer graphics (CG) that aims to transform a bare mesh to fit a target text. Prior methods adopt text-independent multilayer perceptrons (MLPs) to predict the attributes of the target mesh under the supervision of a CLIP loss. However, such a text-independent architecture lacks textual guidance when predicting attributes, which leads to unsatisfactory stylization and slow convergence. To address these limitation…
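The contrast the abstract draws between a text-independent MLP and one that receives textual guidance can be illustrated with a toy sketch. Everything here is an assumption rather than the paper's method: vertices are 3-D points, the network predicts four per-vertex attributes (RGB color plus a scalar displacement), embeddings are short stand-in vectors instead of real CLIP outputs, and the conditioned variant uses generic FiLM-style scale-and-shift modulation. It is not X-Mesh's dynamic textual guidance, and all class and function names are hypothetical. Only forward passes and a CLIP-style loss (one minus cosine similarity) are shown; there is no training loop.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_loss(image_emb, text_emb):
    """CLIP-style objective: 1 - cosine similarity (0 when the embeddings align)."""
    return 1.0 - cosine_similarity(image_emb, text_emb)


def init_layer(n_out, n_in, rng):
    """Small Gaussian weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(weights, bias, x):
    """Affine map: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


class TextIndependentMLP:
    """Hypothetical per-vertex network: position -> (r, g, b, displacement).

    The prompt never enters the forward pass; in a full pipeline it would
    influence the weights only indirectly, through the loss.
    """

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.layer1 = init_layer(hidden, 3, rng)
        self.layer2 = init_layer(4, hidden, rng)

    def forward(self, vertex):
        h = relu(dense(*self.layer1, vertex))
        return dense(*self.layer2, h)


class TextConditionedMLP(TextIndependentMLP):
    """Hypothetical text-conditioned variant (FiLM-style, not X-Mesh's module).

    A text embedding produces a per-unit scale and shift that modulate the
    hidden features, so the prompt directly shapes every prediction.
    """

    def __init__(self, text_dim, hidden=16, seed=0):
        super().__init__(hidden, seed)
        rng = random.Random(seed + 1)
        self.to_scale = init_layer(hidden, text_dim, rng)
        self.to_shift = init_layer(hidden, text_dim, rng)

    def forward(self, vertex, text_emb):
        h = relu(dense(*self.layer1, vertex))
        scale = dense(*self.to_scale, text_emb)
        shift = dense(*self.to_shift, text_emb)
        h = [(1.0 + s) * hi + t for hi, s, t in zip(h, scale, shift)]
        return dense(*self.layer2, h)


if __name__ == "__main__":
    vertex = [0.1, -0.2, 0.3]
    # Stand-in embeddings; a real pipeline would use CLIP encoders.
    prompt_a = [1.0, 0.0, 0.5, -0.5]
    prompt_b = [-1.0, 0.5, 0.0, 0.5]
    rendered_view = [0.9, 0.1, 0.4, -0.3]  # stand-in for a rendered-image embedding
    plain = TextIndependentMLP()
    cond = TextConditionedMLP(text_dim=4)
    print("text-independent:", plain.forward(vertex))
    print("conditioned on A:", cond.forward(vertex, prompt_a))
    print("conditioned on B:", cond.forward(vertex, prompt_b))
    print("CLIP-style loss vs A:", clip_style_loss(rendered_view, prompt_a))
```

In the first class, swapping the prompt cannot change a single forward pass; in the second, it changes every prediction. Whether such guidance yields better stylization and faster convergence is the paper's claim, which this toy does not test.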

Cited by 1 publication (1 citation statement)
References: 46 publications

“…The Panoptic Narrative Grounding (PNG) task is rapidly gaining prominence as a critical area of research in the multimodal domain [11,36,37,52,58,59]. This task aims to generate a pixel-level mask for each noun present in a given long sentence, providing a more fine-grained understanding compared to other cross-modal tasks, such as image captioning [6,35,42,51,62], visual question answering [23,47,57,73], and referring expression comprehension/segmentation [5,19,28–30,33].…”
Section: Introduction (mentioning)
Confidence: 99%