2019
DOI: 10.1007/978-3-030-20893-6_7
Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings

Abstract: We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and physical properties of 3D shapes such as color and shape. To evaluate our approach, we collect a large dataset of …
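The abstract describes a metric-learning component that pulls a text embedding toward its matching shape embedding and pushes it away from non-matching ones. A minimal sketch of that cross-modal triplet objective is below; the embedding size, margin, and encoder outputs are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Hedged sketch of a cross-modal triplet loss (the metric-learning
# ingredient described in the abstract). Embedding dimension and margin
# are assumptions for illustration only.
import numpy as np

def triplet_loss(text_emb, shape_pos, shape_neg, margin=0.5):
    """Pull text_emb toward its matching shape embedding (shape_pos)
    and push it away from a non-matching one (shape_neg) by >= margin."""
    d_pos = np.linalg.norm(text_emb - shape_pos)
    d_neg = np.linalg.norm(text_emb - shape_neg)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
text = rng.normal(size=128)                         # embedding of a description
matching_shape = text + 0.1 * rng.normal(size=128)  # close to the text
other_shape = rng.normal(size=128)                  # unrelated shape

loss = triplet_loss(text, matching_shape, other_shape)
```

In the full model this loss would be computed over encoder outputs and combined with the learning-by-association objective; here plain vectors stand in for both encoders.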

Cited by 99 publications (116 citation statements)
References 39 publications (92 reference statements)
“…Hu et al [2] use a similar approach, albeit with deep neural networks to avoid hand-crafted parsing and feature construction. Chen et al [5] learn joint embeddings of language descriptions and colored 3D objects. These approaches are trained on image data collected from internet sources, which may differ from the data observed by a robot's camera.…”
Section: Background and Related Work
confidence: 99%
“…One of the critical challenges in robotics is scaling techniques to work in a wide array of environments. While the results presented in this paper constitute an important step towards grounding objects to natural language, existing work in this area still generally applies only to a small number (1–4) of object classes [5], [25]. Moving forward, more expansive datasets of both natural language object descriptions and 3D shapes are necessary to enable larger systems to be learned.…”
Section: Future Directions
confidence: 99%
“…Then a long-term recurrent convolutional network (LRCN) was employed to refine the generated results, yielding more complete 3D models at higher resolution. Chen et al [29] proposed the Text2Shape system, which combined 3D generation with natural language processing. The network encoded the text, treated the encoding as a condition, and used a WGAN to decode it into a 3D model matching the input text.…”
Section: 3D GANs
confidence: 99%
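The data flow this excerpt describes (encode text, concatenate the encoding with noise as a condition, decode to a 3D volume) can be sketched as follows. This is not the paper's WGAN; the linear "decoder", noise dimension, and tiny 4³ voxel grid are stand-in assumptions that only illustrate how the condition enters the generator.

```python
# Hedged sketch of text-conditioned 3D generation: a text embedding is
# concatenated with a noise vector and decoded into a voxel occupancy grid.
# The real system uses a WGAN with learned 3D decoders; the random linear
# map W below is a placeholder for that decoder.
import numpy as np

rng = np.random.default_rng(1)

def generate_voxels(text_emb, noise_dim=16, grid=4):
    """Map [noise ; text condition] -> a grid x grid x grid occupancy grid."""
    z = rng.normal(size=noise_dim)
    cond = np.concatenate([z, text_emb])          # condition the generator on text
    W = rng.normal(size=(grid ** 3, cond.size))   # stand-in for a learned decoder
    logits = W @ cond
    return (logits > 0).astype(float).reshape(grid, grid, grid)

voxels = generate_voxels(rng.normal(size=32))
print(voxels.shape)  # (4, 4, 4)
```

Conditioning by concatenation is the simplest choice; conditional GANs also inject the text embedding at intermediate layers or into the critic.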
“…The priors are not specific to individual shapes. Chen et al [11] gather natural language descriptions for 3D shapes that sometimes include material labels ("This is a brown wooden chair"), but there is no fine-grained region labeling that can be used for training. Yang et al [48] propose a data-driven algorithm to reshape shape components to a target fabrication material.…”
Section: Previous Work
confidence: 99%