2021
DOI: 10.1007/978-3-030-90525-5_17
Gaze Assisted Visual Grounding

Cited by 3 publications (1 citation statement)
References 23 publications
“…Consequently, the combination of novel categories of objects and complex referring expressions results in decreased performance on RefMD. Improvement in disambiguation and REC performance can be achieved by comparing and exploring different and more sophisticated disambiguation approaches, such as attribute-guided disambiguation [48], to improve the accuracy of grounding as well as by incorporating gesture [49] and gaze [50] information. While the adapted model comprehends the natural language object descriptions with 82% accuracy in the user study, the domain gap between the synthetic and real-world application can further be reduced by incorporating more variation and randomization [29] in the RefMD dataset.…”
Section: Discussion and Future Work
confidence: 99%