2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00243
Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

Cited by 22 publications (5 citation statements)
References 59 publications
“…In Figure 2, the fusion segmentation module (Fus-Seg) shows an intuitive synergy [30,31] between different layers in the network. Then, the module generates labels for every point in the point cloud to extend the expression dimensions of the point cloud features.…”
Section: Fusion Segmentation Module
confidence: 99%
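The excerpt above describes extending per-point features with generated labels. As a rough illustration only, here is a minimal PyTorch sketch of that idea: per-point segmentation logits are converted to hard labels, one-hot encoded, and concatenated to the point features, extending their dimensionality. All names, the one-hot encoding, and the function shape are assumptions for illustration, not details from the cited paper.

```python
import torch
import torch.nn.functional as F

def extend_point_features(features: torch.Tensor,
                          label_logits: torch.Tensor,
                          num_classes: int) -> torch.Tensor:
    """Hypothetical sketch: append a per-point label encoding to point features.

    features:     (N, D) per-point features
    label_logits: (N, num_classes) outputs of a segmentation head
    Returns:      (N, D + num_classes) features extended with label channels
    """
    labels = label_logits.argmax(dim=-1)                  # (N,) hard labels per point
    one_hot = F.one_hot(labels, num_classes).float()      # (N, num_classes)
    # Concatenate label channels onto the original features,
    # "extending the expression dimensions" of the point cloud.
    return torch.cat([features, one_hot], dim=-1)
```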
“…Detailed sketch and text input have been used to (a) retrieve e-commerce product images using CNNs and LSTMs (Song et al. 2017a), and (b) retrieve scene images using CLIP (Sangkloy et al. 2022; Chowdhury et al. 2023a). However, in several practical scenarios, (a) the sketch is object-level, very rough, and not elaborate, and (b) the text is partial (complementary to the sketch) and not self-contained.…”
Section: Related Work
confidence: 99%
“…Although a vast literature exists on TBIR and SBIR, to the best of our knowledge, the CSTBIR problem setting has yet to be studied rigorously. Some recent works (Song et al. 2017a; Sangkloy et al. 2022; Chowdhury et al. 2023a) attempt to solve a simpler version, where: (a) the target image collection consists of focused objects rather than complex natural scenes, (b) the sketch is at scene-level rather than object-level, or (c) the text description is comprehensive rather than partial (or complementary). This paper proposes a system for the complex CSTBIR setting.…”
Section: Introduction
confidence: 99%
“…(5) Fine-Grained Discriminative Loss: While the reconstruction loss aims to align pixel values between the generated and ground-truth photo, the discriminative sketch-photo (paired) association relative to other photos needs to be modelled further, so that the output space reflects the fine-grained user intent of the input sketch. A triplet objective with cosine distance on a pre-trained fine-grained SBIR [19] model F_g(·) places a sketch nearer to its paired photo than to other photos in a joint embedding space. Therefore, we compute a discriminative fine-grained loss that measures the cosine similarity between s and r as:…”
Section: Training Procedures
confidence: 99%
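The fine-grained discriminative loss quoted above lends itself to a short sketch. Below is a minimal PyTorch version, assuming a frozen pre-trained FG-SBIR encoder (here called fg_encoder, a hypothetical name) that embeds sketches and photos into a joint space, and taking the loss as one minus the cosine similarity between the sketch embedding s and the generated-photo embedding r. The exact loss form in the paper may differ; this only instantiates the mechanism the excerpt describes.

```python
import torch
import torch.nn.functional as F

def fine_grained_loss(fg_encoder: torch.nn.Module,
                      sketch: torch.Tensor,
                      generated_photo: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of the fine-grained discriminative loss:
    1 - cos(s, r), where s and r are embeddings of the input sketch
    and the generated photo under a pre-trained FG-SBIR encoder.
    `fg_encoder` and the exact loss form are assumptions, not the
    paper's verbatim implementation."""
    # The FG-SBIR encoder is pre-trained and assumed frozen
    # (its parameters have requires_grad=False). The sketch branch
    # needs no gradient at all, but gradients must still flow through
    # the generated photo so the upstream generator can be trained.
    with torch.no_grad():
        s = fg_encoder(sketch)           # sketch embedding, shape (B, D)
    r = fg_encoder(generated_photo)      # generated-photo embedding, (B, D)
    # Maximising cosine similarity is equivalent to minimising
    # (1 - cosine similarity), averaged over the batch.
    return (1.0 - F.cosine_similarity(s, r, dim=-1)).mean()
```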