2022
DOI: 10.1007/978-3-031-20074-8_15
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

Cited by 27 publications (9 citation statements)
References 44 publications
“…Specifically, sketch-based video summarization aims to automatically generate storyboard sketches from video clips, which provides an interactive representation for annotating and visualizing the major scene content of video clips [42] and supports flexibly editing or adding object sketches in a sketch-based interface. Furthermore, inspired by [44], [32], we will try CLIP [43] to simplify our SQ-GCN model and adapt it to scene-sketch feature encoding in the future.…”
Section: Discussion (mentioning)
confidence: 99%
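The CLIP adaptation described in that statement is future work, so no reference implementation exists; the sketch below only illustrates what encoding a scene sketch with CLIP's image encoder could look like. It assumes the Hugging Face transformers CLIP API; the checkpoint name and the sketch.png path are placeholders, not part of the cited work.

```python
# Minimal sketch: encoding a scene sketch with CLIP's image encoder.
# The checkpoint and "sketch.png" are illustrative placeholders, not
# the cited authors' actual pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

sketch = Image.open("sketch.png").convert("RGB")
inputs = processor(images=sketch, return_tensors="pt")

with torch.no_grad():
    feats = model.get_image_features(**inputs)       # (1, 512) embedding
feats = feats / feats.norm(dim=-1, keepdim=True)     # L2-normalize for cosine similarity
```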
“…Although TU-Berlin [9] and Sketchy [10] have a large number of sketches and object categories, they cannot enable fine-grained instance-level retrieval due to the lack of instance-level matches. Most of the remaining datasets in Table I support the fine-grained cross-modal retrieval task, among which SketchyScene [12], SketchyCOCO [31] and FS-COCO [32] are capable of fine-grained scene-level retrieval with multiple instances, yet they are all limited to the image domain. Compared with the video retrieval datasets TSF [3] and FG-SBVR [8], our dataset covers more object categories and contains more sketches, and its sketches depict not only fine-grained single instances but also multiple objects in diverse scenes, making it more suitable for real-world sketch-related video research.…”
Section: E. Dataset Analysis (mentioning)
confidence: 99%
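To make "fine-grained instance-level retrieval" concrete: each query sketch has exactly one paired photo (or video), and a model is scored on whether that specific pair is ranked highly, not merely an item of the same category. A minimal sketch of the usual top-k accuracy computation follows; the function name and the assumption of row-aligned, L2-normalized embeddings are illustrative, not taken from any cited dataset's protocol.

```python
import torch

def topk_accuracy(sketch_emb: torch.Tensor, photo_emb: torch.Tensor, k: int = 10) -> float:
    """Fraction of sketches whose true paired photo appears in the top k.
    Assumes row i of each matrix is the same instance, and that both
    matrices are L2-normalized with shape (N, D)."""
    sims = sketch_emb @ photo_emb.T                        # (N, N) cosine similarities
    topk = sims.topk(k, dim=1).indices                     # k best photo indices per sketch
    targets = torch.arange(sketch_emb.size(0)).unsqueeze(1)
    return (topk == targets).any(dim=1).float().mean().item()
```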
“…One trend that has emerged in the field is the use of attention mechanisms [36, 37, 38, 39, 40], which incorporate both global and local visual features into image captioning. Another trend in the field of image captioning focuses on fine-grained details and object descriptions [41, 42, 43]. Transformer models have also proven to be effective in several recent studies.…”
Section: Related Work (mentioning)
confidence: 99%
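As a concrete illustration of the first trend, the sketch below shows one scaled dot-product attention step in which a caption decoder state attends jointly over local region features and a global image feature; all names and shapes are hypothetical, and this is not the exact mechanism of any cited paper.

```python
import torch
import torch.nn.functional as F

def attend(decoder_state, region_feats, global_feat):
    """One attention step over local region features plus a global feature.
    decoder_state: (B, D); region_feats: (B, R, D); global_feat: (B, D).
    Returns an attended context vector of shape (B, D)."""
    # Treat the global feature as one extra "region" so attention can
    # weigh global context against local details.
    keys = torch.cat([region_feats, global_feat.unsqueeze(1)], dim=1)  # (B, R+1, D)
    scores = (keys @ decoder_state.unsqueeze(-1)).squeeze(-1)          # (B, R+1)
    scores = scores / keys.size(-1) ** 0.5                             # scaled dot product
    weights = F.softmax(scores, dim=-1)                                # attention weights
    return (weights.unsqueeze(-1) * keys).sum(dim=1)                   # (B, D) context
```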
“…Nonetheless, the majority of these works [15, 53] are restricted to using edgemaps as a pseudo sketch-replacement for model training. However, a free-hand sketch [62], with human-drawn sparse and abstract strokes, is a way of conveying the "semantic intent", and differs largely [58] from an edgemap. While an edgemap aligns perfectly with photo boundaries, a sketch is a human abstraction of an object or concept, usually with strong deformations [58].…”
Section: Sketch-to-photo Generation (mentioning)
confidence: 99%
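The edgemap/sketch contrast can be made concrete: the pseudo sketches those works train on are typically produced by running an edge detector on the photo, so they trace the photo's boundaries exactly. The sketch below uses OpenCV's Canny detector as a stand-in for whichever detector a given paper used; the file paths and thresholds are placeholders.

```python
# Minimal sketch: the kind of edgemap prior works use as a pseudo sketch.
# Canny stands in for whatever edge detector a given paper used;
# "photo.jpg" and "edgemap.png" are placeholder paths.
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
gray = cv2.GaussianBlur(gray, (5, 5), 0)          # denoise before edge detection
edges = cv2.Canny(gray, threshold1=100, threshold2=200)
cv2.imwrite("edgemap.png", 255 - edges)           # invert: dark strokes on white
```

Unlike these machine-traced boundaries, a freehand sketch abstracts and deforms the depicted object, which is the gap the quoted passage highlights.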