Anjan Dutta scite author profile

In this paper, we investigate the problem of zeroshot sketch-based image retrieval (ZS-SBIR), where human sketches are used as queries to conduct retrieval of photos from unseen categories. We importantly advance prior arts by proposing a novel ZS-SBIR scenario that represents a firm step forward in its practical application. The new setting uniquely recognizes two important yet often neglected challenges of practical ZS-SBIR, (i) the large domain gap between amateur sketch and photo, and (ii) the necessity for moving towards large-scale retrieval. We first contribute to the community a novel ZS-SBIR dataset, QuickDraw-Extended, that consists of 330, 000 sketches and 204, 000 photos spanning across 110 categories. Highly abstract amateur human sketches are purposefully sourced to maximize the domain gap, instead of ones included in existing datasets that can often be semi-photorealistic. We then formulate a ZS-SBIR framework to jointly model sketches and photos into a common embedding space. A novel strategy to mine the mutual information among domains is specifically engineered to alleviate the domain gap. External semantic knowledge is further embedded to aid semantic transfer. We show that, rather surprisingly, retrieval performance significantly outperforms that of state-of-the-art on existing datasets that can already be achieved using a reduced version of our model. We further demonstrate the superior performance of our full model by comparing with a number of alternatives on the newly proposed dataset. The new dataset, plus all training and testing code of our model, will be publicly released to facilitate future research † .

show abstract

Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-Based Image Retrieval

Dutta

Akata

2019

126

115

View full text Add to dashboard Cite

Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision, allowing to retrieve natural images relevant to sketch queries that might not been seen in the training phase. Existing works either require aligned sketch-image pairs or inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via an adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at category levels, and avoids the need of highly-priced aligned sketch-image pairs. A classification criteria on the generators' outputs ensures the visual to semantic space mapping to be discriminating. Furthermore, we propose to combine textual and hierarchical side information via a feature selection autoencoder that selects discriminating side information within a same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state-ofthe-art on the challenging Sketchy and TU-Berlin datasets.

show abstract

CVC-MUSCIMA: a ground truth of handwritten music score images for writer identification and staff removal

Fornés

Dutta

Gordo

et al. 2011

IJDAR

View full text Add to dashboard Cite

Table Detection in Invoice Documents by Graph Neural Networks

Riba

Dutta

Goldmann³

et al. 2019

View full text Add to dashboard Cite

Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical or quantitative relationships among pieces of information. In digital mail room applications, where a large amount of administrative documents must be processed with reasonable accuracy, the detection and interpretation of tables is crucial. Table recognition has gained interest in document image analysis, in particular in unconstrained formats (absence of rule lines, unknown information of rows and columns). In this work, we propose a graph-based approach for detecting tables in document images. Instead of using the raw content (recognized text), we make use of the location, context and content type, thus it is purely a structure perception approach, not dependent on the language and the quality of the text reading. Our framework makes use of Graph Neural Networks (GNNs) in order to describe the local repetitive structural information of tables in invoice documents. Our proposed model has been experimentally validated in two invoice datasets and achieved encouraging results. Additionally, due to the scarcity of benchmark datasets for this task, we have contributed to the community a novel dataset derived from the RVL-CDIP invoice data. It will be publicly released to facilitate future research.

show abstract

Compact correlated features for writer independent signature verification

Dutta

Pal

Lladós

2016

View full text Add to dashboard Cite

The ICDAR 2011 Music Scores Competition: Staff Removal and Writer Identification

Fornés

Dutta

Gordo

et al. 2011

View full text Add to dashboard Cite

In the last years, there has been a growing interest in the analysis of handwritten music scores. In this sense, our goal has been to foster the interest in the analysis of handwritten music scores by the proposal of two different competitions: Staff removal and Writer Identification. Both competitions have been tested on the CVC-MUSCIMA database: a groundtruth of handwritten music score images. This paper describes the competition details, including the dataset and groundtruth, the evaluation metrics, and a short description of the participants, their methods, and the obtained results.

show abstract

Hierarchical stochastic graphlet embedding for graph-based pattern recognition

Dutta

Riba

Lladós

et al. 2019

Neural Comput & Applic

View full text Add to dashboard Cite

Despite being very successful within the pattern recognition and machine learning community, graph-based methods are often unusable with many machine learning tools. This is because of the incompatibility of most of the mathematical operations in graph domain. Graph embedding has been proposed as a way to tackle these difficulties, which maps graphs to a vector space and makes the standard machine learning techniques applicable for them. However, it is well known that graph embedding techniques usually suffer from the loss of structural information. In this paper, given a graph, we consider its hierarchical structure for mapping it into a vector space. The hierarchical structure is constructed by topologically clustering the graph nodes, and considering each cluster as a node in the upper hierarchical level. Once this hierarchical structure of graph is constructed, we consider its various configurations of its parts, and use stochastic graphlet embedding (SGE) for mapping them into vector space. Broadly speaking, SGE produces a distribution of uniformly sampled low to high order graphlets as a way to embed graphs into the vector space.In what follows, the coarse-to-fine structure of a graph hierarchy and the statistics fetched through the distribution of low to high order stochastic graphlets complements each other and include important structural information with varied contexts. Altogether, these two techniques substantially cope with the usual information loss involved in graph embedding techniques, and it is not a surprise that we obtain more robust vector space embedding of graphs. This fact has been corroborated through a detailed experimental evaluation on various benchmark graph datasets, where we outperform the state-of-the-art methods.

show abstract

Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing

Shivakumara

Dutta

Tan

et al. 2013

Multimed Tools Appl

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Anjan Dutta

Doodle to Search: Practical Zero-Shot Sketch-Based Image Retrieval

Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-Based Image Retrieval

CVC-MUSCIMA: a ground truth of handwritten music score images for writer identification and staff removal

Table Detection in Invoice Documents by Graph Neural Networks

Compact correlated features for writer independent signature verification

The ICDAR 2011 Music Scores Competition: Staff Removal and Writer Identification

Hierarchical stochastic graphlet embedding for graph-based pattern recognition

Multi-oriented scene text detection in video based on wavelet and angle projection boundary growing

Contact Info

Product

Resources

About