Mausoom Sarkar scite author profile

Mausoom Sarkar

5 Publications

49 Citation Statements Received

53 Citation Statements Given

Affiliations

Adobe Systems (United States), Indian Institute of Technology Kanpur, Southern California University for Professional Studies

Publications

Form2Seq : A Framework for Higher-Order Form Structure Extraction

Aggarwal, Gupta, Sarkar et al. 2020

Document structure extraction has been a widely researched area for decades, with recent works performing it as a semantic segmentation task over document images using fully convolutional networks. Such methods are limited by image resolution, due to which they fail to disambiguate structures in dense regions, which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-sequence (Seq2Seq) inspired framework for structure extraction using text, with a specific focus on forms, which leverages the relative spatial arrangement of structures. We discuss two tasks: 1) Classification of low-level constituent elements (TextBlocks and empty fillable Widgets) into ten types such as field captions, list items, and others; 2) Grouping lower-level elements into higher-order constructs, such as Text Fields, ChoiceFields and ChoiceGroups, used as information collection mechanisms in forms. To achieve this, we arrange the constituent elements linearly in natural reading order and feed their spatial and textual representations to a Seq2Seq framework, which sequentially outputs a prediction for each element depending on the final task. We modify Seq2Seq for the grouping task and discuss improvements obtained through cascaded end-to-end training of the two tasks versus training in isolation. Experimental results show the effectiveness of our text-based approach, achieving an accuracy of 90% on the classification task and F1 scores of 75.82, 86.01, and 61.63 on the groups discussed above, respectively, outperforming segmentation baselines. Further, we show our framework achieves state-of-the-art results for table structure recognition on the ICDAR 2013 dataset.
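The linearization step mentioned in the abstract (arranging elements in natural reading order before feeding them to the sequence model) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Element` class, its fields, the `reading_order` function, and the `line_tolerance` heuristic are all hypothetical names and choices made for illustration, assuming each element comes with a bounding box whose origin is the top-left corner of the page.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical record for a low-level form element.
    kind: str      # "TextBlock" or "Widget"
    text: str      # empty for fillable widgets
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    height: float  # bounding-box height


def reading_order(elements, line_tolerance=0.5):
    """Group elements into lines by vertical position, then read each line left to right."""
    lines = []
    for el in sorted(elements, key=lambda e: e.y):
        # Join the current line if this element's top edge is close to the line's first element.
        if lines and abs(el.y - lines[-1][0].y) <= line_tolerance * lines[-1][0].height:
            lines[-1].append(el)
        else:
            lines.append([el])
    return [el for line in lines for el in sorted(line, key=lambda e: e.x)]


elements = [
    Element("Widget", "", 120, 10, 12),
    Element("TextBlock", "Email", 10, 52, 12),
    Element("TextBlock", "Name", 10, 11, 12),
    Element("Widget", "", 120, 50, 12),
]
ordered = reading_order(elements)
sequence = [(el.kind, el.text) for el in ordered]
print(sequence)
```

In this toy layout each caption ends up immediately before its widget, which is the kind of ordering a sequence model could then consume one element at a time.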

Powering Robust Fashion Retrieval With Information Rich Feature Embeddings

Chopra, Sinha, Gupta et al. 2019

Document Structure Extraction Using Prior Based High Resolution Hierarchical Semantic Segmentation

Sarkar, Aggarwal, Jain et al. 2020

Attention Based Natural Language Grounding by Navigating Virtual Environment

Sinha, Akilesh, Sarkar et al. 2019

In this work, we focus on the problem of grounding language by training an agent to follow a set of natural language instructions and navigate to a target object in an environment. The agent receives visual information through raw pixels and a natural language instruction describing the task to be achieved, and is trained end-to-end. We develop an attention mechanism for multi-modal fusion of visual and textual modalities that allows the agent to learn to complete the task and achieve language grounding. Our experimental results show that our attention mechanism outperforms the existing multi-modal fusion mechanisms proposed for both 2D and 3D environments on this task, in terms of both speed and success rate. We show that the learnt textual representations are semantically meaningful, as they follow vector arithmetic in the embedding space. The effectiveness of our attention approach over the contemporary fusion mechanisms is also evident in the textual embeddings learnt by the different approaches. We also show that our model generalizes effectively to unseen scenarios and exhibits zero-shot generalization capabilities in both 2D and 3D environments. The code for our 2D environment, as well as the models that we developed for both 2D and 3D, is available at https://github.com/rl-lang-grounding/rl-lang-ground.
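To make the idea of attention-based fusion of text and vision concrete, here is a minimal sketch using generic dot-product attention. It is not the mechanism from the paper, whose details are not given in the abstract: the function names `softmax` and `attend`, the toy vectors, and the choice of dot-product scoring are assumptions made purely for illustration. The instruction is assumed to be already encoded as a vector, and the image as a list of per-location feature vectors of the same dimension.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(text_vec, visual_feats):
    """Weight each visual location by its dot-product similarity to the instruction vector."""
    scores = [sum(t * v for t, v in zip(text_vec, feat)) for feat in visual_feats]
    weights = softmax(scores)
    dim = len(text_vec)
    # Fused representation: attention-weighted sum of the visual features.
    fused = [sum(w * feat[i] for w, feat in zip(weights, visual_feats)) for i in range(dim)]
    return weights, fused


# Toy example: one instruction vector and three image locations (hypothetical values).
text_vec = [1.0, 0.0]
visual_feats = [[0.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
weights, fused = attend(text_vec, visual_feats)
print(weights, fused)
```

The location whose features align best with the instruction vector (the second one here) receives the largest weight, so the fused vector is dominated by that location.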

Leveraging Style and Content features for Text Conditioned Image Retrieval

Chawla, Jandial, Badjatiya et al. 2021
