Ekraam Sabir scite author profile

Nefarious actors on social media and other platforms often spread rumors and falsehoods through images whose metadata (e.g., captions) have been modified to provide visual substantiation of the rumor/falsehood. This type of modification is referred to as image repurposing, in which an unmanipulated image is often published along with incorrect or manipulated metadata to serve the actor's ulterior motives. We present the Multimodal Entity Image Repurposing (MEIR) dataset, a substantially more challenging dataset than those previously available to support research into image repurposing detection. The new dataset includes location, person, and organization manipulations on real-world data sourced from Flickr. We also present a novel, end-to-end, deep multimodal learning model for assessing the integrity of an image by combining information extracted from the image with related information from a knowledge base. The proposed method is compared against state-of-the-art techniques on existing datasets as well as MEIR, where it outperforms existing methods across the board, with an AUC improvement of up to 0.23.

Multimedia Semantic Integrity Assessment Using Joint Embedding Of Images And Text

Jaiswal, Sabir, AbdAlmageed et al. 2017

Real world multimedia data is often composed of multiple modalities such as an image or a video with associated text (e.g. captions, user comments, etc.) and metadata. Such multimodal data packages are prone to manipulations, where a subset of these modalities can be altered to misrepresent or repurpose data packages, with possible malicious intent. It is, therefore, important to develop methods to assess or verify the integrity of these multimedia packages. Using computer vision and natural language processing methods to directly compare the image (or video) and the associated caption to verify the integrity of a media package is only possible for a limited set of objects and scenes. In this paper, we present a novel deep learning-based approach for assessing the semantic integrity of multimedia packages containing images and captions, using a reference set of multimedia packages. We construct a joint embedding of images and captions with deep multimodal representation learning on the reference dataset in a framework that also provides image-caption consistency scores (ICCSs). The integrity of query media packages is assessed as the inlierness of the query ICCSs with respect to the reference dataset. We present the MultimodAl Information Manipulation dataset (MAIM), a new dataset of media packages from Flickr, which we make available to the research community. We use both the newly created dataset as well as Flickr30K and MS COCO datasets to quantitatively evaluate our proposed approach. The reference dataset does not contain unmanipulated versions of tampered query packages. Our method is able to achieve F1 scores of 0.75, 0.89 and 0.94 on MAIM, Flickr30K and MS COCO, respectively, for detecting semantically incoherent media packages.
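The idea of judging a query package by the "inlierness" of its consistency score can be illustrated with a small sketch. This is not the authors' method: the scores below are made-up numbers, and the empirical-quantile test, the function names `inlier_fraction` and `is_consistent`, and the `min_fraction` cutoff are all assumptions chosen only to show the general shape of scoring a query against a reference set.

```python
from bisect import bisect_right


def inlier_fraction(reference_scores, query_score):
    """Empirical CDF: fraction of reference scores that are <= the query score."""
    ordered = sorted(reference_scores)
    return bisect_right(ordered, query_score) / len(ordered)


def is_consistent(reference_scores, query_score, min_fraction=0.05):
    """Treat a query as an inlier unless its score falls in the lowest tail.

    min_fraction is a hypothetical cutoff, not a value from the paper.
    """
    return inlier_fraction(reference_scores, query_score) >= min_fraction


# Hypothetical consistency scores for unmanipulated reference packages.
reference = [0.62, 0.70, 0.74, 0.78, 0.81, 0.83, 0.85, 0.88, 0.90, 0.93]

print(is_consistent(reference, 0.80))  # True: 4 of 10 reference scores are <= 0.80
print(is_consistent(reference, 0.10))  # False: below every reference score
```

A real system would likely replace the quantile rule with a trained outlier detector; the sketch only shows that the decision depends on where the query score sits relative to the reference distribution.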

Policy Design for Active Sequential Hypothesis Testing using Deep Learning

Kartik, Sabir, Mitra et al. 2018

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general, and thus heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP and bounds on its value function exist in the literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions, and numerical experiments demonstrate that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.
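To make the problem setting concrete, the following is a minimal sketch of active sequential hypothesis testing with Bernoulli observations. It is not either of the heuristics proposed in the paper: the greedy KL-based query rule, the stopping threshold, and every name (`probs`, `choose_query`, `run`) are assumptions introduced for illustration. Here `probs[h][a]` is a hypothetical probability of observing 1 when query `a` is made under hypothesis `h`, and must lie strictly between 0 and 1.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def update_belief(belief, probs, query, obs):
    """Bayes update of the posterior over hypotheses after one observation."""
    likelihoods = [
        probs[h][query] if obs == 1 else 1 - probs[h][query]
        for h in range(len(belief))
    ]
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(posterior)
    return [x / total for x in posterior]


def choose_query(belief, probs):
    """Greedy rule (an assumption): pick the query that best separates the
    currently most likely hypothesis from the others, weighted by belief."""
    best = max(range(len(belief)), key=lambda h: belief[h])

    def score(a):
        return sum(
            belief[h] * kl_bernoulli(probs[best][a], probs[h][a])
            for h in range(len(belief))
            if h != best
        )

    return max(range(len(probs[0])), key=score)


def run(probs, true_h, threshold=0.99, max_steps=1000, seed=0):
    """Query adaptively until one hypothesis has posterior >= threshold."""
    rng = random.Random(seed)
    belief = [1.0 / len(probs)] * len(probs)
    for step in range(1, max_steps + 1):
        a = choose_query(belief, probs)
        obs = 1 if rng.random() < probs[true_h][a] else 0
        belief = update_belief(belief, probs, a, obs)
        if max(belief) >= threshold:
            return belief.index(max(belief)), step
    return belief.index(max(belief)), max_steps


# Two hypotheses, two queries: query 0 is informative, query 1 is not.
example_probs = [[0.9, 0.5], [0.1, 0.5]]
decision, samples_used = run(example_probs, true_h=0)
print(decision, samples_used)
```

The number of samples used before stopping is the quantity a good policy tries to keep small while still deciding reliably.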

Implicit Language Model in LSTM for OCR

Sabir, Rawls, Natarajan 2017

Neural networks have become the technique of choice for OCR, but many aspects of how and why they deliver superior performance are still unknown. One key difference between current neural network techniques using LSTMs and the previous state-of-the-art HMM systems is that HMM systems have a strong independence assumption. In comparison, LSTMs have no explicit constraints on the amount of context that can be considered during decoding. In this paper, we show that they learn an implicit language model (LM) and attempt to characterize the strength of the LM in terms of equivalent n-gram context. We show that this implicitly learned language model provides a 2.4% CER improvement on our synthetic test set when compared against a test set of random characters (i.e. not naturally occurring sequences), and that the LSTM learns to use up to 5 characters of context (which is roughly 88 frames in our configuration). We believe that this is the first ever attempt at characterizing the strength of the implicit LM in LSTM-based OCR systems.

BioFors: A Large Biomedical Image Forensics Dataset

Sabir, Nandi, AbdAlmageed et al. 2021

Combining deep learning and language modeling for segmentation-free OCR from raw pixels

Rawls, Cao, Sabir et al. 2017

MEG: Multi-Evidence GNN for Multimodal Semantic Forensics

Sabir, Jaiswal, AbdAlmageed et al. 2021

CORD19STS: COVID-19 Semantic Textual Similarity Dataset

Guo, Mirzaalian, Sabir et al. 2020

Preprint

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.


Ekraam Sabir

Deep Multimodal Image-Repurposing Detection
