Aisha Urooj scite author profile

Aisha Urooj

2 Publications

25 Citation Statements Received

83 Citation Statements Given


Publications


Segmenting Sky Pixels in Images: Analysis and Comparison

Place, Urooj², Borji³ (2019)


Outdoor scene parsing models are often trained on ideal datasets, where they produce high-quality results, but this leads to a discrepancy when they are applied to real-world images. The quality of scene parsing, particularly sky classification, decreases in nighttime images, in images taken under varying weather conditions, and under seasonal scene changes. This project addresses these challenges by using a state-of-the-art model in conjunction with non-ideal datasets: SkyFinder and a subset of the SUN database containing the Sky object. We focus specifically on sky segmentation, the task of labeling each pixel as sky or not-sky, and on improving an existing state-of-the-art model, RefineNet. As a result, we see an improvement of 10-15% in average MCR over prior methods on the SkyFinder dataset. We also improve on an off-the-shelf model by nearly 35% in average mIOU. Further, we analyze our trained models with respect to two aspects, time of day and weather, and find that although they face the same challenges as prior methods, they significantly outperform them.
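The abstract reports results in MCR and mIOU without defining them. The sketch below assumes the common definitions for binary sky masks: MCR as the fraction of misclassified pixels, and mIOU as the intersection-over-union averaged over the sky and not-sky classes. The function names and the toy masks are illustrative only and are not taken from the paper.

```python
from typing import Sequence

# A mask is a 2D grid of labels: 1 = sky, 0 = not-sky (assumed encoding).
Mask = Sequence[Sequence[int]]


def misclassification_rate(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label differs from the ground truth."""
    total = 0
    wrong = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            wrong += int(p != t)
    return wrong / total


def mean_iou(pred: Mask, truth: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """Intersection-over-union averaged over the classes present in either mask."""
    ious = []
    for c in classes:
        inter = 0
        union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union:  # skip a class that appears in neither mask
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy 3x4 example: two of the twelve pixels are mislabeled.
truth = [[1, 1, 1, 1],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 1, 1, 1],
        [1, 0, 0, 0],
        [0, 0, 0, 1]]

mcr = misclassification_rate(pred, truth)
miou = mean_iou(pred, truth)
print(f"MCR = {mcr:.3f}, mIOU = {miou:.3f}")
```

Lower MCR is better and higher mIOU is better, so the two improvements quoted in the abstract point in opposite numerical directions.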


MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering

Urooj¹, Mazaheri², Lobo³ et al. (2020)


We present MMFT-BERT (MultiModal Fusion Transformer with BERT encodings) to solve Visual Question Answering (VQA), ensuring both individual and combined processing of multiple input modalities. Our approach processes multimodal data (video and text) by adopting BERT encodings for each modality individually and using a novel transformer-based fusion method to fuse them together. Our method decomposes the different sources of modalities into different BERT instances with similar architectures but variable weights. This achieves SOTA results on the TVQA dataset. Additionally, we provide TVQA-Visual, an isolated diagnostic subset of TVQA, which strictly requires knowledge of the visual (V) modality based on a human annotator's judgment. This set of questions helps us study the model's behavior and the challenges TVQA poses that prevent the achievement of superhuman performance. Extensive experiments show the effectiveness and superiority of our method.



