Convolutional Neural Networks (CNN) have performed extremely well for many image analysis tasks. However, supervised training of deep CNN architectures requires huge amounts of labelled data, which is unavailable for light field images. In this paper, we leverage synthetic light field images and propose a two-stream CNN that learns to estimate the disparities of multiple correlated neighbourhood pixels from their Epipolar Plane Images (EPI). Since the EPIs are unrelated except at their intersection, a two-stream network is proposed to learn convolution weights individually for the EPIs and then combine the outputs of the two streams for disparity estimation. The CNN-estimated disparity map is then refined using the central RGB light field image as a prior in a variational technique. We also propose a new real-world dataset comprising light field images of 19 objects captured with the Lytro Illum camera in outdoor scenes and their corresponding 3D pointclouds, as ground truth, captured with the 3dMD scanner. This dataset will be made public to enable precise, pointcloud-level comparison of algorithms in the future, which is currently not possible. Experiments on the synthetic and real-world datasets show that our algorithm outperforms the existing state-of-the-art for depth estimation from light field images.
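To make the two-stream idea concrete, below is a minimal PyTorch sketch, not the authors' released code: each stream applies its own convolutional weights to a horizontal and a vertical EPI patch centred on the pixel of interest, and the flattened stream outputs are concatenated and fed to fully connected layers that regress disparities for the correlated neighbourhood pixels. Patch sizes, channel widths and the number of output pixels are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamEPINet(nn.Module):
    """Illustrative two-stream CNN: one stream per EPI orientation."""

    def __init__(self, epi_channels=1, n_out_pixels=9):
        super().__init__()

        def make_stream():
            # Independent convolution weights for each EPI stream.
            return nn.Sequential(
                nn.Conv2d(epi_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )

        self.horizontal = make_stream()  # horizontal EPI slice
        self.vertical = make_stream()    # vertical EPI slice
        self.head = nn.Sequential(
            nn.Linear(2 * 64 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, n_out_pixels),  # disparities of neighbourhood pixels
        )

    def forward(self, epi_h, epi_v):
        # Each stream learns its own weights; features are fused only at the head.
        fh = self.horizontal(epi_h).flatten(1)
        fv = self.vertical(epi_v).flatten(1)
        return self.head(torch.cat([fh, fv], dim=1))

# Example: a batch of 8 EPI patch pairs of size 9 (angular) x 33 (spatial).
net = TwoStreamEPINet()
disp = net(torch.randn(8, 1, 9, 33), torch.randn(8, 1, 9, 33))
print(disp.shape)  # torch.Size([8, 9])
```

Fusing the streams only after per-orientation convolutions reflects the observation that the two EPIs share information only at their intersection pixel.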
Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network as a feature extractor for both the colour and depth channels, together with a rich multi-scale feature representation built from its convolutional activations.
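As a rough illustration of the feature-extraction idea, the sketch below runs both the colour image and a three-channel encoding of the depth map through the same ImageNet-pretrained backbone and pools activations from several convolutional stages into a multi-scale descriptor. The choice of VGG-16 and of the tapped layers is an assumption for illustration, not necessarily the paper's exact pipeline.

```python
import torch
import torchvision.models as models

# Frozen ImageNet-pretrained backbone used as a generic feature extractor.
backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

# Indices of layers to tap for multi-scale features (illustrative choices:
# the outputs of the five pooling stages of VGG-16).
TAP_LAYERS = {4, 9, 16, 23, 30}

def multiscale_features(image):
    """Return pooled activations from several depths of the backbone."""
    feats, x = [], image
    with torch.no_grad():
        for idx, layer in enumerate(backbone):
            x = layer(x)
            if idx in TAP_LAYERS:
                # Global-average-pool each tapped feature map to a vector.
                feats.append(x.mean(dim=(2, 3)))
    return torch.cat(feats, dim=1)

rgb = torch.randn(1, 3, 224, 224)    # colour channel
depth = torch.randn(1, 3, 224, 224)  # depth map replicated/colourised to 3 channels
descriptor = torch.cat([multiscale_features(rgb), multiscale_features(depth)], dim=1)
print(descriptor.shape)
```

Reusing one pre-trained network for both modalities sidesteps the scarcity of labelled RGB-D training data noted in the abstract.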
First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to form a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of all segments together define the overall video representation. We perform experiments on two existing benchmark first-person video datasets, both of which were captured in controlled environments and therefore do not reflect real-world capture conditions. Addressing this gap, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7% respectively on the three datasets.
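A simplified NumPy sketch of the temporal-pyramid Fourier idea follows; the mean pooling function, pyramid depth and number of retained coefficients are assumptions for illustration rather than the paper's exact settings. Per-frame features are pooled over sub-intervals to obtain a series of sub-event dynamics, the series is split into segments over a temporal pyramid, and the low-frequency FFT coefficient magnitudes from all segments are concatenated into the video descriptor.

```python
import numpy as np

def pyramid_fourier_descriptor(frame_feats, n_subintervals=8, levels=3, n_coeffs=4):
    """frame_feats: (T, D) array of per-frame features; assumes T >= n_subintervals."""
    T, D = frame_feats.shape
    # 1) Pool features over fixed sub-intervals (mean pooling stands in for the
    #    temporal feature pooling function) -> aligned series of sub-event dynamics.
    bounds = np.linspace(0, T, n_subintervals + 1, dtype=int)
    dynamics = np.stack([frame_feats[bounds[i]:bounds[i + 1]].mean(axis=0)
                         for i in range(n_subintervals)])
    # 2) Apply the FFT over a pyramidal temporal structure: at each level the
    #    dynamics series is split into segments and each segment is transformed.
    descriptor = []
    for level in range(levels):
        n_segments = 2 ** level
        seg_bounds = np.linspace(0, n_subintervals, n_segments + 1, dtype=int)
        for s in range(n_segments):
            segment = dynamics[seg_bounds[s]:seg_bounds[s + 1]]
            # Zero-pad/truncate to a fixed FFT length so every segment contributes
            # the same number of coefficients.
            spectrum = np.fft.rfft(segment, n=2 * n_coeffs, axis=0)
            descriptor.append(np.abs(spectrum[:n_coeffs]).ravel())
    return np.concatenate(descriptor)

# Example: a 64-frame clip with 128-dimensional per-frame features.
video = np.random.randn(64, 128)
print(pyramid_fourier_descriptor(video).shape)
```

Keeping only the low-frequency magnitudes gives a fixed-length descriptor that summarises how the pooled sub-event dynamics evolve at multiple temporal scales.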