Classification of human actions is an ongoing research problem in computer vision. This review aims to scope the current literature on data fusion and action-recognition techniques and to identify gaps and future research directions. Success in producing cost-effective and portable vision-based sensors has dramatically increased the number and size of available datasets. This growth in action-recognition datasets intersects with advances in deep-learning architectures and computational support, both of which offer significant research opportunities. Naturally, each action-data modality, such as RGB, depth, skeleton, and infrared (IR), has distinct characteristics; it is therefore important to exploit the value of each modality for better action recognition. In this paper, we focus solely on data-fusion and recognition techniques in the context of vision, from an RGB-D perspective. We conclude by discussing research challenges, emerging trends, and possible future research directions.
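To make the fusion idea concrete, the sketch below shows one common pattern in the literature this review scopes: score-level (late) fusion of two modality streams, here RGB and depth. It is a minimal, hypothetical PyTorch illustration rather than a method from the review itself; the backbone encoders, feature dimension, class count, and learned fusion weight are all placeholder assumptions.

```python
import torch
import torch.nn as nn

class LateFusionActionClassifier(nn.Module):
    """Score-level (late) fusion of two modality streams.

    Both backbones are placeholders: any per-modality encoder
    (e.g. a 3D CNN for RGB clips, a depth-specific network) could
    be substituted. The fusion weight is a single learned scalar.
    """

    def __init__(self, rgb_backbone: nn.Module, depth_backbone: nn.Module,
                 feat_dim: int, num_classes: int):
        super().__init__()
        self.rgb_backbone = rgb_backbone
        self.depth_backbone = depth_backbone
        self.rgb_head = nn.Linear(feat_dim, num_classes)
        self.depth_head = nn.Linear(feat_dim, num_classes)
        # Learnable blend of the two per-class score streams.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, rgb_clip: torch.Tensor, depth_clip: torch.Tensor) -> torch.Tensor:
        rgb_scores = self.rgb_head(self.rgb_backbone(rgb_clip))
        depth_scores = self.depth_head(self.depth_backbone(depth_clip))
        w = torch.sigmoid(self.alpha)  # keep the fusion weight in (0, 1)
        return w * rgb_scores + (1.0 - w) * depth_scores

# Toy usage with flat linear encoders standing in for real video backbones.
rgb_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 256))
depth_enc = nn.Sequential(nn.Flatten(), nn.Linear(1 * 8 * 32 * 32, 256))
model = LateFusionActionClassifier(rgb_enc, depth_enc, feat_dim=256, num_classes=60)
logits = model(torch.randn(4, 3, 8, 32, 32), torch.randn(4, 1, 8, 32, 32))
```

Late fusion is only one point on the spectrum the review covers; feature-level (early) fusion would instead concatenate the backbone features before a shared classification head.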
In line with the human capacity to perceive the world by simultaneously processing and integrating high-dimensional inputs from multiple modalities such as vision and audio, we propose a novel model, MAiVAR-T (Multimodal Audio-Image to Video Action Recognition Transformer). This model employs an intuitive approach to combining the audio-image and video modalities, with the primary aim of improving the effectiveness of multimodal human action recognition (MHAR). At the core of MAiVAR-T is the extraction of meaningful representations from the audio modality and their transformation into the image domain. This audio-image representation is then fused with the video modality to form a unified representation. The approach exploits the contextual richness inherent in both the audio and video modalities, thereby improving action recognition. In contrast to existing state-of-the-art strategies that focus solely on the audio or video modality, MAiVAR-T demonstrates superior performance. Extensive empirical evaluations on a benchmark action-recognition dataset confirm the model's strong performance, underscoring the gains available from integrating the audio and video modalities for action recognition.
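The abstract describes the core MAiVAR-T idea at a high level: render audio as an image-like representation, then fuse it with video in a transformer. The sketch below is a minimal, hypothetical rendering of that pipeline, not the paper's actual architecture; the spectrogram parameters, single-layer embeddings, encoder depth, and class count are all assumed for illustration.

```python
import torch
import torch.nn as nn

def audio_to_image(waveform: torch.Tensor, n_fft: int = 256) -> torch.Tensor:
    """Turn a batch of mono waveforms (B, T) into log-magnitude
    spectrogram 'images' (B, 1, freq, frames) that a vision-style
    encoder can consume."""
    window = torch.hann_window(n_fft, device=waveform.device)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=n_fft // 2,
                      window=window, return_complex=True)
    return torch.log1p(spec.abs()).unsqueeze(1)

class AudioImageVideoFusion(nn.Module):
    """Token-level fusion of an audio-image stream and a video stream
    with a shared transformer encoder, followed by mean pooling."""

    def __init__(self, d_model: int = 128, num_classes: int = 51):
        super().__init__()
        # Placeholder per-modality embeddings; a real system would use
        # pretrained CNN/ViT features instead of single linear layers.
        self.audio_embed = nn.LazyLinear(d_model)
        self.video_embed = nn.LazyLinear(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, audio_img: torch.Tensor, video_frames: torch.Tensor) -> torch.Tensor:
        # audio_img: (B, 1, F, S) -> one token per spectrogram frame
        a = self.audio_embed(audio_img.squeeze(1).transpose(1, 2))
        # video_frames: (B, T, C, H, W) -> one token per video frame
        b, t = video_frames.shape[:2]
        v = self.video_embed(video_frames.reshape(b, t, -1))
        tokens = self.encoder(torch.cat([a, v], dim=1))
        return self.head(tokens.mean(dim=1))

# Toy usage: 1 s of mono audio at 16 kHz plus 8 low-resolution frames.
wav = torch.randn(2, 16000)
frames = torch.randn(2, 8, 3, 32, 32)
model = AudioImageVideoFusion()
logits = model(audio_to_image(wav), frames)  # shape (2, 51)
```

Concatenating the audio-image and video tokens before the encoder lets self-attention mix the two modalities freely, which is one simple way to realize the "unified representation" the abstract refers to.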