Recent developments in deep learning have driven rapid progress across many domains and applications, yet the recognition, translation, and video generation of Sign Language (SL) remain challenging. Although earlier approaches have made numerous advances, their recognition accuracy and visual quality are still limited. In this paper, we introduce novel approaches forming a complete framework for handling SL recognition, translation, and production tasks in real time. To achieve higher recognition accuracy, we use the MediaPipe library together with a hybrid Convolutional Neural Network + Bi-directional Long Short-Term Memory (CNN + Bi-LSTM) model for pose extraction and text generation. Conversely, sign gesture videos for given spoken sentences are produced using a hybrid Neural Machine Translation (NMT) + MediaPipe + Dynamic Generative Adversarial Network (GAN) model. The proposed model addresses several complexities present in existing approaches and achieves above 95% classification accuracy. In addition, model performance was evaluated at each phase of development, and the evaluation metrics show noticeable improvements. The model was tested on multiple multilingual benchmark sign corpora and produces strong results in terms of recognition accuracy and visual quality, securing a 38.06 average Bilingual Evaluation Understudy (BLEU) score, strong human evaluation scores, a 3.46 average Fréchet Inception Distance to videos (FID2vid) score, 0.921 average Structural Similarity Index Measure (SSIM), 8.4 average Inception Score, 29.73 average Peak Signal-to-Noise Ratio (PSNR), 14.06 average Fréchet Inception Distance (FID), and a 0.715 average Temporal Consistency Metric (TCM) score, which is evidence of the proposed work.
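To make the recognition front end concrete, the following is a minimal sketch of the described pipeline: MediaPipe extracts per-frame pose and hand landmarks, and a CNN + Bi-LSTM classifies the landmark sequence. This assumes a Keras implementation; the layer sizes, sequence length, and class count are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: MediaPipe landmark extraction feeding a CNN + Bi-LSTM classifier.
# Layer sizes, sequence length, and class count are illustrative assumptions.
import numpy as np
import mediapipe as mp
from tensorflow.keras import layers, models

mp_holistic = mp.solutions.holistic

def extract_landmarks(frames_rgb):
    """Return an array of shape (T, 225): pose (33) + two hands (21 each), xyz."""
    feats = []
    with mp_holistic.Holistic(static_image_mode=False) as holistic:
        for frame in frames_rgb:  # each frame: HxWx3 uint8 RGB image
            res = holistic.process(frame)
            pts = []
            for lm_set, n in ((res.pose_landmarks, 33),
                              (res.left_hand_landmarks, 21),
                              (res.right_hand_landmarks, 21)):
                if lm_set:
                    pts += [c for lm in lm_set.landmark for c in (lm.x, lm.y, lm.z)]
                else:
                    pts += [0.0] * (n * 3)  # pad frames with missed detections
            feats.append(pts)
    return np.asarray(feats, dtype=np.float32)

def build_cnn_bilstm(seq_len=60, feat_dim=225, num_classes=100):
    """1D CNN captures local motion cues; Bi-LSTM models sequence context."""
    return models.Sequential([
        layers.Input(shape=(seq_len, feat_dim)),
        layers.Conv1D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling1D(2),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn_bilstm()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The landmark vector (225 features per frame) keeps the input compact compared with raw pixels, which is what makes real-time recognition plausible on modest hardware.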
Deep learning has established itself as a fixture of the business lexicon. Its unprecedented success in recent years can be attributed to an abundance of data, the enormous compute capability offered by GPUs, and the adoption of an open-source philosophy by researchers and industry. Deep neural networks can be decomposed into a series of operators. MIOpen, AMD's open-source deep learning primitives library for GPUs, provides highly optimized implementations of these operators, shielding researchers from internal implementation details and thereby accelerating the time to discovery. This paper introduces MIOpen and details the internal workings of the library and its supported features. MIOpen innovates on several fronts: it implements fusion to optimize for memory bandwidth and GPU launch overheads, provides an auto-tuning infrastructure to overcome the large design space of problem configurations, and implements different algorithms to optimize convolutions for different filter and input sizes. MIOpen is also one of the first libraries to publicly support the bfloat16 data type for convolutions, allowing efficient training at lower precision without loss of accuracy.
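As a hedged illustration of the bfloat16 convolution training mentioned above: on a ROCm build of PyTorch, convolutions dispatch to MIOpen under the hood, so mixed-precision training exercises the library without any direct API calls. The model and tensor shapes below are assumptions for demonstration, not taken from the paper.

```python
# Sketch: bfloat16 convolution training via autocast. On ROCm-built PyTorch,
# the convolution calls are serviced by MIOpen; shapes here are illustrative.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"  # ROCm also exposes "cuda"

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1).to(device)
opt = torch.optim.SGD(conv.parameters(), lr=0.01)

x = torch.randn(8, 3, 224, 224, device=device)
target = torch.randn(8, 64, 224, 224, device=device)

# autocast keeps master weights in fp32 while running the convolution in bf16
with torch.autocast(device_type="cuda", dtype=torch.bfloat16,
                    enabled=(device == "cuda")):
    y = conv(x)

loss = nn.functional.mse_loss(y.float(), target)  # loss accumulated in fp32
loss.backward()
opt.step()
```

Keeping the loss and optimizer state in fp32 while the convolution itself runs in bfloat16 is what allows the lower-precision speedup without the accuracy loss the abstract refers to.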
The emergence of unsupervised generative models has substantially improved performance on image and video generation tasks. However, existing generative models still struggle with high-quality video generation, often producing blurry and temporally inconsistent results. In this paper, we introduce a novel generative framework, the Dynamic Generative Adversarial Network (Dynamic GAN), for regulating adversarial training and generating photorealistic, high-quality sign language videos from skeletal poses. The proposed model comprises three stages: a generator network; classification and image-quality enhancement; and a discriminator network. In the generator stage, the model generates samples resembling real images from random noise vectors; in the second stage, generated samples are classified using the VGG-19 model and novel techniques are employed to improve their quality; finally, the discriminator stage identifies samples as real or fake. Unlike existing approaches, the proposed framework produces photorealistic video results without using any animation or avatar techniques. To evaluate the model qualitatively and quantitatively, it was tested on three benchmark datasets that yield plausible results: the RWTH-PHOENIX-Weather 2014T dataset, our self-created Indian Sign Language dataset (ISL-CSLTR), and the UCF-101 Action Recognition dataset. The output samples and performance metrics demonstrate the strong performance of our model.
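The following is a minimal sketch of the three-stage structure described above: a generator that maps a noise vector plus a pose encoding to an image, a pretrained VGG-19 used to score generated samples, and a discriminator that separates real from fake frames. All layer sizes, the pose encoding, and the 64x64 output resolution are illustrative assumptions, not the paper's architecture.

```python
# Sketch: pose-conditioned generator, VGG-19 scoring stage, and discriminator.
# All dimensions are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class Generator(nn.Module):
    def __init__(self, noise_dim=100, pose_dim=225):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + pose_dim, 256 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (256, 8, 8)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 16x16 -> 32x32
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),     # 32x32 -> 64x64
        )

    def forward(self, z, pose):
        # Condition generation on the skeletal pose by concatenating it with noise
        return self.net(torch.cat([z, pose], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(128 * 16 * 16, 1),  # real/fake logit
        )

    def forward(self, img):
        return self.net(img)

G, D = Generator(), Discriminator()
z = torch.randn(4, 100)
pose = torch.randn(4, 225)      # skeletal pose features (assumed encoding)
fake = G(z, pose)               # (4, 3, 64, 64)
logits = D(fake)

# Stage 2: a frozen VGG-19 classifies/scores the generated frames
vgg19 = models.vgg19(weights=models.VGG19_Weights.DEFAULT).eval()
with torch.no_grad():
    scores = vgg19(nn.functional.interpolate(fake, size=224, mode="bilinear",
                                             align_corners=False))
```

Conditioning the generator on skeletal pose features rather than raw frames is what lets the framework skip avatar or animation pipelines entirely: the pose sequence alone drives the photorealistic output.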