Chuanmin Jia scite author profile

Attacks on deep-learning-based biometric systems have drawn increasing attention with the wide deployment of fingerprint, face, and speaker recognition systems, because neural networks are vulnerable to adversarial examples: inputs that have been intentionally perturbed while remaining almost imperceptible to humans. In this paper, we demonstrate the existence of universal adversarial perturbations (UAPs) for speaker recognition systems. We propose a generative network that learns a mapping from a low-dimensional normal distribution to the UAP subspace, then synthesizes UAPs that perturb any input signal to spoof a well-trained speaker recognition model with high probability. Experimental results on the TIMIT and LibriSpeech datasets demonstrate the effectiveness of our model.


Light Field Image Compression Using Generative Adversarial Network-Based View Synthesis

Jia, Zhang, Wang, et al. 2019

IEEE J. Emerg. Sel. Topics Circuits Syst.


Direct Speech-to-Image Translation

Zhang, Jia, et al. 2020

IEEE J. Sel. Top. Signal Process.


Direct speech-to-image translation without text is an interesting and useful topic because of its potential applications in human-computer interaction, art creation, computer-aided design, and other areas; moreover, many languages have no written form. However, as far as we know, how to translate speech signals into images directly, and how well they can be translated, has not been well studied. In this paper, we attempt to translate speech signals into image signals without a transcription stage. Specifically, a speech encoder is designed to represent the input speech signals as an embedding feature, and it is trained with a pretrained image encoder using teacher-student learning to obtain better generalization to new classes. Subsequently, a stacked generative adversarial network is used to synthesize high-quality images conditioned on the embedding feature. Experimental results on both synthesized and real data show that our proposed method is effective at translating raw speech signals into images without an intermediate text representation. An ablation study gives more insight into our method.


Spatial-temporal residue network based in-loop filter for video coding

Jia, Wang, Zhang, et al. 2017


Deep learning has demonstrated tremendous breakthroughs in the area of image and video processing. In this paper, a spatial-temporal residue network (STResNet) based in-loop filter is proposed to suppress visual artifacts such as blocking and ringing in video coding. Specifically, spatial and temporal information is jointly exploited by taking both the current block and the co-located block in the reference frame into consideration during in-loop filtering. The STResNet architecture consists of only four convolutional layers, which keeps memory use and coding complexity low. Moreover, to adapt fully to the input content and improve the performance of the proposed in-loop filter, a coding tree unit (CTU) level control flag is applied in the sense of rate-distortion optimization. Extensive experimental results show that our scheme provides up to 5.1% bit-rate reduction compared to the state-of-the-art video coding standard.


Recent Development of AVS Video Coding Standard: AVS3

et al. 2019


Light Field Image Compression Based on Deep Learning

Zhao, Wang, Jia, et al. 2018




Chuanmin Jia

Image and Video Compression With Neural Networks: A Review

Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding

Universal Adversarial Perturbations Generative Network For Speaker Recognition

