Despite various optical realizations of convolutional neural networks (CNNs), the optical implementation of nonlinear activation functions and pooling operations remains a challenging problem. In this regard, this paper proposes an optical saturable-absorption nonlinearity and its atomic-level model, as well as two optical pooling operations, namely optical average pooling and optical motion pooling, by means of 4f optical correlators. These optical building blocks not only speed up the neural network, owing to negligible optical processing latency, but also facilitate the concatenation of optical convolutional layers with no optoelectrical conversions in between, which are significant bottlenecks in implementing photonic CNNs. Furthermore, the proposed optical motion pooling layer increases the translation invariance of CNNs, avoiding the need to include all corresponding translated images in the training procedure and hence increasing the training speed of the network. Using optical average pooling, the classification accuracy of the proposed optical convolutional layer, evaluated as the first layer of a customized AlexNet architecture named OP-AlexNet, is 83.76%, 72.82%, and 99.25% on the Kaggle Cats and Dogs challenge, CIFAR-10, and MNIST datasets, respectively.
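As an illustrative sketch only (not the paper's exact atomic-level model), a 4f correlator performs convolution by multiplying the input field's Fourier transform with a kernel's transfer function in the Fourier plane, and a saturable absorber can be modeled by an intensity-dependent transmission. The function names and parameter values (`alpha0`, `i_sat`, the 2×2 averaging kernel) are hypothetical:

```python
import numpy as np

def correlate_4f(field, kernel_tf):
    """Sketch of a 4f correlator: the first lens yields the Fourier
    transform, a mask in the Fourier plane multiplies it by the kernel's
    transfer function, and the second lens transforms back
    (convolution theorem)."""
    return np.fft.ifft2(np.fft.fft2(field) * kernel_tf)

def saturable_absorption(intensity, alpha0=0.8, i_sat=1.0):
    """Toy saturable-absorber nonlinearity: transmission rises as the
    input intensity saturates the absorber. alpha0 and i_sat are
    hypothetical parameters, not the paper's fitted model."""
    transmission = 1.0 - alpha0 / (1.0 + intensity / i_sat)
    return intensity * transmission

# Example: pass a random "image" through the 4f path with a 2x2
# averaging kernel, then apply the nonlinearity to the intensity.
rng = np.random.default_rng(0)
img = rng.random((8, 8))
kernel = np.zeros((8, 8))
kernel[:2, :2] = 0.25                      # 2x2 averaging kernel
out = np.abs(correlate_4f(img, np.fft.fft2(kernel)))
act = saturable_absorption(out)
```

The FFT-based path mirrors how the optical system computes the convolution in a single pass, with the nonlinearity applied directly to the optical intensity rather than after a detector.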
The classification performance of all-optical convolutional neural networks (CNNs) is greatly influenced by component misalignment and by translation of the input images in practical applications. In this paper, we propose a free-space all-optical CNN (named Trans-ONN) that accurately classifies images translated in the horizontal, vertical, or diagonal directions. Trans-ONN takes advantage of an optical motion pooling layer, which provides translation invariance by implementing different optical masks in the Fourier plane for classifying translated test images. Moreover, to enhance the translation invariance, global average pooling (GAP) is utilized in the Trans-ONN structure rather than fully connected layers. Comparative studies confirm that combining the vertical and horizontal masks with the GAP operation provides the best translation invariance, compared to the alternative network models, for classifying horizontally and vertically shifted test images with up to 50-pixel shifts on the Kaggle Cats and Dogs, CIFAR-10, and MNIST datasets. Also, adopting the diagonal mask along with the GAP operation achieves the best classification accuracy for classifying diagonally translated test images with a large number of pixel shifts (i.e., more than 30 pixels). It is worth mentioning that the proposed translation-invariant networks are capable of classifying translated test images not included in the training procedure.
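The principle behind Fourier-plane translation invariance can be sketched numerically: by the shift theorem, translating an image only changes the phase of its 2-D Fourier transform, so features built from Fourier magnitudes are unchanged under (circular) shifts. This is a minimal illustration of the underlying property, not the paper's actual masks or motion pooling layer:

```python
import numpy as np

def fourier_magnitude_features(img):
    """Shift theorem: a translation multiplies the Fourier transform by
    a pure phase factor, so the magnitude is translation invariant
    (exactly so for circular shifts)."""
    return np.abs(np.fft.fft2(img))

def global_average_pool(feature_map):
    """GAP over spatial dimensions, as used in place of fully
    connected layers; it is likewise unchanged by circular shifts."""
    return feature_map.mean(axis=(-2, -1))

rng = np.random.default_rng(1)
img = rng.random((16, 16))
shifted = np.roll(img, shift=5, axis=1)   # 5-pixel horizontal (circular) shift
f0 = fourier_magnitude_features(img)
f1 = fourier_magnitude_features(shifted)
# f0 and f1 agree despite the shift; GAP of the raw images agrees too.
```

In the optical system this magnitude information is available directly in the Fourier plane of the 4f setup, where the masks are placed.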
Convolutional neural networks (CNNs) are at the heart of several machine learning applications, yet they suffer from computational complexity due to their large number of parameters and operations. Recently, all-optical implementation of CNNs has attracted much attention; however, the recently proposed optical architectures for CNNs cannot fully utilize the tremendous capabilities of optical processing because of the electro-optical conversions required between successive layers. To implement an all-optical multi-layer CNN, it is essential to optically implement all required operations, namely convolution, summation of the channel outputs for each convolutional kernel feeding the nonlinear unit, the nonlinear activation function, and finally, the pooling operations. Considering the lack of multi-layer photonic CNN implementations, in this paper we explore a fully optical design for implementing successive convolutional layers in an optical CNN. As a proof of concept, and without loss of generality, we consider two successive optical layers in the proposed network, named 2L-OPCNN, for comparative studies against the electrical counterpart and a single-optical-layer CNN. Our simulation results confirm nearly the same accuracies as the electrical counterpart for classifying images of the Kaggle Cats and Dogs challenge, CIFAR-10, and MNIST datasets, as well as improved accuracies compared to the single-optical-layer CNN.
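The key point of chaining optical layers can be sketched as two successive FFT-based convolutions with an intensity nonlinearity between them, with no detector/modulator (optoelectrical conversion) step in the middle. This is a hedged simulation sketch under simplified assumptions (single channel, circular convolution, toy nonlinearity parameters), not the paper's 2L-OPCNN implementation:

```python
import numpy as np

def layer_4f(field, kernel_tf, alpha0=0.8, i_sat=1.0):
    """One sketched optical layer: 4f convolution (frequency-domain
    product) followed by a toy saturable-absorption nonlinearity
    applied to the optical intensity. Parameters are hypothetical."""
    out = np.fft.ifft2(np.fft.fft2(field) * kernel_tf)
    intensity = np.abs(out) ** 2
    return intensity * (1.0 - alpha0 / (1.0 + intensity / i_sat))

# Two successive layers stay in the optical domain: the output of
# layer 1 feeds layer 2 directly, with no optoelectrical conversion.
rng = np.random.default_rng(2)
img = rng.random((8, 8))
k1 = np.fft.fft2(rng.random((8, 8)) / 64)  # transfer function, layer 1
k2 = np.fft.fft2(rng.random((8, 8)) / 64)  # transfer function, layer 2
h1 = layer_4f(img, k1)
h2 = layer_4f(h1, k2)
```

Because every operation between `img` and `h2` acts on the optical field or intensity, the latency of the stacked layers is dominated by light propagation rather than by conversion and readout electronics.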