2022
DOI: 10.1038/s41598-022-22291-0
Translation-invariant optical neural network for image classification

Abstract: The classification performance of all-optical Convolutional Neural Networks (CNNs) is strongly affected by component misalignment and by translation of the input images in practical applications. In this paper, we propose a free-space all-optical CNN (named Trans-ONN) which accurately classifies images translated in the horizontal, vertical, or diagonal directions. Trans-ONN takes advantage of an optical motion pooling layer which provides the translation invariance property by implementing different optical …
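The abstract's description of the optical motion pooling layer is truncated above, so its optical implementation cannot be reproduced here. As a digital analogue only, the PyTorch sketch below illustrates the invariance property itself: a globally pooled feature map yields the same result wherever the activation pattern sits, which is the behavior Trans-ONN realizes optically.

```python
import torch
import torch.nn.functional as F

# Digital analogue only: the paper pools optically. This merely shows that
# global pooling makes the pooled output insensitive to input translation.
feature_map = torch.zeros(1, 1, 8, 8)
feature_map[0, 0, 2, 3] = 1.0                                  # activation at (2, 3)
shifted = torch.roll(feature_map, shifts=(3, 2), dims=(2, 3))  # translate by (3, 2)

def global_pool(x: torch.Tensor) -> torch.Tensor:
    # Collapse all spatial positions to a single value per channel.
    return F.adaptive_max_pool2d(x, output_size=1)

print(torch.equal(global_pool(feature_map), global_pool(shifted)))  # True
```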

Cited by 6 publications (4 citation statements)
References 33 publications
“…To this end, we adapt the VGG‐11 architecture by Simonyan et al. [SZ14], but change the input resolution from 224 to 512 and add a global pooling layer between the last convolutional layer and the first fully connected layer to make the network more agnostic to global translations [LMW*22, SK22]. After training, we can feed real scanned patches to the classifier and directly predict the simulation parameters.…”
Section: Wrinkle Simulation With a Preset Library (mentioning, confidence: 99%)
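A minimal sketch of the adaptation this citing work describes, assuming a recent torchvision with its stock VGG-11; the output class count (here 10) is an assumption for illustration, as the number of simulation-parameter classes is not given in the snippet.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11

# Hypothetical reconstruction of the cited modification; class count and
# training details are assumptions, not taken from the citing paper.
model = vgg11(weights=None)

# Global average pooling collapses each of VGG-11's 512 output channels to a
# single value, discarding absolute spatial position before the FC head.
model.avgpool = nn.AdaptiveAvgPool2d(output_size=1)
model.classifier = nn.Sequential(  # VGG's forward() flattens after avgpool
    nn.Linear(512, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(),
    nn.Linear(4096, 10),
)

x = torch.randn(1, 3, 512, 512)    # 512 x 512 inputs, per the citation
print(model(x).shape)              # torch.Size([1, 10])
```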
“…For the CNN model, we leveraged the standard ResNet-50, which has 50 layers with residual connections, with no further tuning [43]. For the ViT architecture [25], we used 12 attention layers, a patch size of 16, a hidden size of 768, and 12 heads. Following the practices established by [44] and [25], we pretrained the ViT on the ImageNet dataset [45]. The images were resized to a uniform size of 224 × 224 pixels.…”
Section: Vision Transformer (ViT) and ResNet Training and Evaluation (mentioning, confidence: 99%)
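The stated ViT configuration (12 encoder layers, patch size 16, hidden size 768, 12 heads) matches torchvision's vit_b_16, and the CNN is a stock ResNet-50. A hedged sketch of that setup follows, assuming a recent torchvision; the bundled ImageNet weights stand in for the pretraining step, and the exact recipe of [44] is not reproduced.

```python
import torch
from torchvision import transforms
from torchvision.models import resnet50, vit_b_16

# vit_b_16 is the ViT-B/16 configuration named in the citation; resnet50 is
# the standard 50-layer residual network. Pretrained weights are a stand-in
# for the citation's ImageNet pretraining step.
cnn = resnet50(weights="IMAGENET1K_V2").eval()
vit = vit_b_16(weights="IMAGENET1K_V1").eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # uniform 224 x 224 inputs, per the citation
    transforms.ToTensor(),
])

x = torch.randn(1, 3, 224, 224)      # stand-in for a preprocessed image batch
with torch.no_grad():
    print(cnn(x).shape, vit(x).shape)  # both torch.Size([1, 1000])
```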