For the benefit of designing scalable, fault-resistant optical neural networks (ONNs), we investigate the effects that architectural designs have on ONNs' robustness to imprecise components. We train two ONNs, one with a more tunable design (GridNet) and one with better fault tolerance (FFTNet), to classify handwritten digits. When simulated without any imperfections, GridNet yields better accuracy (∼98%) than FFTNet (∼95%). However, under a small amount of error in their photonic components, the more fault-tolerant FFTNet overtakes GridNet. We further provide thorough quantitative and qualitative analyses of ONNs' sensitivity to varying levels and types of imprecision. Our results offer guidelines for the principled design of fault-tolerant ONNs as well as a foundation for further research.

Second, rather than optimizing toward a specific matrix, the linear operations learned for the classification task are not known a priori. As such, our primary figure of merit is the classification accuracy rather than the fidelity between the target unitary matrix and the one learned.

Lastly, the aforementioned studies mainly focused on the optimization of the networks after fabrication. The imprecisions introduced generally reduced the expressivity of the network, i.e., how well the network can represent arbitrary transformations. Evaluations of this reduction in tunability, along with mitigating strategies, were provided. However, such post-fabrication optimization requires the characterization of every MZI, the number of which scales with the dimension N of the network as N². Protocols for self-configuration of imprecise photonic networks have been demonstrated [17,18]. While measurement of individual MZIs was not necessary in such protocols, each MZI still needed to be configured progressively and sequentially; thus, the same N² scaling problem remained. Furthermore, if multiple ONN devices are fabricated, each device, with its unique imperfections, must be optimized separately.
The total computational power required therefore scales with the number of devices produced. In contrast, we consider the effects of imprecisions introduced after software training of ONNs (Code 1, Ref. [19]), details of which we present in Sec. 3. This pre-fabrication training is more scalable, both in network size and in fabrication volume. An ideal ONN (i.e., one with no imprecisions) is trained in software only once, and the parameters are transferred to multiple fabricated instances of the network with imprecise components. No subsequent characterization or tuning of the devices is necessary. In addition to better scalability, fabrication of static MZIs can be made more precise and cost-effective than that of reconfigurable ones.

We evaluate the degradation of ONNs from their ideal performance with increasing imprecision. To understand how such effects can be minimized, we investigate the role that architectural design plays in ONNs' sensitivity to imprecision. The results are presented in Sec. 4.1. Specifically, we study the performance of two ONNs i...
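The pre-fabrication workflow described above can be illustrated with a toy simulation: build an ideal MZI mesh unitary from software-trained phases, perturb every phase with Gaussian fabrication error, and measure how far the realized matrix drifts. This is a minimal sketch only; the mesh layout is a generic rectangular mesh (not the paper's GridNet or FFTNet), and the MZI transfer-matrix convention, mesh size `n`, and error level `sigma` are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mzi(theta, phi):
    # One common unitary convention for a 2x2 MZI transfer matrix
    # with internal phase theta and external phase shifter phi.
    return np.array([
        [np.exp(1j * phi) * np.cos(theta / 2), -np.sin(theta / 2)],
        [np.exp(1j * phi) * np.sin(theta / 2),  np.cos(theta / 2)],
    ])

def mesh_unitary(phases, n):
    # Compose MZIs on alternating adjacent mode pairs (rectangular mesh).
    U = np.eye(n, dtype=complex)
    k = 0
    for layer in range(n):
        for i in range(layer % 2, n - 1, 2):
            T = np.eye(n, dtype=complex)
            T[i:i + 2, i:i + 2] = mzi(*phases[k])
            U = T @ U
            k += 1
    return U

n = 8
n_mzis = sum(len(range(l % 2, n - 1, 2)) for l in range(n))
phases = rng.uniform(0, 2 * np.pi, size=(n_mzis, 2))  # "trained" phases

U_ideal = mesh_unitary(phases, n)
sigma = 0.05  # assumed phase-error std (radians)
U_noisy = mesh_unitary(phases + rng.normal(0, sigma, phases.shape), n)

# Normalized matrix fidelity between ideal and imprecise instances.
fidelity = abs(np.trace(U_ideal.conj().T @ U_noisy)) / n
print(f"matrix fidelity under sigma={sigma}: {fidelity:.4f}")
```

Sweeping `sigma` in such a simulation is one way to quantify the degradation curve that Sec. 4.1 reports in terms of classification accuracy.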
Model architectures have been dramatically increasing in size, improving performance at the cost of increased resource requirements. In this paper we propose 3DQ, a ternary quantization method, applied for the first time to 3D Fully Convolutional Neural Networks (F-CNNs), enabling 16x model compression while maintaining performance on par with full-precision models. We extensively evaluate 3DQ on two datasets for the challenging task of whole brain segmentation. Additionally, we showcase our method's ability to generalize on two common 3D architectures, namely 3D U-Net and V-Net. Outperforming a variety of baselines, the proposed method is capable of compressing large 3D models to a few MBytes, alleviating the storage needs in space-critical applications.
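To make the compression arithmetic concrete, the sketch below ternarizes a weight tensor to the three levels {−α, 0, +α}: since each weight then needs only a 2-bit code instead of a 32-bit float, storage shrinks by roughly 16x. This is a generic TWN-style ternarization sketch, not the paper's exact 3DQ procedure; the threshold rule and `delta_scale` value are assumptions.

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Map weights to {-alpha, 0, +alpha} (TWN-style sketch).

    delta_scale is an assumed hyperparameter: weights whose magnitude
    falls below delta_scale * mean(|w|) are zeroed, the rest snap to
    +/-alpha, where alpha is the mean magnitude of surviving weights.
    """
    delta = delta_scale * np.mean(np.abs(w))
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
wq = ternarize(w)
print(np.unique(wq))  # three levels: -alpha, 0, +alpha
# 2-bit codes vs. 32-bit floats -> ~16x storage reduction
```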
We describe a stochastic, dynamical system capable of inference and learning in a probabilistic latent variable model. The most challenging problem in such models—sampling the posterior distribution over latent variables—is proposed to be solved by harnessing natural sources of stochasticity inherent in electronic and neural systems. We demonstrate this idea for a sparse coding model by deriving a continuous-time equation for inferring its latent variables via Langevin dynamics. The model parameters are learned by simultaneously evolving according to another continuous-time equation, thus bypassing the need for digital accumulators or a global clock. Moreover, we show that Langevin dynamics lead to an efficient procedure for sampling from the posterior distribution in the L0 sparse regime, where latent variables are encouraged to be set to zero as opposed to having a small L1 norm. This allows the model to properly incorporate the notion of sparsity rather than having to resort to a relaxed version of sparsity to make optimization tractable. Simulations of the proposed dynamical system on both synthetic and natural image data sets demonstrate that the model is capable of probabilistically correct inference, enabling learning of the dictionary as well as parameters of the prior.
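The inference step described above can be sketched with a discretized (Euler-Maruyama) version of Langevin dynamics on a toy sparse-coding problem: latent variables `a` follow the noisy gradient flow of the posterior energy, and samples after burn-in approximate the posterior. For simplicity this sketch uses a relaxed L1 (Laplace) prior rather than the L0 regime that is the paper's actual contribution; the dictionary `Phi`, dimensions, and all hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy generative model: x = Phi @ a with a sparse.
D, K = 8, 16                               # data dim, dictionary size (assumed)
Phi = rng.normal(size=(D, K)) / np.sqrt(D)
a_true = np.zeros(K)
a_true[rng.choice(K, 3, replace=False)] = rng.normal(size=3)
x = Phi @ a_true

# Posterior energy E(a) = ||x - Phi a||^2 / (2 sigma2) + lam * ||a||_1
sigma2, lam, dt, T = 0.01, 0.5, 1e-3, 1.0
a = np.zeros(K)
samples = []
for t in range(20000):
    grad = -Phi.T @ (x - Phi @ a) / sigma2 + lam * np.sign(a)      # grad E(a)
    # Euler-Maruyama step of the Langevin SDE da = -grad E dt + sqrt(2T) dW
    a = a - dt * grad + np.sqrt(2 * T * dt) * rng.normal(size=K)
    if t > 10000:                           # discard burn-in
        samples.append(a.copy())

a_mean = np.mean(samples, axis=0)           # posterior-mean estimate
print("reconstruction error:", np.linalg.norm(x - Phi @ a_mean))
```

In the paper's setting the injected noise would come from intrinsic physical stochasticity rather than a pseudorandom generator, and the dictionary `Phi` would itself evolve under a second continuous-time learning equation.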