2015
DOI: 10.48550/arxiv.1511.06422
Preprint

All you need is a good init

Abstract: Layer-sequential unit-variance (LSUV) initialization, a simple method for weight initialization in deep net learning, is proposed. The method consists of two steps. First, pre-initialize the weights of each convolution or inner-product layer with orthonormal matrices. Second, proceed from the first to the final layer, normalizing the variance of each layer's output to one. Experiments with different activation functions (maxout, ReLU-family, tanh) show that the proposed initialization leads to…
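
The two steps described in the abstract map onto a short procedure. Below is a minimal PyTorch sketch of LSUV-style initialization; it is not the authors' reference implementation, and the hook-based layer scan and the tol / max_iters stopping rule are illustrative assumptions.

import torch
import torch.nn as nn

@torch.no_grad()
def lsuv_init(model, data_batch, tol=0.1, max_iters=10):
    # Step 1: pre-initialize every convolution / inner-product layer
    # with an orthonormal matrix (biases set to zero).
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.orthogonal_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

    # Step 2: proceed from the first to the final layer, rescaling the
    # weights until the layer's output variance on data_batch is close to 1.
    outputs = []
    def hook(mod, inp, out):
        outputs.append(out)

    for m in model.modules():
        if not isinstance(m, (nn.Conv2d, nn.Linear)):
            continue
        handle = m.register_forward_hook(hook)
        for _ in range(max_iters):
            outputs.clear()
            model(data_batch)            # forward pass; hook records this layer's output
            var = outputs[-1].var().item()
            if abs(var - 1.0) < tol:
                break
            m.weight.data /= var ** 0.5  # divide by the output std -> unit output variance
        handle.remove()
    return model

In this sketch the variance is estimated on a single representative mini-batch (data_batch), and layers are visited in registration order, which for sequential models corresponds to first-to-last.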

Cited by 99 publications (136 citation statements)
References 5 publications

“…(b) High training complexity: The transmitter needs to perform several tasks, such as symbol mapping, PS, and pre-distortion, jointly, and learning the transmitted waveform involves sequential input data, which significantly increases the NN size when "one-hot" encoding is applied, thereby increasing the training complexity. (c) Parameter initialization: It is difficult to know which parameter choice leads to good performance prior to training, and random parameter initialization can slow down or even completely stall the convergence process [48].…”
Section: A. Autoencoder Design
Citation type: mentioning, confidence: 99%
“…Performing hyperparameter optimization is computationally expensive, so we rely on empirical tests to guide the settings. We use an architecture of 8 hidden layers with 64 nodes each, applying the Gaussian Error Linear Unit (GELU) [34] activation and LSUV weight initialization [35]. Using the Adam [36] optimizer, we minimize either the mean squared error (MSE) loss…”
Section: Multilayer Perceptron (MLP)
Citation type: mentioning, confidence: 99%
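
For concreteness, a rough sketch of the setup this statement describes: an MLP with 8 hidden layers of 64 units, GELU activations, the Adam optimizer, and an MSE loss. The input/output dimensions, learning rate, and final linear head are illustrative assumptions, and the cited work's alternative loss option is omitted.

import torch
import torch.nn as nn

def build_mlp(in_dim, out_dim, hidden=64, depth=8):
    # depth hidden layers of `hidden` units, each followed by GELU.
    layers, dim = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(dim, hidden), nn.GELU()]
        dim = hidden
    layers.append(nn.Linear(dim, out_dim))   # linear output head (assumed)
    return nn.Sequential(*layers)

model = build_mlp(in_dim=16, out_dim=1)      # placeholder dimensions
# lsuv_init(model, example_batch)            # LSUV init as sketched above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                       # mean squared error loss
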
“…Their design is specific to convolutions with certain non-linearities. Mishkin and Matas [12] and Krähenbühl et al. [9] have devised alternative inits for CNNs which initialize layer by layer such that the variance of the activations of each layer remains constant, e.g. close to one.…”
Section: Related Work
Citation type: mentioning, confidence: 99%