Hanna Tseran scite author profile

Hanna Tseran

2 Publications

0 Citation Statements Received

25 Citation Statements Given


Publications


Expected Gradients of Maxout Networks and Consequences to Parameter Initialization

Tseran¹, Montúfar²

2023

Preprint


We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the NTK.

Maxout networks. A rank-K maxout unit, introduced by Goodfellow et al. (2013), computes the maximum of K real-valued parametric affine functions. Concretely, a rank-K maxout unit with n inputs implements a function $\mathbb{R}^n \to \mathbb{R}$, $x \mapsto \max_{k \in [K]} \{\langle W_k, x \rangle + b_k\}$, where $W_k \in \mathbb{R}^n$ and $b_k \in \mathbb{R}$, $k \in [K] := \{1, \ldots, K\}$, are trainable weights and biases. The K arguments of the maximum are called the pre-activation features of the maxout unit. This may be regarded as a multi-argument generalization of a ReLU, which computes the maximum of a real-valued affine function and zero. Goodfellow et al. (2013) demonstrated that maxout networks could perform better than ReLU networks under similar circumstances. Additionally, maxout networks have been shown to be useful for combating catastrophic forgetting in neural networks (Goodfellow et al., 2015). On the other hand, Castaneda et al. (2019) evaluated the performance of maxout networks in a big data setting and observed that increasing the width of ReLU networks is more effective in improving performance than replacing ReLUs with maxout units and that ReLU networks converge faster
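The description of a rank-K maxout unit as the maximum of K affine functions of the input can be illustrated with a minimal NumPy sketch. The function names `maxout_unit` and `relu_as_maxout` and the array shapes are assumptions made for illustration; they are not taken from the paper or from any library.

```python
import numpy as np


def maxout_unit(x, W, b):
    """Hypothetical rank-K maxout unit with n inputs.

    x: input vector of shape (n,)
    W: weights of shape (K, n), one row per affine function
    b: biases of shape (K,)
    Returns the maximum of the K pre-activation features.
    """
    pre_activations = W @ x + b  # shape (K,): the K affine functions of x
    return np.max(pre_activations)


def relu_as_maxout(x, w, b):
    """ReLU viewed as a rank-2 maxout unit whose second affine function is zero."""
    W = np.stack([w, np.zeros_like(w)])  # second row of weights is all zeros
    biases = np.array([b, 0.0])          # second bias is zero
    return maxout_unit(x, W, biases)


if __name__ == "__main__":
    x = np.array([1.0, -2.0])
    W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # rank K = 3
    b = np.array([0.0, 0.0, 0.5])
    print(maxout_unit(x, W, b))
    print(relu_as_maxout(x, np.array([1.0, 1.0]), 0.5))
```

The second helper reflects the remark that a ReLU is the special case in which one of two affine functions is fixed to zero.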


On the Expected Complexity of Maxout Networks

Tseran¹, Montúfar²

2021

Preprint


scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
