2021
DOI: 10.48550/arxiv.2110.06081
Preprint

On Expressivity and Trainability of Quadratic Networks

Abstract: Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neuron of interest here replaces the inner-product operation of the conventional neuron with a quadratic function. Despite the promising results achieved so far by networks of quadratic neurons, important issues remain unaddressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network via …
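The abstract describes a quadratic neuron that swaps the inner product of a conventional neuron for a quadratic function of the input. A minimal sketch of one such neuron follows; the exact functional form is an assumption inferred from the parameter names (w_g, b_g, w_b, b_b) quoted in the citation statements further down this page, not taken verbatim from the paper.

```python
# Sketch of a quadratic neuron of the kind the paper studies: the usual
# pre-activation w . x + b is replaced by a quadratic function of x.
# The form below ((w_r . x + b_r)(w_g . x + b_g) + w_b . (x * x) + b_b)
# is an assumption based on the parameter names quoted later on this page.

def linear_neuron(x, w, b):
    """Conventional neuron pre-activation: inner product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, b_b):
    """Quadratic pre-activation combining two linear maps and a
    power term: (w_r . x + b_r) * (w_g . x + b_g) + w_b . (x*x) + b_b."""
    lin_r = sum(wi * xi for wi, xi in zip(w_r, x)) + b_r
    lin_g = sum(wi * xi for wi, xi in zip(w_g, x)) + b_g
    quad = sum(wi * xi * xi for wi, xi in zip(w_b, x)) + b_b
    return lin_r * lin_g + quad

x = [1.0, 2.0]
print(quadratic_neuron(x, [0.5, -1.0], 0.1, [0.2, 0.3], 1.0, [0.0, 0.1], 0.0))
```

A single quadratic neuron can already represent functions such as XOR-like decision boundaries that a single conventional neuron cannot, which is the expressivity angle the abstract alludes to.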

Cited by 2 publications (5 citation statements)
References 32 publications (39 reference statements)
“…The number of training epochs is 200. For quadratic networks, we adopt the ReLinear training strategy [9], where the learning rate of quadratic terms is set to 1 × 10⁻⁴. We adopt Adam as an optimizer.…”
Section: Gaussian Mixture Data
confidence: 99%
“…We use Adam as an optimizer for training all methods. In particular, for QUNet, we adopt the ReLinear strategy as [9], where the learning rate of quadratic terms is set to 1 × 10⁻⁴. We also employ the gradient clip norm method with a maximum norm value of 0.01 to constrain the over-growth of weights.…”
Section: Efficiency on the Cell Dataset
confidence: 99%
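The gradient-clip-norm method mentioned in the quote caps the global L2 norm of the gradients: if the combined norm exceeds the maximum (0.01 here), every gradient is rescaled by max_norm / total_norm. A minimal sketch:

```python
import math

# Sketch of gradient clipping by global norm, as in the quote above:
# if the L2 norm of the gradient vector exceeds max_norm, rescale all
# components by max_norm / total_norm so the clipped norm equals max_norm.

def clip_grad_norm(grads, max_norm):
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

clipped = clip_grad_norm([3.0, 4.0], max_norm=0.01)  # original norm was 5.0
print(math.sqrt(sum(g * g for g in clipped)))  # clipped norm is max_norm
```

With such a tight cap (0.01), clipping acts less as a safety valve and more as a hard ceiling on per-step weight movement, complementing the small learning rate on the quadratic terms.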
“…). Furthermore, to facilitate the training of the quadratic parameters and guarantee model convergence, Fan et al. introduced the ReLinear strategy, which includes a special initialization of the quadratic terms (w_g = 0, b_g = 1, w_b = 0, and b_b = 0) assisted by a shrunken learning rate to prevent magnitude explosion [31].…”
confidence: 99%
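The initialization just quoted has a clean interpretation: with w_g = 0, b_g = 1, w_b = 0, and b_b = 0, the quadratic neuron collapses exactly to a conventional linear neuron, so training starts from a well-understood linear model and lets the quadratic terms grow in gradually. The sketch below demonstrates this; the neuron form is an assumption inferred from the parameter names in the quote.

```python
# Sketch of the ReLinear initialization quoted above: with w_g = 0, b_g = 1,
# w_b = 0, b_b = 0, the quadratic pre-activation
#   (w_r . x + b_r)(w_g . x + b_g) + w_b . (x * x) + b_b
# collapses to the conventional neuron w_r . x + b_r. (The neuron form is
# an assumption based on the parameter names in the quote.)

def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, b_b):
    lin_r = sum(wi * xi for wi, xi in zip(w_r, x)) + b_r
    lin_g = sum(wi * xi for wi, xi in zip(w_g, x)) + b_g
    quad = sum(wi * xi * xi for wi, xi in zip(w_b, x)) + b_b
    return lin_r * lin_g + quad

x, w_r, b_r = [1.0, 2.0, 3.0], [0.3, -0.2, 0.5], 0.7
n = len(x)

relinear = quadratic_neuron(x, w_r, b_r,
                            w_g=[0.0] * n, b_g=1.0,   # ReLinear init
                            w_b=[0.0] * n, b_b=0.0)
linear = sum(wi * xi for wi, xi in zip(w_r, x)) + b_r

print(relinear == linear)  # True: the quadratic neuron starts out linear
```

Since the factor (w_g · x + b_g) starts at exactly 1 and the power term at exactly 0, gradients initially flow through the neuron as if it were linear, which is why the strategy pairs this initialization with the small learning rate on the quadratic terms.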