2021
DOI: 10.48550/arxiv.2112.12228
Preprint

Direct Behavior Specification via Constrained Reinforcement Learning

Abstract: The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners approach the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work …
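For context, the constrained RL setting the abstract refers to is typically formalized as a Constrained Markov Decision Process (CMDP). The following is the standard statement of that objective, not a formula quoted from the paper itself:

```latex
\max_{\pi}\ \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i,
\qquad i = 1, \dots, k
```

Here each cost function c_i flags an undesired behavior and d_i is its admissible budget, so the behavior specification lives in the constraints rather than in a hand-shaped reward.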

Cited by 1 publication (1 citation statement)
References: 29 publications
“…25, where λ is a Lagrange multiplier. In our experiments, we fix λ to be a constant scalar, although adaptive approaches such as (Roy et al., 2021) could be explored.…”
Section: Regularization with Lagrangian Penalties (mentioning)
Confidence: 99%
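To make the fixed-versus-adaptive distinction in the quoted statement concrete, here is a minimal PyTorch sketch. This is an assumed setup: the function names, the exp parameterization of λ, and the learning rate are illustrative and are not taken from either paper. The fixed variant matches the citing paper's choice of a constant scalar λ; the adaptive variant updates λ by gradient ascent on the constraint violation, in the spirit of the Lagrangian methods of Roy et al. (2021):

```python
import torch

def fixed_lagrangian_loss(policy_loss, constraint_cost, lam=10.0, threshold=0.0):
    """Lagrangian penalty with a constant scalar multiplier, as in the
    quoted statement: L = J(pi) + lam * (J_c(pi) - d)."""
    return policy_loss + lam * (constraint_cost - threshold)

# Adaptive multiplier: lambda = exp(log_lam) keeps it non-negative.
log_lam = torch.zeros(1, requires_grad=True)
lam_optimizer = torch.optim.Adam([log_lam], lr=1e-2)

def update_multiplier(constraint_cost, threshold=0.0):
    """One gradient-ascent step on lambda: grow lambda while the
    constraint is violated (J_c > d), shrink it otherwise.
    `constraint_cost` must be a scalar tensor; it is detached so this
    step moves only the multiplier, never the policy."""
    lam = log_lam.exp()
    lam_loss = -lam * (constraint_cost.detach() - threshold)
    lam_optimizer.zero_grad()
    lam_loss.backward()
    lam_optimizer.step()
    return log_lam.exp().detach()
```

In practice the two steps alternate: estimate constraint_cost from a batch of rollouts, take a policy step on the penalized loss, then call update_multiplier so λ tracks how binding the constraint currently is.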