Lattice-based cryptography (LBC) built on the Learning with Errors (LWE) problem is a promising candidate for post-quantum cryptography. The number theoretic transform (NTT) is the latency- and energy-dominant kernel in LWE computations. This paper presents a compact and efficient in-MEmory NTT accelerator, named MeNTT, which explores optimized computation in and near a 6T SRAM array. Specially designed peripherals enable fast and efficient modular operations. Moreover, a novel mapping strategy reduces the data flow between NTT stages to a single, uniform pattern, which greatly simplifies routing among processing units (i.e., SRAM columns in this work) and reduces energy and area overheads. The accelerator achieves significant latency and energy reductions over prior art.
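For reference, the butterfly structure whose inter-stage data flow the mapping strategy regularizes can be sketched in software. This is an illustrative radix-2 NTT with toy parameters (q = 17, n = 8, omega = 9, a primitive 8th root of unity mod 17) chosen by us for readability; it is not the paper's hardware mapping, and real LWE schemes use far larger n and q.

```python
# Minimal radix-2 iterative NTT sketch (Cooley-Tukey, decimation in time).
# Toy parameters for illustration only: q = 17, n = 8, omega = 9.

def ntt(a, q, omega):
    """NTT of a length-n list a over Z_q (n a power of two)."""
    n = len(a)
    a = a[:]
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages; the stage-to-stage data flow here is
    # the pattern that an in-memory mapping must route between columns.
    length = 2
    while length <= n:
        w_m = pow(omega, n // length, q)  # stage twiddle base
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                t = (w * a[k + length // 2]) % q
                a[k] = (u + t) % q                # butterfly "add" output
                a[k + length // 2] = (u - t) % q  # butterfly "sub" output
                w = (w * w_m) % q
        length <<= 1
    return a

# Sanity check against the O(n^2) definition A[k] = sum_j a[j] * omega^(j*k).
a = [3, 1, 4, 1, 5, 9, 2, 6]
naive = [sum(a[j] * pow(9, j * k, 17) for j in range(8)) % 17 for k in range(8)]
assert ntt(a, 17, 9) == naive
```

Every stage performs the same add/subtract-with-twiddle pair on modular residues, which is why fast modular peripherals and a uniform inter-stage routing pattern dominate the accelerator's efficiency.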
Large Vocabulary Continuous Speech Recognition (LVCSR) systems require a Viterbi search through a large state space to find the most probable sequence of phonemes for a given sound sample. This requires storing and updating a large Active State List (ASL) in on-chip memory (OCM) at regular intervals (called frames), which poses a major performance bottleneck for speech decoding. Most prior works use hash tables for OCM storage and beam-width pruning to restrict the ASL size. Achieving decent accuracy and performance with this approach incurs a large OCM, numerous acoustic probability computations, and frequent DRAM accesses.
We propose to use a binary search tree (BST) for ASL storage and a max heap (MH) to track the worst-cost state and efficiently replace it when a better state is found. With this approach, the ASL size can be reduced from over 32K entries to 512 with minimal impact on recognition accuracy for a 7000-word vocabulary model. Combined with a caching technique for acoustic scores, this reduces the DRAM data accessed by 31× and the acoustic probability computations by 26×.
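The heap half of the scheme can be sketched as follows. This is an illustrative sketch, not the paper's hardware design: the function and parameter names (`update_asl`, `capacity`) are ours, and we key Python's `heapq` min-heap on score (higher = more probable), which is equivalent to a max heap on cost with the worst survivor at the root.

```python
import heapq

def update_asl(heap, capacity, state_id, score):
    """Admit a hypothesis into a bounded active-state list.

    The min-heap keeps the worst surviving hypothesis at the root,
    so testing admission is O(1) and replacement is O(log capacity).
    """
    if len(heap) < capacity:
        heapq.heappush(heap, (score, state_id))
        return True
    if score > heap[0][0]:
        # New state beats the current worst: evict and replace in one step.
        heapq.heapreplace(heap, (score, state_id))
        return True
    return False  # pruned: worse than everything already retained

# Demo: capacity 3, four candidate states with scores 5, 1, 9, 3.
asl = []
for sid, s in enumerate([5, 1, 9, 3]):
    update_asl(asl, 3, sid, s)
assert sorted(score for score, _ in asl) == [3, 5, 9]
```

Because admission is decided against the root alone, a fixed-size ASL (e.g., 512 entries) self-prunes without a separately tuned beam width.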
The approach has also been implemented in hardware on a Xilinx Zynq FPGA running at 200 MHz, using the Vivado SDS compiler. We study the trade-offs among the amount of OCM used, word error rate (WER), and decoding speed to show the effectiveness of the approach. The resulting implementation runs faster than real time while using 91% fewer block RAMs.