ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp39728.2021.9414928

Hierarchical Transformer-Based Large-Context End-To-End ASR with Large-Context Knowledge Distillation

Abstract: We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and its effective training method based on knowledge distillation. Common E2E-ASR models have mainly focused on utterance-level processing, in which each utterance is transcribed independently. On the other hand, large-context E2E-ASR models, which take into account long-range sequential contexts beyond utterance boundaries, can handle sequences of utterances such as discourses and conversations well. However, the transformer …
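To make the hierarchical large-context idea concrete, here is a minimal PyTorch sketch of a two-level encoder in which an utterance-level transformer encodes each utterance and a context-level transformer attends over pooled embeddings of the preceding utterances. The class name, mean pooling, causal masking, and hyperparameters are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class HierarchicalContextEncoder(nn.Module):
    """Two-level encoder: an utterance-level transformer encodes each
    utterance independently, then a context-level transformer runs over
    the pooled utterance embeddings so each utterance can attend to the
    ones that precede it in the discourse."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        utt_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        ctx_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.utterance_encoder = nn.TransformerEncoder(utt_layer, n_layers)
        self.context_encoder = nn.TransformerEncoder(ctx_layer, n_layers)

    def forward(self, utterances: torch.Tensor) -> torch.Tensor:
        # utterances: (num_utts, frames, d_model) features for one conversation
        encoded = self.utterance_encoder(utterances)   # frame-level encodings
        pooled = encoded.mean(dim=1)                   # one vector per utterance
        n = pooled.size(0)
        # causal mask: each utterance attends only to itself and earlier utterances
        causal = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        context = self.context_encoder(pooled.unsqueeze(0), mask=causal).squeeze(0)
        # inject discourse-level context back into the frame-level encodings
        return encoded + context.unsqueeze(1)


# Example: a "conversation" of 5 utterances, each 120 frames of 256-dim features
features = torch.randn(5, 120, 256)
outputs = HierarchicalContextEncoder()(features)  # -> (5, 120, 256)
```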

Cited by 16 publications (4 citation statements) | References 26 publications
“…As the shared task is given with a separate training data set, an effective model has to be created during training. Therefore, a hierarchical transformer-based model for large-context end-to-end ASR can be used (Masumura et al, 2021). In the recent era, the environment is changing with smart systems, and it has been identified that there is a need for ASR systems capable of handling the speech of elderly people spoken in their native languages.…”
Section: Related Work
confidence: 99%
“…An increase in WER occurs if the quality of the recorded speech is low (Iribe et al, 2015). An E2E-ASR transformer can encode and decode hierarchically by combining transformers to capture large context (Masumura et al, 2021). Using the hybrid LSTM-transformer, the WER is reduced by 25.4% through transfer learning.…”
Section: Related Work
confidence: 99%
“…Recent model compression works fall under three general classes: pruning, which forces some weights or activations to zero [20-22, 26, 32, 34, 40, 47], combined with "zero-aware" memory encoding; knowledge distillation, which distills a larger "teacher" model into a smaller "student" model [1,17,24,27,29,37,43]; and quantization, where the parameters and/or activations are quantized to shorter bit-widths [6,19,39,50,51,54].…”
Section: Introduction
confidence: 99%
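As a point of reference for the knowledge-distillation class mentioned in the excerpt above, below is a minimal sketch of the standard soft-target distillation loss that interpolates a teacher's softened outputs with the ground-truth labels. The temperature and interpolation weight are illustrative defaults, not values taken from the cited paper or from the large-context distillation method itself.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      targets: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combine cross-entropy on ground-truth labels with a KL term that
    pulls the student's softened distribution toward the teacher's."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft


# Example: a batch of 8 frames/tokens over a 500-symbol output vocabulary
student = torch.randn(8, 500)
teacher = torch.randn(8, 500)
labels = torch.randint(0, 500, (8,))
loss = distillation_loss(student, teacher, labels)
```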