2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2016.7472618
End-to-end attention-based large vocabulary speech recognition

Abstract: Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems are hybrids of neural networks and Hidden Markov Models (HMMs). Most of these systems contain separate components that deal with acoustic modelling, language modelling and sequence decoding. We investigate a more direct approach in which the HMM is replaced with a Recurrent Neural Network (RNN) that performs sequence prediction directly at the character level. Alignment between the input features and the des…

Cited by 976 publications (757 citation statements)
References 23 publications (39 reference statements)
“…The word "ROCK" is corrected to "DRAW" after hearing "RATE" and "IN DRAW RATE" to "AND DRAW CROWD" while hearing "PEOPLE".

    [9]                   CTC + Trigram (extended)    7.34%
    Miao et al [9]        CTC + Trigram               9.07%
    Hannun et al [8]      CTC + Bigram                14.1%
    Bahdanau et al [10]   Encoder-decoder + Trigram   11.3%
    Woodland et al [21]   GMM-HMM + Trigram           9.46%
    Miao et al [9]        DNN-HMM + Trigram           7.14%

…is roughly 0.5% to 1% WER. However, there was little difference when the beam width increased from 512 to 2048 in our preliminary experiments.…”
Section: Methods
confidence: 99%
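The beam-width comparison in the excerpt above can be illustrated with a generic beam search. This is a minimal sketch, not the cited systems' decoder: the `step_log_probs` structure and the toy two-character distributions below are illustrative assumptions standing in for the CTC or encoder-decoder scores (combined with n-gram LM scores) that the excerpt compares.

```python
import math

def beam_search(step_log_probs, beam_width):
    """Keep the beam_width best hypotheses at each decoding step.

    step_log_probs: a list of dicts mapping token -> log-probability,
    one dict per time step (a toy stand-in for acoustic + LM scores).
    """
    beams = [("", 0.0)]  # (hypothesis, cumulative log-probability)
    for dist in step_log_probs:
        candidates = [
            (hyp + tok, score + lp)
            for hyp, score in beams
            for tok, lp in dist.items()
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the beam width
    return beams[0][0]

# Toy example: two steps over a two-character vocabulary.
steps = [
    {"A": math.log(0.6), "B": math.log(0.4)},
    {"A": math.log(0.1), "B": math.log(0.9)},
]
best = beam_search(steps, beam_width=2)  # "AB"
```

A wider beam keeps more partial hypotheses alive per step at higher cost; the excerpt's observation is that beyond some width (512 here) the extra hypotheses rarely change the final output.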
“…Also, a sub-lexical language model is proposed in [5] for detecting previously unseen words. RNN-based character-level end-to-end ASR systems were studied in [6,7,8,9,10]. However, they lack the capability of dictating OOV words since the decoding is performed with word-level LMs.…”
Section: Introduction
confidence: 99%
“…In this sense, this task is similar to aspect-based sentiment analysis (Pontiki et al, 2016), where the task is not to classify a text or sentence, but an entity within the text. The notion of focus is similar to attention (Bahdanau et al, 2016;Yin et al, 2016), with the difference that attention is learned during training whereas focus is given as an additional input.…”
Section: Approach
confidence: 99%
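The contrast the excerpt draws between learned attention and externally given focus comes down to where the alignment scores originate. A minimal sketch (an assumption for illustration, not the cited models' architecture) of turning scores into a weighted context value:

```python
import math

def attention_weights(scores):
    """Numerically stable softmax over alignment scores.

    With learned attention (Bahdanau-style), a trained network produces
    these scores from the data; a given 'focus' would instead be supplied
    directly as an additional input rather than learned.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attend(values, scores):
    """Weighted sum of values under the softmax of the scores."""
    return sum(w * v for w, v in zip(attention_weights(scores), values))

# Equal scores -> uniform weights -> plain average of the values.
ctx = attend([1.0, 3.0], [0.0, 0.0])  # 2.0
```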
“…N-grams of length 1, 2 and 3 are called 'unigrams', 'bigrams' and 'trigrams' respectively. N-grams are widely used for speech pattern recognition [14,15] and for identifying a particular language; text classification also relies on N-grams for effective results [16].…”
Section: Generate N-grams
confidence: 99%
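The n-gram extraction described above can be sketched in a few lines; the `ngrams` helper below is an illustrative name, not a function from the cited work.

```python
def ngrams(tokens, n):
    """All contiguous n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
unigrams = ngrams(words, 1)  # [('the',), ('cat',), ('sat',), ...]
bigrams = ngrams(words, 2)   # [('the', 'cat'), ('cat', 'sat'), ...]
trigrams = ngrams(words, 3)  # [('the', 'cat', 'sat'), ...]
```

The same helper works at the character level, which is how n-gram features are often built for language identification.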