Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation

Masumura, Ryo; Asami, Taichi; Oba, Tadamichi; Masataki, Hirokazu; Sakauchi, Sumitaka; Ito, Akinori

doi:10.1587/transinf.2016slp0013

Cited by 5 publications

(5 citation statements)

References 42 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Results obtained with LSTM-RNN-LMs are shown in lines (2) and 3, and those obtained with RA-LSTM-RNN-LMs are shown in lines (4) and (5). They show that (3) outperformed (1) and (2), and (5) outperformed (3) and (4). This indicates that neural LMs can be complemented with the n-gram LM.…”

Section: Resultsmentioning

confidence: 84%

“…A lot of studies have been reported for improving language modeling in single speaker tasks. For a while, smoothed n-gram LMs were employed in ASR because they yield powerful performance in spite of simple modeling [1][2][3][4]. In recent studies, neural LMs that capture words by converting continuous representations have attracted a lot of attention [5,6].…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder

et al. 2018

Self Cite

View full text Add to dashboard Cite

We propose role play dialogue-aware language models (RPDA-LMs) that can leverage interactive contexts in role play multiturn dialogues for estimating the generative probability of words. Our motivation is to improve automatic speech recognition (ASR) performance in role play dialogues such as contact center dialogues and service center dialogues. Although long short-term memory recurrent neural network based language models (LSTM-RNN-LMs) can capture long-range contexts within an utterance, they cannot utilize sequential interactive information between speakers in multi-turn dialogues. Our idea is to explicitly leverage speakers' roles of individual utterances, which are often available in role play dialogues, for neural language modeling. The RPDA-LMs are represented as a generative model conditioned by a role sequence of a target role play dialogue. We compose the RPDA-LMs by extending hierarchical recurrent encoder-decoder modeling so as to handle the role information. Our ASR evaluation in a contact center dialogue demonstrates that RPDA-LMs outperform LSTM-RNN-LMs and document-context LMs in terms of perplexity and word error rate. In addition, we verify the effectiveness of explicitly taking interactive contexts into consideration.

show abstract

Section: Resultsmentioning

confidence: 84%

Section: Introductionmentioning

confidence: 99%

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder

et al. 2018

Self Cite

View full text Add to dashboard Cite

show abstract

“…The deep neural network is likely to be prone to over-fitting when the training dataset is small. Therefore, data expansion technology 24 is adopted to realize the expansion of the dataset for the original defect dataset (Figure 3). According to the characteristics of the image, more than one defect is observed in an image.…”

Section: Datasetmentioning

confidence: 99%

Improved SSD-assisted algorithm for surface defect detection of electromagnetic luminescence

Fan

2021

Proceedings of the Institution of Mechanical Engineers, Part O:

View full text Add to dashboard Cite

Defect detection of electromagnetic luminescence (EL) cells is the core step in the production and preparation of solar cell modules to ensure conversion efficiency and long service life of batteries. However, due to the lack of feature extraction capability for small feature defects, the traditional single shot multibox detector (SSD) algorithm performs not well in EL defect detection with high accuracy. Consequently, an improved SSD algorithm with modification in feature fusion in the framework of deep learning is proposed to improve the recognition rate of EL multi-class defects. A dataset containing images with four different types of defects through rotation, denoising, and binarization is established for the EL. The proposed algorithm can greatly improve the detection accuracy of the small-scale defect with the idea of feature pyramid networks. An experimental study on the detection of the EL defects shows the effectiveness of the proposed algorithm. Moreover, a comparison study shows the proposed method outperforms other traditional detection methods, such as the SIFT, Faster R-CNN, and YOLOv3, in detecting the EL defect.

show abstract

“…In addition, ASR systems consider only short context information when calculating the generative probabilities of words because they often use a traditional n-gram language modeling. In order to mitigate these problems, several techniques have been proposed [1]- [3]. In particular, approaches intended to improve LM structure have been aggressively pursued.…”

Section: Overviewmentioning

confidence: 99%

Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Masumura

Asami

Oba

et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite

View full text Add to dashboard Cite

This paper demonstrates latent word recurrent neural network language models (LW-RNN-LMs) for enhancing automatic speech recognition (ASR). LW-RNN-LMs are constructed so as to pick up advantages in both recurrent neural network language models (RNN-LMs) and latent word language models (LW-LMs). The RNN-LMs can capture longrange context information and offer strong performance, and the LW-LMs are robust for out-of-domain tasks based on the latent word space modeling. However, the RNN-LMs cannot explicitly capture hidden relationships behind observed words since a concept of a latent variable space is not present. In addition, the LW-LMs cannot take into account long-range relationships between latent words. Our idea is to combine RNN-LM and LW-LM so as to compensate individual disadvantages. The LW-RNN-LMs can support both a latent variable space modeling as well as LW-LMs and a long-range relationship modeling as well as RNN-LMs at the same time. From the viewpoint of RNN-LMs, LW-RNN-LM can be considered as a soft class RNN-LM with a vast latent variable space. In contrast, from the viewpoint of LW-LMs, LW-RNN-LM can be considered as an LW-LM that uses the RNN structure for latent variable modeling instead of an n-gram structure. This paper also details a parameter inference method and two kinds of implementation methods, an n-gram approximation and a Viterbi approximation, for introducing the LW-LM to ASR. Our experiments show effectiveness of LW-RNN-LMs on a perplexity evaluation for the Penn Treebank corpus and an ASR evaluation for Japanese spontaneous speech tasks.

show abstract

Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation

Cited by 5 publications

References 42 publications

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder

Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder

Improved SSD-assisted algorithm for surface defect detection of electromagnetic luminescence

Latent Words Recurrent Neural Network Language Models for Automatic Speech Recognition

Contact Info

Product

Resources

About