Linguistic-Coupled Age-to-Age Voice Translation to Improve Speech Recognition Performance in Real Environments

Kim, June-Woo; Yoon, Hyekyung; Jung, Ho‐Young

doi:10.1109/access.2021.3115608

Cited by 7 publications (7 citation statements)

References 32 publications (22 reference statements)

“…MT system has been developed to include a voice system for the translated terms (Kim et al, 2021). This significant property assists users to learn words' spelling easily; especially it can be applied in dual mode (i.e.…”

Section: Translation Voice Accompanied With Translated Terms (mentioning, confidence 99%)

Problems of Machine Translation Systems in Arabic

Shaikhli, 2022, JLTR

The human need for language translation has been increasing because of the expansion of knowledge fields and open communication across countries throughout the world. Accordingly, traditional translation has become insufficient, and machine translation is the best alternative. However, despite its astounding development over the past decades, machine translation, as an inevitable alternative, still faces many challenges that make it incomparable with professional human translation. This indicates that machine translation of all types has to be supported by highly developed tools that can enhance its effectiveness. This study showed the advantages of machine translation, discussed some of its most common challenges, and accordingly introduced some recommendations that should be taken into account to improve its effectiveness for the Arabic language.

“…Compared with a manual translation system, MT system needs less time and effort the user to start the translation process (Filmer, 2019;Kim et al, 2021). On the contrary, the manual translation needs the user to prepare a dictionary and search manually and alphabetically to find the target term.…”

Section: Ease of Access and Little Effort Required (mentioning, confidence 99%)

Problems of Machine Translation Systems in Arabic

Shaikhli, 2022, JLTR

“…The evaluation data highlighted tentative guesses at the age of the presenters and all four lectures received neutral scores for this category although one evaluator was more confident that they could judge the age of two presenters as being between 40-50 years old. Research into the effect of age on the voice tends to have been linked to an older population as described by Kim et al [9]. The authors discuss the concept of a voice conversion framework coupled with linguistic information that may help to reduce issues of bias where the voice files used to generate data sets are mainly from younger adults.…”

Section: Findings and Discussion (mentioning, confidence 99%)

Exploring Practical Metrics to Support Automatic Speech Recognition Evaluations

Draffan, Wald, Ding, et al., 2023, Assistive Technology: Shaping a Sustainable and Inclusive World

Recent studies evaluating the quality of automatic speech recognition text output have shown that using word error rate to count mistakes in English does not necessarily help developers of automatic transcriptions or captions. Confidence in identifying the types of errors being made remains low because mistranscriptions from speech to text are not always captured with a note that details the reason for the error. There have been situations in higher education where students requiring captions and transcriptions have found that some academic lecture outputs are littered with word errors, which means that comprehension levels drop and those with cognitive, physical and sensory disabilities are particularly affected. Despite the incredible improvements in general understanding of conversational automatic speech recognition, academic situations tend to include numerous domain-specific terms, and lecturers may be non-native speakers coping with recording technology in noisy situations. This paper aims to discuss the way additional metrics are used to capture issues and feed back into the machine learning process to enable enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.

“…In Figure 2a, we show an example of a Mel-spectrogram feature. These types of transforms allow us to handle the original waveform by extracting useful features and achieve human-level performance in various speech classification tasks [4], [17]- [19].…”

Section: Speech Classification Systems Transform an Audio Waveform (mentioning, confidence 99%)

Exploring Diverse Feature Extractions for Adversarial Audio Detection

et al. 2023

Although deep learning models have exhibited excellent performance in various domains, recent studies have discovered that they are highly vulnerable to adversarial attacks. In the audio domain, malicious audio examples generated by adversarial attacks can cause significant performance degradation and system malfunctions, resulting in security and safety concerns. However, compared to recent developments in the audio domain, the properties of adversarial audio examples and defenses against them still remain largely unexplored. In this study, to provide a deeper understanding of adversarial robustness in the audio domain, we first investigate traditional and recent feature extractions in terms of adversarial attacks. We show that adversarial audio examples generated from different feature extractions exhibit different noise patterns and thus can be distinguished by a simple classifier. Based on this observation, we extend existing adversarial detection methods by proposing a new detection method that detects adversarial audio examples using an ensemble of diverse feature extractions. By combining the frequency and self-supervised feature representations, the proposed method provides a high detection rate against both white-box and black-box adversarial attacks. Our empirical results demonstrate the effectiveness of the proposed method in speech command classification and speaker recognition.
