2020
DOI: 10.37917/ijeee.16.2.10

A Review on Voice-based Interface for Human-Robot Interaction

Abstract: With the recent developments of technology and the advances in artificial intelligence and machine learning techniques, it has become possible for the robot to understand and respond to voice as part of Human-Robot Interaction (HRI). The voice-based interface robot can recognize the speech information from humans so that it will be able to interact more naturally with its human counterpart in different environments. In this work, a review of the voice-based interface for HRI systems has been presented. The rev…

Cited by 19 publications (10 citation statements)
References: 0 publications
“…TTS synthesis can be defined as one of the systems by which normal language text is converted into speech. There are many differences between machine speech production and human, however, the increase in the capability of machine learning paradigms for simulating human speech production mechanisms will result in a more natural and accurate TTS [13], [24]. In this study, the pyttsx3 library [25] has been used for TTS synthesis as a robot's speech response.…”
Section: Speech Response Based On TTS Methods (mentioning)
confidence: 99%
“…The formant frequencies' values decrease as the vocal tract length increases. Both male and female adults have higher formant frequencies compared to children [5], [13], [14]. Formants were only measured at the glottal pulse to make the measurement easier along with the whole utterance.…”
Section: Features Extraction Based On Formants (mentioning)
confidence: 99%
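The inverse relation between vocal tract length and formant frequency in the statement above can be illustrated with the textbook uniform-tube approximation (a tube closed at the glottis and open at the lips), where F_n = (2n - 1) c / (4 L). This is a minimal sketch, not the method of the cited papers; the function name `tube_formants`, the speed of sound, and the two tract lengths are assumptions chosen for illustration.

```python
def tube_formants(tract_length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances of a uniform tube closed at one end.

    F_n = (2n - 1) * c / (4 * L), with L in metres and c in m/s.
    Real vocal tracts are not uniform tubes, so this is only a rough model.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * tract_length_m)
        for n in range(1, n_formants + 1)
    ]


# Illustrative lengths only (assumed, not taken from the cited papers).
for label, length in [("longer tract, 0.175 m", 0.175),
                      ("shorter tract, 0.120 m", 0.120)]:
    print(label, [round(f) for f in tube_formants(length)])
```

Under this model every formant scales with 1/L, so a shorter tract yields higher formant frequencies than a longer one.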
“…Therefore, knowing the pros and cons of each classifier can help in selecting the suitable classifier precisely. Machine learning classification approaches such as SVM, human visual system (HVS), Naïve Bayes (NB), and K-NN represent the most discriminatory and appropriate classifiers' techniques [56]- [58]. Table 8 illustrated the pros and cons of machine learning classifiers that help in detecting drivers' drowsiness.…”
Section: Learning Process (mentioning)
confidence: 99%
“…Between all types of speech-based feature extraction domains, Cepstral domain features are the most successful ones, where a cepstrum is obtained by taking the inverse Fourier transform of the signal spectrum. MFCC is the most important method to extract speech-based features in this domain [8]. MFCCs greatness stems from the ability to exemplify the spectrum of speech amplitude in a concise form.…”
Section: Mel-frequency Cepstral Coefficients (MFCCs) (mentioning)
confidence: 99%
“…These steps are shown in Figure 1. At the end of these steps, one energy and 12 cepstral features are obtained [8,10].…”
Section: Mel-frequency Cepstral Coefficients (MFCCs) (mentioning)
confidence: 99%
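The cepstrum mentioned in the MFCC statements above is conventionally computed as the inverse Fourier transform of the logarithm of the magnitude spectrum. The sketch below shows only that step for a single short frame, using a naive DFT from the standard library. It is not a full MFCC pipeline (there is no mel filterbank or DCT), and the names `dft` and `real_cepstrum`, the `floor` parameter, and the sample frame are assumptions made for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum.

    `floor` guards against log(0) when a spectral bin is exactly zero.
    """
    spectrum = dft(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in dft(log_mag, inverse=True)]


# Hypothetical 8-sample frame, used only to exercise the function.
frame = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
cepstrum = real_cepstrum(frame)
print([round(c, 4) for c in cepstrum])
```

Because the log turns multiplication into addition, scaling the frame by a constant changes only the zeroth cepstral coefficient, which is one reason an energy term is often tracked separately from the remaining coefficients.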