Measuring the performance of isolated spoken Malay speech recognition using Multi-layer Neural Networks

Seman, Noraini; Bakar, Zainab Abu; Bakar, Nordin Abu

doi:10.1109/cssr.2010.5773762

Cited by 5 publications

(10 citation statements)

References 26 publications

(22 reference statements)

“…The authors report an overall recognition accuracy of 85% with their own database. In [8] [50]. The speech recognition models proposed in [48] for Arabic digits and words use two variants of neural network: multi-layer perceptron and Long Short-Term Memory (LSTM).…”

Section: Literature Review (mentioning)

confidence: 99%

“…From the above discussion, it is observed that researchers are using ANN and its variants in recent times also to design ASR systems in some major languages [6], [48], [49], [50] due to their attractive characteristics as discussed in Section 1. It is to be also noted that ANN is being popularly used for digit and isolated word recognition in under-resourced languages [4], [5], [7], [8], [10]. These ANN based ASR systems are reported to deliver good recognition rates.…”

Section: Literature Review (mentioning)

confidence: 99%

“…ii) Feature Extraction computes features from each voiced part in the pre-processed signal. Some popular features used in ASR systems are Linear Predictive Coding (LPC) coefficients [5], [6], Mel Frequency Cepstral Coefficients (MFCC) [4], [6], [7], [8], [10], short-time energy [6], i-vector [11], etc. The mel-frequency scale in MFCC coefficients being proportional to the logarithm of the linear frequency below 1 kHz, it closely reflects the human perception; and hence, MFCC features are mostly used in ASR systems.…”

Section: Introduction (mentioning)

confidence: 99%

Speech Recognition of Isolated Words using a New Speech Database in Sylheti

Chakraborty, Saikia

2019

IJRTE

With advances in artificial intelligence, speech recognition based applications have become increasingly popular in recent years. Researchers in many areas, including linguistics, engineering, and psychology, have been trying to address various aspects of speech recognition in different natural languages around the globe. Although many interactive speech applications are being developed in "well-resourced" major languages, use of these applications is still limited by the language barrier. Hence, researchers have also been concentrating on designing speech recognition systems for various under-resourced languages. Sylheti is one such under-resourced language, spoken primarily in the Sylhet division of Bangladesh and also in the southern part of Assam, India. This paper makes two contributions: i) it presents a new speech database of isolated words for the Sylheti language, and ii) it presents speech recognition systems that recognize isolated Sylheti words by applying two variants of neural network classifiers. The performance of these recognition systems is evaluated on the proposed database and the observations are presented.

“…For classification purposes, each speech segment is assumed to correspond to a class, which is labeled as 1 for FP and 2 for ELO. The number of input neurons is calculated through the experiments by multiplying the cepstral order by the total number of frames, as in (12) and (13): Input Neuron No = CepstralOrder × TotalFrameNumber. The number of hidden neurons is determined by trial and error guided by the Geometric Pyramid Rule (GPR) as in (14). The number of hidden neurons cannot be too large; otherwise, the network cannot obtain a good convergence rate [20].…”

Section: B. Multilayer Perceptron (MLP) Neural Network (mentioning)

confidence: 99%
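
The sizing rule quoted above can be sketched in a few lines. This is only an illustration, not the citing authors' code: equation (14) is not reproduced in the excerpt, so the sketch assumes the commonly cited form of the Geometric Pyramid Rule for a three-layer network, hidden ≈ sqrt(inputs × outputs). The function names, the cepstral order of 12, the 50 frames, and the two output neurons (one per class, FP and ELO) are all hypothetical values chosen for the example.

```python
import math


def input_neurons(cepstral_order, total_frames):
    """Input layer size as stated in the quote: cepstral order times frame count."""
    return cepstral_order * total_frames


def gpr_hidden_neurons(n_inputs, n_outputs):
    """Starting point for the hidden layer size.

    Assumes the common three-layer form of the Geometric Pyramid Rule,
    sqrt(n_inputs * n_outputs); the quote's equation (14) is not shown,
    so this form is an assumption.
    """
    return max(1, round(math.sqrt(n_inputs * n_outputs)))


# Hypothetical example values (not taken from the cited paper).
n_in = input_neurons(cepstral_order=12, total_frames=50)
n_hidden = gpr_hidden_neurons(n_in, n_outputs=2)
print(n_in, n_hidden)
```

As the quote notes, the rule only guides a trial-and-error search, so a value obtained this way would be a first candidate rather than a final choice.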

“…It is a type of speech feature involving coefficients that represent audio, derived from a type of cepstral representation of the audio clip [17]. MFCC has been successfully used in recent speech processing work such as non-speech detection in dysarthric speech [18], isolated spoken speech recognition [13] and continuous speech recognition [19]. A Discrete Fourier Transform (DFT) is performed on each windowed speech waveform using a 512-point DFT.…”

Section: A. Mel Frequency Cepstral Coefficients (mentioning)

confidence: 99%
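
The windowing and DFT step mentioned in the quote can be illustrated with a minimal, standard-library-only sketch. Several details are assumptions, since the excerpt does not give them: the Hamming window, zero-padding of the frame to 512 points, the 16 kHz sample rate, and the 400-sample test frame are all hypothetical, as are the function names. A direct O(N²) DFT is used for clarity; a real system would use an FFT.

```python
import cmath
import math

N_DFT = 512  # DFT size quoted in the citation statement


def hamming(length):
    """Hamming window (assumed; the quoted text does not name the window)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_dft_magnitude(frame, n_dft=N_DFT):
    """Window one frame, zero-pad to n_dft, return magnitudes of bins 0..n_dft/2."""
    if len(frame) > n_dft:
        raise ValueError("frame longer than DFT size")
    window = hamming(len(frame))
    padded = [s * w for s, w in zip(frame, window)]
    padded += [0.0] * (n_dft - len(frame))
    magnitudes = []
    for k in range(n_dft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_dft)
                  for n, x in enumerate(padded))
        magnitudes.append(abs(acc))
    return magnitudes


# Hypothetical test frame: a 1 kHz tone sampled at 16 kHz, 400 samples (25 ms).
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(400)]
spectrum = windowed_dft_magnitude(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N_DFT
print(peak_bin, peak_hz)
```

In an MFCC pipeline, the magnitude (or power) spectrum computed this way for each frame would then be passed through a mel filterbank; that later stage is not shown here.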

Impact of acoustical voice activity detection on spontaneous filled pause classification

Hamzah, Jamil, Seman, et al.

2014

2014 IEEE Conference on Open Systems (ICOS)

Self Cite

Filled pause detection is imperative for spontaneous speech recognition, as filled pauses may degrade the recognition rate. However, filled pauses are commonly confused with elongations because they share the same acoustical properties. The few previous attempts at classifying filled pauses and elongations employed Hidden Markov Models. Our proposed method, which uses a Neural Network as the classifier, achieved a 96% precision rate. We also show that voice activity detection (VAD) affects the performance of speech recognition. Three acoustical-based VAD methods are compared, and the best precision rate is achieved by incorporating volume and first-order difference features. Experiments are conducted using Malay-language spontaneous speech from Malaysia Parliamentary Debate sessions.