Development of a Large Spontaneous Speech Database of Agglutinative Hungarian Language

Neuberger, Tilda; Gyarmathy, Dorottya; Gráczi, Tekla Etelka; Horváth, Viktória; Gósy, Mária; Beke, András

doi:10.1007/978-3-319-10816-2_51

Cited by 24 publications

(13 citation statements)

References 14 publications

(9 reference statements)


“…10 conversations and 10 narratives were selected for the study from a Hungarian Database called BEA (Neuberger et al, 2014). Three speakers participated in each conversation; the interviewer (Int) and one speaker (henceforth: the second speaker S2) were constant.…”

Section: Methods (mentioning)

confidence: 99%

Pausing strategies with regard to speech style

Gyarmathy, Horváth

2019

Proceedings of DiSS 2019

Self Cite


Speech is occasionally interrupted by silent and filled pauses of various lengths. Pauses have many different functions in spontaneous speech (e.g. breathing, marking syntactic boundaries as well as speech planning difficulties, time for self-repair). On the one hand, the aim of the study was to analyze the interrelation between the temporal pattern and the syntactic position of silent pauses (SP). On the other hand, filled pauses (FP) were also analyzed according to their phonetic realization, as well as the combination of SPs and FPs. The effect of speech style on pausing strategies was also analyzed. A narrative recording and a conversational recording from 10 speakers (ages between 20 and 35 years, 5 male, 5 female) were selected from the Hungarian Spontaneous Speech Database for the study. The material was manually annotated, silent pauses were categorized, and the durations of pauses were then extracted. Results showed that the position of silent and filled pauses affects their duration. The speech style did not influence the frequency of pauses. However, silent and filled pauses were longer in narratives than in conversations. Results suggest that pausing strategies are similar in general; however, the timing patterns of pauses may depend on various factors, e.g. speech style.



“…For English, we used the TEDLium dataset (Rousseau et al, 2012); we made use of the utterances of 100 speakers (approximately 15 hours of recordings). For Hungarian, we chose the BEA Database (Neuberger et al, 2014); we trained our DNNs on the speech of 116 subjects (44 hours of recordings overall). We made sure that the annotation suited our needs for both corpora, i.e.…”

Section: ASR Parameters (mentioning)

confidence: 99%

“…For this, we collected the spontaneous speech of English-speaking and Hungarian-speaking MCI patients and healthy controls. Then we trained two ASR models for the automatic speech analysis step: for English, we used a subset of the TEDLium corpus (Rousseau et al, 2012), while for Hungarian we used a subset of the BEA Hungarian Spoken Language Database (Neuberger et al, 2014). We carried out classification experiments to determine the indicativeness of the different attributes.…”

Section: Introduction (mentioning)

confidence: 99%

Cross-lingual detection of mild cognitive impairment based on temporal parameters of spontaneous speech

Gosztolya

Balogh

Imre

et al. 2021

Computer Speech & Language


Mild Cognitive Impairment (MCI) is a heterogeneous clinical syndrome, often considered the prodromal stage of dementia. It is characterized by the subtle deterioration of cognitive functions, including memory, executive functions and language. Mainly due to the tenuous nature of these impairments, a high percentage of MCI cases remain undetected. There is evidence that language changes in MCI are present even before the manifestation of other distinctive cognitive symptoms, which offers a chance for early recognition. A cheap non-invasive way of early screening could be the use of automatic speech analysis. Earlier, our research team developed a set of speech temporal parameters, and demonstrated its applicability for MCI detection. For the automatic extraction of these attributes, a Hungarian-language ASR system was employed to match the native language of the MCI and healthy control (HC) subjects. In practical applications, however, it would be convenient to use exactly the same tool, regardless of the language spoken by the subjects. In this study we show that our temporal parameter set, consisting of articulation rate, speech tempo and various other attributes describing the hesitation of the subject, can indeed be reliably extracted regardless of the language of the ASR system used. For this purpose, we performed experiments both on English-speaking and on Hungarian-speaking MCI patients and healthy control subjects, using English and Hungarian ASR systems in both cases. Our experimental results indicate that the language on which the ASR system was trained only slightly affects the MCI classification performance, because we got quite similar scores (67-92%) as we did in the monolingual cases (67-92% as well). As our last investigation, we compared the proposed attribute values for the same utterances, utilizing both the English and the Hungarian ASR models.
We found that the articulation rate and speech tempo values calculated based on the two ASR models were highly correlated, and so were the attributes corresponding to silent pauses; however, noticeable differences were found regarding the filled pauses (still, these attributes remained indicative for both languages). Our further analysis revealed that this is probably due to a difference regarding the annotation of the English and the Hungarian ASR training utterances.


“…The x-vectors scores are given in accord with the corpus used to train the DNN they were extracted with. […] utterances) of the BEA Corpus, which contains Hungarian spontaneous speech (for more details, see [19]). This corpus has a relevant size (in comparison with the SLEEP Corpus), which is convenient when training DNNs.…”

Section: DNN Training Data (mentioning)

confidence: 99%

Deep Neural Network Embeddings for the Estimation of the Degree of Sleepiness

Egas-López

Gosztolya

2021

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Estimating the degree of sleepiness from human speech is an emerging research problem with straightforward applications. In this study, we employ the x-vector approach, currently the state-of-the-art in speaker recognition, as a neural network feature extractor to detect the level of sleepiness of a speaker. Besides using different corpora for fitting the x-vector DNN, we also experiment with adding noise and reverberation to the training samples. According to our experimental results for the publicly available Dusseldorf Sleepy Language Corpus, utilizing x-vector embeddings as features for Support Vector Regression consistently leads to competitive performance scores in sleepiness detection. In particular, we present the highest Spearman's correlation coefficient on the public corpus that was achieved by a single method.

