Tadej Justin scite author profile

Original scientific paper

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages

Nowadays, human-computer interaction (HCI) can also be achieved with voice user interfaces (VUIs). Low-cost language portability is often discussed and analysed as a way to let devices communicate with people by speech in the user's own language. One of the most time-consuming parts of adapting a VUI-capable application to a new language is acquiring speech data in the target language. This data is then used to develop VUI subsystems, especially speech-recognition and speech-production systems. A tempting way to bypass the lengthy data-acquisition process is to design and develop automatic algorithms that can extract similar target-language acoustics from speech databases in other languages. This paper focuses on cross-lingual phoneme mapping between an under-resourced and a well-resourced language. It proposes a novel automatic phoneme-mapping technique adopted from the speaker-verification field. The resulting phoneme mapping is then used to develop an HMM-based speech-synthesis system for the under-resourced language. The synthesised utterances are evaluated subjectively and compared with the expert-knowledge cross-language method and with a baseline speech synthesiser built only from the under-resourced data. The results reveal that combining data from the well-resourced and under-resourced languages with the proposed phoneme-mapping technique can improve the quality of speech synthesis for the under-resourced language.

Key words: Voice user interfaces, Human language technologies, HMM-based speech synthesis, Cross-language synthesis, Under-resourced languages, UBM-MAP-GMM phoneme mapping

Croatian abstract (translated): Application of automatic cross-lingual acoustic modelling to HMM speech synthesis for under-resourced language databases. Nowadays, human-computer interaction (HCI) can also be achieved through voice user interfaces (VUIs). To enable devices and users to communicate by speech in the user's own language, low-cost solutions for porting speech applications to different languages are often discussed and analysed. One of the most time-consuming parts of adapting VUI-enabled applications to a new language is collecting speech data for the target language. This data is then used to develop VUI subsystems, especially for speech recognition and speech production. A tempting way to avoid the lengthy data-collection procedure is to design and develop automatic algorithms capable of deriving similar acoustic properties for the target language from existing databases in other languages. This paper focuses on mapping phonemes across languages between under-resourced and well-resourced language databases. A new automatic phoneme-mapping technique is proposed, adopted and adapted from the field of speaker verification. This phoneme mapping is then used to develop an HMM-based speech-synthesis system for less widely resourced languages. The synthesised utterances were evaluated subjectively through a comparison of cross-lingual m...
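The abstract names the mapping only as "UBM-MAP-GMM phoneme mapping" adopted from speaker verification. The sketch below assumes the standard speaker-verification recipe: a diagonal-covariance universal background model (UBM), mean-only MAP adaptation of that UBM to the frames of each phoneme, and a KL-divergence upper bound between the adapted mean supervectors as the distance, with each under-resourced phoneme mapped to its nearest well-resourced phoneme. The authors' actual features, distance and decision rule may differ. All function names, the relevance factor of 16 and the toy data are hypothetical, and UBM training by EM is omitted (the UBM parameters are simply given).

```python
import numpy as np


def log_gauss_diag(x, means, variances):
    """Log-density of every frame under every diagonal-covariance Gaussian.

    x: (N, D) frames; means, variances: (K, D). Returns an (N, K) array.
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (
        x.shape[1] * np.log(2.0 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )


def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """MAP-adapt only the UBM means to the frames x of one phoneme."""
    log_post = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities (N, K)
    n_k = post.sum(axis=0)                            # soft frame counts (K,)
    first_moment = post.T @ x / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]        # data-dependent weight
    return alpha * first_moment + (1.0 - alpha) * means


def supervector_distance(means_a, means_b, weights, variances):
    """Upper bound on the KL divergence between two GMMs that share the UBM
    weights and covariances and differ only in their adapted means."""
    diff = means_a - means_b
    return 0.5 * float(np.sum(weights[:, None] * diff ** 2 / variances))


def map_phonemes(src_frames, tgt_frames, weights, means, variances, relevance=16.0):
    """Map each under-resourced phoneme to the nearest well-resourced phoneme."""
    src = {p: map_adapt_means(f, weights, means, variances, relevance)
           for p, f in src_frames.items()}
    tgt = {p: map_adapt_means(f, weights, means, variances, relevance)
           for p, f in tgt_frames.items()}
    return {
        p: min(tgt, key=lambda q: supervector_distance(m, tgt[q], weights, variances))
        for p, m in src.items()
    }


# Toy example: a hypothetical 2-component UBM over 2-D "acoustic features".
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
ubm_var = np.full((2, 2), 4.0)

under = {"a": rng.normal([-1.0, 3.0], 0.5, (300, 2)),
         "b": rng.normal([1.0, -3.0], 0.5, (300, 2))}
well = {"A": rng.normal([-1.0, 3.0], 0.5, (300, 2)),
        "B": rng.normal([1.0, -3.0], 0.5, (300, 2))}

mapping = map_phonemes(under, well, ubm_w, ubm_mu, ubm_var)
print(mapping)
```

The toy data is built so that each under-resourced phoneme has a well-resourced counterpart drawn from the same distribution, so the nearest-supervector rule should pair "a" with "A" and "b" with "B". Adapting from a shared UBM is what makes the supervectors directly comparable: every phoneme model has the same components in the same order, so a per-component mean difference is meaningful.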


A Bilingual HMM-Based Speech Synthesis System for Closely Related Languages

Justin, Pobar, Ipšić et al. 2012


A Comparison of Two Approaches to Bilingual HMM-Based Speech Synthesis

Pobar, Justin, Žibert et al. 2013


Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

Justin, Dobrišek 2014


Development and Evaluation of the Emotional Slovenian Speech Database - EmoLUKS

Justin, Žibert, Mihelič 2015


Neprekinjena dostava v vzporednem kritičnem poslovnem okolju (Continuous delivery in a parallel business-critical environment)

Justin, d.o.o, Ciglar et al. 2018


Razvoj zbirke slovenskega čustvenega govora iz radijskih iger – EmoLUKS (Development of a Slovenian emotional speech database from radio plays – EmoLUKS)

Justin, Žibert 2015

SLO2.0


In this paper, we present the construction of a Slovenian emotional speech database for speech synthesis and also explore its possible use for recognising the speaker's emotional state. We focus on describing the methodology developed for annotating paralinguistic information in speech, using the annotation of emotional states in Slovenian radio plays as an example. The database contains speech audio signals from seventeen radio plays. The currently annotated material comprises the emotional speech of one male and one female speaker. Emotion labels for the recordings were obtained through two-stage annotation by five volunteer annotators, who labelled the recordings in two separate time intervals. This annotation procedure allows the annotators' labels to be compared with one another. Using the material annotated in both iterations, we report on the annotators' consistency and on the agreement between their opinions. Based on the majority opinion, each recording is assigned the emotion label chosen most often by the annotators, and the recordings labelled in this way are combined into the EmoLUKS emotional speech database. We evaluate the database quantitatively and qualitatively with an established automatic system for recognising the speaker's emotional state, assessing label consistency with a two-class and a seven-class speaker-dependent emotional-state classifier. The successful recognition results further confirm that, despite its difficulty, the database contains clearly expressed emotional states of the speaker.
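The annotation procedure described above (five annotators, two separate rounds, one majority-vote label per recording) can be illustrated with a few small helpers. This is a sketch under assumptions: the abstract does not say which consistency or agreement measures were used or how ties were resolved, so simple percentage agreement and a "leave the recording unlabelled on a tie" rule are hypothetical choices here, as are the function names and the toy labels.

```python
from collections import Counter
from itertools import combinations


def majority_label(labels):
    """Return the most frequently chosen label, or None on a tie (hypothetical tie rule)."""
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def self_consistency(first_round, second_round):
    """Fraction of recordings an annotator labelled identically in both rounds."""
    if len(first_round) != len(second_round):
        raise ValueError("both rounds must cover the same recordings")
    same = sum(a == b for a, b in zip(first_round, second_round))
    return same / len(first_round)


def pairwise_agreement(annotations):
    """Mean percentage agreement over all annotator pairs.

    annotations: dict mapping annotator -> list of labels (same recording order).
    """
    scores = []
    for a, b in combinations(sorted(annotations), 2):
        la, lb = annotations[a], annotations[b]
        scores.append(sum(x == y for x, y in zip(la, lb)) / len(la))
    return sum(scores) / len(scores)


# Toy labels: five hypothetical annotators, four recordings, first round only.
round1 = {
    "ann1": ["anger", "joy", "neutral", "sadness"],
    "ann2": ["anger", "joy", "neutral", "neutral"],
    "ann3": ["anger", "neutral", "neutral", "sadness"],
    "ann4": ["fear", "joy", "neutral", "sadness"],
    "ann5": ["anger", "joy", "joy", "sadness"],
}

# zip(*...) regroups the labels per recording before voting.
majority_labels = [majority_label(list(col)) for col in zip(*round1.values())]
print(majority_labels)
print(pairwise_agreement(round1))
```

Applying `self_consistency` to each annotator's first- and second-round lists gives a per-annotator stability score, while `pairwise_agreement` summarises agreement within one round; chance-corrected measures such as Cohen's or Fleiss' kappa would be a natural refinement.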


Tadej Justin

Speaker de-identification using diphone recognition and speech synthesis

Towards automatic cross-lingual acoustic modelling applied to HMM-based speech synthesis for under-resourced languages
