Julio Cesar Cavalcanti scite author profile

The purpose of this study was to explore the speaker-discriminatory potential of vowel formant mean frequencies in comparisons of identical twin pairs and non-genetically related speakers. The influences of lexical stress and the vowels’ acoustic distances on the discriminatory patterns of formant frequencies were also assessed. The first four speech formants (F1-F4) were extracted and analyzed using spontaneous speech materials. The recordings comprise telephone conversations between identical twin pairs, recorded directly through high-quality microphones. The subjects were 20 male adult speakers of Brazilian Portuguese (BP), aged between 19 and 35. For the comparisons, stressed and unstressed oral vowels of BP were segmented and transcribed manually in the Praat software. F1-F4 formant estimates were automatically extracted from the midpoint of each labeled vowel. Formant values were represented in both Hertz and Bark. Comparisons within identical twin pairs using the Bark scale were performed to verify whether the measured differences would be potentially significant under a psychoacoustic criterion. The results revealed consistent patterns in the comparison of low-frequency and high-frequency formants in twin pairs and non-genetically related speakers, with high-frequency formants displaying greater speaker-discriminatory power than low-frequency formants. Among all formants, F4 seemed to display the highest discriminatory potential within identical twin pairs, followed by F3. For non-genetically related speakers, F3 and F4 displayed a similarly high discriminatory potential. Regarding vowel quality, the central vowel /a/ was found to be the most speaker-discriminatory segment, followed by front vowels. Moreover, stressed vowels displayed higher inter-speaker discrimination than unstressed vowels in both groups; however, the combination of stressed and unstressed vowels was found to be even more explanatory of the observed differences. Although identical twins displayed higher phonetic similarity, they were not found to be phonetically identical.
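The abstract reports formant values in both Hertz and Bark but does not name the conversion formula. The sketch below assumes the Traunmüller (1990) approximation, which is an assumption and not confirmed by the source, and omits that formula's corrections for very low and very high frequencies. The function name and the example F3 values are hypothetical.

```python
def hz_to_bark(freq_hz: float) -> float:
    """Convert a frequency in Hz to Bark.

    Assumes the Traunmueller (1990) approximation; the abstract does not
    say which Hz-to-Bark formula was actually used.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


# Hypothetical mean F3 values (Hz) for two speakers, for illustration only.
f3_speaker_a = 2500.0
f3_speaker_b = 2650.0

# Absolute difference between the two speakers on the Bark scale.
diff_bark = abs(hz_to_bark(f3_speaker_a) - hz_to_bark(f3_speaker_b))
print(f"F3 difference: {diff_bark:.2f} Bark")
```

Whether a given Bark difference counts as perceptually relevant depends on the psychoacoustic criterion adopted, which the abstract does not specify.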


Atuação da fonoaudiologia em unidade de terapia intensiva de um hospital de doenças infecciosas de Alagoas [Speech therapy practice in the intensive care unit of an infectious diseases hospital in Alagoas]

Silva, Lira, Cavalcanti et al. 2016. Rev. CEFAC


Purpose: to describe speech therapy practice in the Intensive Care Unit of a main hospital. Methods: the sample consisted of all records in the minutes book of the research site for 2014. The data for the period were collected and tabulated in Excel® and analyzed using statistical methods, and the results were presented in graphs and tables. Results: of the 166 patients in the sample, 77 took part in the research: 40 (51.9%) through speech therapy and 37 (48.1%) through monitoring. The number of patients assisted by the speech therapy service was significant, given that the average hospital stay was twenty days owing to the severity of the main pathologies. Most patients who received some kind of speech therapy were discharged from the intensive care unit and transferred to other hospital units.


Multiparametric Analysis of Speaking Fundamental Frequency in Genetically Related Speakers Using Different Speech Materials: Some Forensic Implications

2024


Objective: To assess the speaker-discriminatory potential of a set of fundamental frequency (f0) estimates in intra-identical twin pair comparisons and cross-pair comparisons (i.e., among all speakers). Participants: A total of 20 Brazilian Portuguese speakers of the same dialect, namely 10 male identical twin pairs aged between 19 and 35, were recruited. Method: The participants were recorded directly through professional microphones while taking part in a spontaneous dialogue over mobile phones. Acoustic measurements were performed in connected speech samples and in lengthened vowels at least 160 ms long produced during spontaneous speech. Results: f0 baseline, central tendency, and extreme values were found to be the most discriminatory in intra-twin pair and cross-pair comparisons. These were also the estimates displaying the largest effect sizes. Overall, only three identical twin pairs were found to be statistically different regarding their f0 patterns in connected speech, and none for lengthened vowel-based f0 metrics. Estimates of f0 variation and modulation were found to be the least discriminatory across speakers, which may signal the control of speaking style and dialect over dynamic patterns of f0. Concerning system performance, the base value of f0 (f0 baseline) was found to be the most reliable metric, displaying the lowest equal error rate (EER). Conclusions: The outcomes suggest that, although identical twins were very closely related regarding their f0 patterns, some pairs could still be differentiated acoustically, but only in connected speech. Such findings reinforce the relevance of analyzing long-term f0 metrics for speaker comparison purposes, with particular consideration to f0 baseline. Furthermore, f0 differences across subjects were suggested to be more expressive in connected speech than in lengthened vowels.
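The equal error rate mentioned above is the operating point at which the false-acceptance and false-rejection rates coincide. The sketch below is one simple way to approximate it from two lists of comparison scores. It assumes that higher scores mean greater similarity; the function name and the score values are hypothetical, and this is not the authors' implementation.

```python
def equal_error_rate(same_scores, diff_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    same_scores: scores from same-speaker comparisons (assumed higher = more similar)
    diff_scores: scores from different-speaker comparisons
    """
    thresholds = sorted(set(same_scores) | set(diff_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # Same-speaker pairs wrongly rejected at this threshold.
        fnr = sum(s < t for s in same_scores) / len(same_scores)
        # Different-speaker pairs wrongly accepted at this threshold.
        fpr = sum(d >= t for d in diff_scores) / len(diff_scores)
        gap = abs(fnr - fpr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (fnr + fpr) / 2
    return eer


# Hypothetical scores, for illustration only.
print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
```

A lower EER indicates better separation between same-speaker and different-speaker comparisons, which is why the abstract treats the lowest EER as the most reliable metric.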


Multi-parametric analysis of speech timing in inter-talker identical twin pairs and cross-pair comparisons: Some forensic implications

2022


The purpose of this study was to assess the speaker-discriminatory potential of a set of speech timing parameters while probing their suitability for forensic speaker comparison applications. The recordings comprised spontaneous dialogues between twin pairs over mobile phones, recorded directly with professional headset microphones. Speaker comparisons were performed between twin speakers engaged in a dialogue (i.e., intra-twin pairs) and among all subjects (i.e., cross-twin pairs). The participants were 20 Brazilian Portuguese speakers, ten male identical twin pairs from the same dialectal area. A set of 11 speech timing parameters was extracted and analyzed, including speech rate, articulation rate, syllable duration (V-V unit), vowel duration, and pause duration. Three system performance estimates were considered for assessing the suitability of the parameters for speaker comparison purposes, namely global Cllr, EER, and AUC values. These were interpreted alongside the analysis of effect sizes. Overall, speech rate and articulation rate were found to be the most reliable parameters, displaying the largest effect sizes for the factor “speaker” and the best system performance outcomes, namely the lowest Cllr and EER values and the highest AUC values. Conversely, smaller effect sizes were found for the other parameters, which is compatible with a lower explanatory potential of speaker identity for the duration of such units and possibly a higher linguistic control over their temporal variation. In addition, speech timing estimates based on larger temporal intervals tended to present larger effect sizes and better speaker-discriminatory performance. Finally, identical twin pairs were found to be remarkably similar in their speech temporal patterns at the macro and micro levels while engaging in a dialogue, resulting in poor system discriminatory performance. Possible underlying factors for such a striking convergence in identical twins’ speech timing patterns are presented and discussed.
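Speech rate and articulation rate, the two parameters singled out above, differ in how they treat pauses. The sketch below uses a common textbook definition: units per second including pauses for speech rate, and excluding pauses for articulation rate. It assumes the unit count and durations are already available; the function name and the numbers are hypothetical, and the study's exact operationalization (for example its use of V-V units) is not given in the abstract.

```python
def speech_and_articulation_rate(n_units, total_dur_s, pause_durs_s):
    """Return (speech_rate, articulation_rate) in units per second.

    n_units:      number of syllable-sized units in the stretch of speech
    total_dur_s:  total duration in seconds, pauses included
    pause_durs_s: durations of the silent pauses in seconds
    """
    pause_total = sum(pause_durs_s)
    speech_rate = n_units / total_dur_s                         # pauses included
    articulation_rate = n_units / (total_dur_s - pause_total)   # pauses excluded
    return speech_rate, articulation_rate


# Hypothetical values: 40 units in 10 s, with two 1-second pauses.
sr, ar = speech_and_articulation_rate(40, 10.0, [1.0, 1.0])
print(sr, ar)
```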


Microphone and Audio Compression Effects on Acoustic Voice Analysis: A Pilot Study

et al. 2023


On the speaker discriminatory power asymmetry regarding acoustic-phonetic parameters and the impact of speaking style

2023


This study aimed to assess what we refer to as the speaker discriminatory power asymmetry and its forensic implications in comparisons performed in different speaking styles: spontaneous dialogues vs. interviews. We also addressed the impact of data sampling on the speaker-discriminatory performance of different acoustic-phonetic estimates. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. The speech material consisted of spontaneous telephone conversations between familiar individuals, and interviews conducted between each participant and the researcher. Nine acoustic-phonetic parameters were chosen for the comparisons, spanning temporal, melodic, and spectral acoustic-phonetic estimates. An analysis based on the combination of different parameters was also conducted. Two speaker discriminatory metrics were examined: Cost Log-likelihood-ratio (Cllr) and Equal Error Rate (EER) values. A general speaker discriminatory trend was suggested when assessing the parameters individually. Parameters pertaining to the temporal acoustic-phonetic class showed the weakest speaker contrasting power, as evidenced by their relatively higher Cllr and EER values. Moreover, among the acoustic parameters assessed, spectral parameters, mainly the high formant frequencies F3 and F4, performed best in terms of speaker discrimination, yielding the lowest EER and Cllr scores. The results appear to suggest a speaker discriminatory power asymmetry concerning parameters from different acoustic-phonetic classes, in which temporal parameters tended to present a lower discriminatory power. The speaking style mismatch also seemed to considerably impact the speaker comparison task by undermining the overall discriminatory performance. A statistical model based on the combination of different acoustic-phonetic estimates was found to perform best in this case. Finally, data sampling proved to be of crucial relevance for the reliability of discriminatory power assessment.
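The Cllr metric named above penalizes likelihood ratios that point in the wrong direction, weighted by how strongly they do so. The sketch below follows the standard log-likelihood-ratio cost definition and assumes the system outputs natural-log likelihood ratios. The function name and the example values are hypothetical, and the calibration steps used in the study are not covered.

```python
import math


def cllr(same_llrs, diff_llrs):
    """Log-likelihood-ratio cost from natural-log likelihood ratios.

    same_llrs: log-LRs from same-speaker comparisons (ideally large and positive)
    diff_llrs: log-LRs from different-speaker comparisons (ideally large and negative)
    """
    # Penalty for same-speaker comparisons: log2(1 + 1/LR).
    pen_same = sum(math.log2(1 + math.exp(-l)) for l in same_llrs) / len(same_llrs)
    # Penalty for different-speaker comparisons: log2(1 + LR).
    pen_diff = sum(math.log2(1 + math.exp(l)) for l in diff_llrs) / len(diff_llrs)
    return 0.5 * (pen_same + pen_diff)


# Hypothetical log-LRs, for illustration only.
print(cllr([2.0, 3.5, 1.0], [-2.5, -1.0, -4.0]))
```

A system that always outputs a likelihood ratio of 1 (log-LR of 0) scores exactly 1, so values below 1 indicate that the parameter carries useful speaker information, and lower is better.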


Assessing the speaker discriminatory power asymmetry of different acoustic-phonetic parameters

Cavalcanti, Eriksson, Barbosa 2022


This pilot study set out to assess the speaker discriminatory power asymmetry regarding parameters from different phonetic dimensions in spontaneous speech, i.e., spectral, melodic, and temporal. The speech material consisted of spontaneous telephone conversations between siblings. The participants were 20 male Brazilian Portuguese speakers from the same dialectal area. Six acoustic-phonetic parameters were chosen for the comparison: f0 median, f0 baseline, speech rate, articulation rate, F3, and F4. Overall, acoustic parameters pertaining to the speech tempo category showed the worst speaker discriminatory power when assessed in isolation. This trend was indicated by their relatively higher median and mean Cllr and EER values. Moreover, among the parameters assessed, the high formant frequencies F3 and F4 were the best-performing estimates in terms of discriminability, yielding the lowest EER and Cllr values. The results suggested a speaker discriminatory power asymmetry concerning different acoustic-phonetic parameters, in which speech tempo estimates presented a lower discriminatory power than melodic and spectral parameters. The findings also suggest that data sampling is crucial for the reliability of Cllr and EER calculations.


Laryngealization, Gender and Speakers' Distinctiveness in Brazilian Portuguese

Cavalcanti, Lucente, Barbosa 2018


This work aims to analyze how the occurrence of laryngealization in Brazilian Portuguese can contribute to speaker characterization, through the analysis of laryngealization rates and their occurrence in vowel and consonant segments, in order to verify which measures would be more representative of a personal speech style. This work also aims to analyze the influence of gender on laryngealization rates. The corpus consists of semi-spontaneous speech recordings of 10 speakers of the same dialect, five men and five women, aged 20 to 26, all of them with a high school degree. The recordings consist of retellings of the story of the "Pear Film", a 6-minute short film. Speech data were segmented and analyzed using the software Praat. Laryngealization in vocalic and consonant segments was identified auditorily by the first author, a speech therapist, and confirmed by waveform and spectrogram inspection. Results show a significant distinction in the occurrence of laryngealization between speakers, which may suggest that laryngealization rates could be relevant for speaker comparison. The results related to gender revealed higher laryngealization rates for females.


