2020
DOI: 10.1016/j.specom.2020.03.008

Automatic speaker profiling from short duration speech data

Cited by 25 publications (27 citation statements)
References 26 publications
“…Note, however, that the lengths of utterances in that dataset are much higher then of those in the TIMIT dataset and the authors report much worse results on shorter test segments. These results are also on-par with those recently published in [24] (5.6 and 5.2 MAE female/male) without using any hand-engineered features and relying solely on the low-level signal representation.…”
Section: Results (supporting)
confidence: 86%
“…On top of that, the gender classification accuracy is competitive with the results achieved by the d-vector system, shown in Table 10. This results are also the best in terms of MAE out of all proposed solution and better then the current state-of-the-art results shown in [24] by 0.31 and 0.08 MAE for female and male speakers, respectively.…”
Section: Results (supporting)
confidence: 57%
“…Speaker attribute estimation: In speech fields, various methods that estimate speaker attributes such as gender, age, and height have been studied [19][20][21][22][23][24]. In the last decade, fully neural network based methods have been examined to precisely capture input speech contexts [21][22][23][24]. In fact, multiple-speaker attributes are often jointly estimated via multi-task learning [22,24].…”
Section: Related Work (mentioning)
confidence: 99%
“…Vogel and Morgan documented that the length of obtained speech data impacted the measurement accuracy of bio-acoustic features [26]. Although several efforts have been made to explore the accuracy of short-duration speech samples for detecting a disease or estimating a physical parameter [27]- [30], only a few studies have explored the impact of voice sample length on speech characteristics [31]- [33]. Scherer et al have shown that, in sustained vowel tasks, the stability of perturbation measurements, jitter and shimmer, is affected by the task duration.…”
Section: Introduction (mentioning)
confidence: 99%