2012
DOI: 10.4218/etrij.11.0111.0344

Statistical Model-Based Voice Activity Detection Based on Second-Order Conditional MAP with Soft Decision

Abstract: In this paper, we propose a novel approach to statistical model-based voice activity detection (VAD) that incorporates a second-order conditional maximum a posteriori (CMAP) criterion. As a technical improvement for the first-order CMAP criterion in [1], we consider both the current observation and the voice activity decision in the previous two frames to take full consideration of the interframe correlation of voice activity. This is clearly different from the previous approach [1] in that we employ the voice…
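
The abstract says the decision uses the current observation together with the voice activity decisions of the previous two frames. One way to picture that idea is a likelihood-ratio test whose threshold is selected by the pair of preceding decisions. The sketch below is only an illustration under stated assumptions, not the authors' algorithm: the function name `cmap2_vad`, the use of precomputed per-frame log-likelihood ratios, the threshold table, and the initial state are all hypothetical, and the soft-decision component named in the title is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type: maps (decision at n-2, decision at n-1) to a log-threshold.
ThresholdTable = Dict[Tuple[int, int], float]


def cmap2_vad(
    llrs: Sequence[float],
    log_thresholds: ThresholdTable,
    initial: Tuple[int, int] = (0, 0),
) -> List[int]:
    """Hard VAD decisions from per-frame log-likelihood ratios.

    Assumption: the threshold applied to frame n depends only on the
    decisions made for frames n-2 and n-1 (1 = speech, 0 = non-speech).
    """
    prev2, prev1 = initial  # assumed starting state: two non-speech frames
    decisions: List[int] = []
    for llr in llrs:
        # Pick the threshold conditioned on the two preceding decisions.
        threshold = log_thresholds[(prev2, prev1)]
        decision = 1 if llr > threshold else 0
        decisions.append(decision)
        # Slide the two-frame decision history forward by one frame.
        prev2, prev1 = prev1, decision
    return decisions


if __name__ == "__main__":
    # Illustrative values only: a lower threshold after recent speech
    # makes it easier to stay in the speech state.
    table = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.5, (1, 1): -1.0}
    print(cmap2_vad([0.5, 1.5, 0.2, -0.5, -1.5, 0.7], table))
```

With four thresholds instead of the two a first-order rule would need, the detector can treat "speech just started" and "speech has been ongoing" differently, which is one plausible reading of exploiting interframe correlation.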

Cited by 2 publications (2 citation statements)
References 9 publications
“…To establish baseline acoustic accent detection performance, we construct distinct feature sets using pitch, intensity, energy, duration, and voice quality features [15][16][17][18]. For the pitch and intensity feature set, we extract the minimum, maximum, mean, standard deviation, and z-score of the maximum pitch within the syllable from the raw and z-score speaker normalized pitch contours.…”
Section: Accent Detection Using Acoustic Feature Sets (mentioning)
confidence: 99%
“…However, the incoming number (that is, caller ID) displayed on the phone screen is not sufficient to detect vishing attacks since vishers can modify the displayed number on the phone by using a technique called "caller ID spoofing"; therefore, the recipient cannot be certain, from the displayed number alone, that the phone call is coming from a trusted sender. Rather than the displayed number, the recipient can use the phone caller's voice characteristics, such as pitch, accent, and pronunciation, to effectively detect vishers [3]. For the time being, the best option is to try to educate users about these attacks and the associated risks - however, many security researchers have warned that the effectiveness of such education is inherently limited [4]-[5].…”
Section: Introduction (mentioning)
confidence: 99%