SPCp1-01: Voice Activity Detection for VoIP-An Information Theoretic Approach

Prasad, R. Venkatesha; Muralishankar, R.; Vijay, Sandip; Shankar, Hari; Pawełczak, Przemysław; Niemegeers, Ignas

doi:10.1109/glocom.2006.603

Cited by 13 publications

(13 citation statements)

References 11 publications


“…In [4], authors used the entropy measure to distinguish between speech and silence as a robust extension to the 3GPP standard. Nevertheless, the system assumes close-talking microphones and during tests ignores the effect of reverberation.…”

Section: Introduction (mentioning)

confidence: 99%
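The entropy criterion quoted above can be illustrated with a minimal sketch. It assumes spectral entropy of short frames as the measure: voiced speech concentrates energy in a few spectral peaks and so tends to have lower entropy than broadband noise. The frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), the midpoint threshold and the function names are hypothetical choices for illustration, not the cited paper's exact method.

```python
from typing import Optional

import numpy as np


def spectral_entropy(frame: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy (bits) of a frame's power spectrum, normalised to sum to 1."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    p = power / (power.sum() + eps)   # treat the spectrum as a probability mass function
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())


def entropy_vad(signal: np.ndarray, frame_len: int = 160,
                threshold: Optional[float] = None) -> np.ndarray:
    """Return one boolean per frame: True where the frame looks like speech (low entropy)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    ent = np.array([spectral_entropy(f) for f in frames])
    if threshold is None:
        # Hypothetical heuristic: midpoint between the lowest and highest frame entropy.
        threshold = 0.5 * (ent.min() + ent.max())
    return ent < threshold
```

With 160-sample frames a flat, noise-like spectrum approaches the maximum of log2(81), about 6.3 bits, while a strongly voiced frame falls well below it. A deployed detector would adapt the threshold to the noise floor rather than use a fixed midpoint.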

Using information theory to detect voice activity

Talantzis

Constantinides

2009

2009 IEEE International Conference on Acoustics, Speech and Signal Processing


Voice Activity Detection systems attempt to discriminate between voice and other ambient sounds. Most systems use a single microphone and rely on training prior to deployment. The performance of these systems depends heavily on reverberation and noise levels. In this paper we present an unsupervised Voice Activity Detection system that uses pairs of microphones to discern between a coherent acoustic source and spatially diffuse noise of low coherence. Coherence is measured using an information theoretic metric that incorporates means to filter out the effects of reverberation and noise more effectively. The performance of the system is investigated through extensive experiments. Under the conditions imposed by the experimental environments, the proposed system is shown to remain more robust than its counterparts in all cases.
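A rough sketch of the two-microphone idea follows. It assumes the two signals are jointly Gaussian, so that mutual information reduces to -0.5 ln(1 - rho^2) for correlation coefficient rho, and scans a range of inter-microphone lags because a coherent source reaches the microphones with a delay. The reverberation-filtering refinements described in the abstract are not reproduced, and the lag range, threshold and function names are hypothetical.

```python
import numpy as np


def gaussian_mi(a: np.ndarray, b: np.ndarray) -> float:
    """Mutual information (nats) of two signals, assuming they are jointly Gaussian."""
    rho = np.corrcoef(a, b)[0, 1]
    rho = min(abs(rho), 0.999999)  # guard against log(0) for identical signals
    return float(-0.5 * np.log(1.0 - rho ** 2))


def max_mi_over_lags(x1: np.ndarray, x2: np.ndarray, max_lag: int) -> float:
    """Largest MI between equal-length x1 and x2 over shifts of -max_lag..max_lag samples."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x1[lag:], x2[: len(x2) - lag]
        else:
            a, b = x1[:lag], x2[-lag:]
        best = max(best, gaussian_mi(a, b))
    return best


def coherent_source_present(x1: np.ndarray, x2: np.ndarray,
                            max_lag: int = 10, threshold: float = 0.1) -> bool:
    """Hypothetical rule: a coherent source gives high MI at some lag; diffuse noise does not."""
    return max_mi_over_lags(x1, x2, max_lag) > threshold
```

Spatially diffuse noise that is uncorrelated between the microphones keeps rho near zero at every lag, so its mutual information stays close to zero, whereas a single talker produces a sharp peak at the lag matching the path difference.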



“…A larger playout buffer offers increased tolerance to jitter but increases mouth-to-ear delay. One simple way to reduce the delay at the playout buffer is to detect the talk spurts [2] and transmit only those segments. This scheme, while reducing the bandwidth, avoids build-up of the playout buffer.…”

Section: Introduction (mentioning)

confidence: 99%
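The talk-spurt scheme in the quote above can be sketched as follows, assuming a per-frame boolean VAD decision is already available (for example from an entropy-based detector). `talk_spurts` and `fraction_suppressed` are hypothetical helper names. Practical systems usually also add a hangover period so that word endings are not clipped; that refinement is omitted here.

```python
from typing import List, Sequence, Tuple


def talk_spurts(active: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, for each run of active frames."""
    spurts: List[Tuple[int, int]] = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # a talk spurt begins
        elif not is_active and start is not None:
            spurts.append((start, i))      # the talk spurt ends at the first silent frame
            start = None
    if start is not None:                  # signal ended inside a talk spurt
        spurts.append((start, len(active)))
    return spurts


def fraction_suppressed(active: Sequence[bool]) -> float:
    """Share of frames (hence packets) not transmitted because they were classed as silence."""
    if not active:
        return 0.0
    return sum(1 for a in active if not a) / len(active)
```

Only frames inside the returned spurts would be packetized, so the suppressed fraction is a direct estimate of the packet-rate saving, and the gaps between spurts give the receiver a chance to drain its playout buffer.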

A Holistic Study of VoIP Session Quality - The Knobs that Control

Prasad

Vijay

Shankar

et al. 2008

2008 5th IEEE Consumer Communications and Networking Conference

Self Cite


VoIP packets transported over the Internet experience loss and variable delay. The effect of the network depends not only on the background flows but also on the parameters of the VoIP packets themselves, such as the packet size and the packet generation interval. While larger packets experience more losses, they experience less delay jitter and are therefore easier to handle at the playout buffer. To investigate holistically the effect of various network conditions on a VoIP session, we present a complete end-to-end study covering various states of the underlying network. As a case study, we compare G.711-coded packets generated at 20 ms and 40 ms intervals. While packets carrying 20 ms of data perform better when the network is loaded, 40 ms packetization is favored when the network is not saturated. This choice affects jitter and loss, and thus quality. We explain this trade-off using Mean Opinion Scores.
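To make the 20 ms versus 40 ms trade-off concrete, the sketch below computes payload size, packet rate and IP-level bit rate for G.711 at 64 kbit/s. It assumes 40 bytes of IPv4 (20), UDP (8) and RTP (12) headers with no options, extensions or header compression, and ignores link-layer overhead; the figures are illustrative and are not taken from the paper.

```python
from typing import Tuple

G711_BITRATE_BPS = 64_000
# Assumption: IPv4 (20 B) + UDP (8 B) + RTP (12 B), no options or header compression.
HEADER_BYTES = 20 + 8 + 12


def packet_stats(ptime_ms: int) -> Tuple[int, int, int]:
    """Return (payload bytes, packets per second, IP-level bit rate in bit/s)."""
    payload = G711_BITRATE_BPS * ptime_ms // (8 * 1000)   # 8 bytes of G.711 per ms
    pps = 1000 // ptime_ms
    ip_rate = (payload + HEADER_BYTES) * 8 * pps
    return payload, pps, ip_rate


if __name__ == "__main__":
    for ptime in (20, 40):
        payload, pps, rate = packet_stats(ptime)
        print(f"{ptime} ms: {payload} B payload, {pps} packets/s, {rate} bit/s at IP level")
```

Under these assumptions, doubling the packetization interval halves the packet rate and cuts header overhead from 16 kbit/s to 8 kbit/s, at the cost of 20 ms more packetization delay and twice as much speech lost per dropped packet.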


“…Krätzer, Dittmann, and Vogel [14] argued that the inactive portions of speech were not suitable for use as a cover object for steganography owing to an obvious distortion of the original speech. By contrast, Huang et al. [15] suggested an algorithm for embedding information in some parameters of speech frames encoded by the ITU G.723.1 codec, without distinguishing between inactive and active voice frames. These are computationally complex and require training and building a model.…”

Section: Related Work (mentioning)

confidence: 99%

“…Having a packet size equivalent to 10 ms allows the VoIP system to start playing the audio at the receiver's end 30-40 ms after the queue starts building up. If the frame duration were 50 ms, the initial delay would be 150-200 ms, which is unsuitable since the maximum round-trip delay should stay within 400 ms [15] for good-quality speech. Therefore, the frame duration must be chosen properly.…”

Section: 1 Choice of Frame Duration (mentioning)

confidence: 99%
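The figures in the frame-duration quote follow from a simple product: initial playout delay equals frame duration times the number of frames queued before playout starts. The quoted ranges imply 3 to 4 buffered frames; that count is an inference, not something the source states. The 400 ms budget below is the round-trip figure the quote cites, and the function names are hypothetical.

```python
ROUND_TRIP_BUDGET_MS = 400  # round-trip figure cited in the quote above


def initial_playout_delay_ms(frame_ms: int, frames_buffered: int) -> int:
    """Delay before playout starts if the receiver waits for `frames_buffered` frames."""
    return frame_ms * frames_buffered


def budget_share(delay_ms: float, budget_ms: float = ROUND_TRIP_BUDGET_MS) -> float:
    """Fraction of the delay budget consumed by the initial playout delay."""
    return delay_ms / budget_ms


if __name__ == "__main__":
    for frame_ms in (10, 50):
        lo = initial_playout_delay_ms(frame_ms, 3)
        hi = initial_playout_delay_ms(frame_ms, 4)
        print(f"{frame_ms} ms frames: {lo}-{hi} ms initial delay, "
              f"up to {budget_share(hi):.0%} of the {ROUND_TRIP_BUDGET_MS} ms budget")
```

With 50 ms frames the initial playout delay alone could consume half of the cited budget before any network delay is counted, which is the reason the quote gives for preferring short frames.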

Steganography in Audio Files by Entropy using FEC as ReedSolomon of VOIP Streams

S

Vinayagam

2012

IJCA


This paper introduces a novel technique for identifying the voiced regions (active frames) and silent regions (inactive frames) of a speech stream, well suited to VoIP calls. It proposes an improved voice activity detection method based on an entropy algorithm, together with a high-capacity steganography algorithm that embeds data in the inactive frames. The inactive frames are then encoded by the G.723.1 source codec, which is used extensively in Voice over Internet Protocol (VoIP). The data-embedding capacity is much higher in the inactive frames of an audio signal than in the active frames. Entropy-based voice activity detection algorithms for VoIP applications can save bandwidth by filtering out frames that do not contain speech. Compared with existing methods, the proposed approach yields better bandwidth savings while maintaining a high data-embedding capacity and good speech quality. Finally, a Reed-Solomon forward error correcting code is applied as encoder and decoder: by adding extra information (redundancy) to the original data, data losses that occur during transmission can be detected and recovered.
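The Reed-Solomon step can be illustrated with a compact sketch over GF(2^8) that covers systematic encoding and syndrome-based error detection only; actually recovering corrupted bytes additionally needs an error-locator procedure such as Berlekamp-Massey, which is omitted for brevity. The field polynomial 0x11d and the number of parity symbols `nsym` are assumptions, since the abstract does not give the paper's code parameters.

```python
from typing import List, Sequence, Tuple


def _build_tables() -> Tuple[List[int], List[int]]:
    """Exp/log tables for GF(2^8) with primitive polynomial x^8+x^4+x^3+x^2+1 (0x11d)."""
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):  # duplicate so exp[log a + log b] needs no modulo
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p: Sequence[int], q: Sequence[int]) -> List[int]:
    """Multiply polynomials stored highest-degree coefficient first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)  # addition in GF(2^8) is XOR
    return r


def generator_poly(nsym: int) -> List[int]:
    """g(x) = (x - a^0)(x - a^1)...(x - a^(nsym-1)); minus equals plus in GF(2^8)."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g


def rs_encode(msg: Sequence[int], nsym: int) -> List[int]:
    """Systematic encoding: message bytes followed by nsym parity bytes."""
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):  # synthetic division by the monic generator
        coef = buf[i]
        if coef != 0:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]


def syndromes(codeword: Sequence[int], nsym: int) -> List[int]:
    """Evaluate the received polynomial at a^0..a^(nsym-1) using Horner's rule."""
    out = []
    for i in range(nsym):
        s = 0
        for c in codeword:
            s = gf_mul(s, EXP[i]) ^ c
        out.append(s)
    return out


def has_errors(codeword: Sequence[int], nsym: int) -> bool:
    """A valid codeword is divisible by g(x), so every syndrome is zero."""
    return any(syndromes(codeword, nsym))
```

The `nsym` parity bytes are the "extra information (redundancy)" the abstract refers to: a full decoder can correct up to nsym // 2 corrupted bytes, or up to nsym erasures whose positions are known, such as bytes from a packet the network dropped.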

