As part of a project into speech recognition in meeting environments, we have collected a corpus of multichannel meeting recordings. We expected the identification of speaker activity to be straightforward given that the participants had individual microphones, but simple approaches yielded unacceptably erroneous labelings, mainly due to crosstalk between nearby speakers and wide variations in channel characteristics. Therefore, we have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM). A baseline HMM speech activity detector has been extended to use mixtures of Gaussians to achieve robustness for different speakers under different conditions. Feature normalization and crosscorrelation processing are used to increase the channel independence and to detect crosstalk. The use of both energy normalization and crosscorrelation-based postprocessing results in a 35% relative reduction of the frame error rate. Speech recognition experiments show that it is beneficial in this multispeaker setting to use the output of the speech activity detector for presegmenting the recognizer input, achieving word error rates within 10% of those achieved with manual turn labeling.

0-7803-7343-X/02/$17.00 © 2002 IEEE
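To make the idea of crosscorrelation-based crosstalk detection concrete, the following is a minimal sketch and not the detector described in the paper. It assumes two time-aligned channels split into equal-length frames, with per-frame speech/nonspeech decisions already available (for example, from an HMM detector). The function names (`xcorr_peak`, `suppress_crosstalk`, `leak_into`), the 0.5 threshold, the lag range, and the rule "keep the higher-energy channel" are all illustrative assumptions.

```python
import math
import random


def xcorr_peak(a, b, max_lag):
    """Return (peak, lag) of the normalized cross-correlation of two
    equal-length frames. A positive lag means b is delayed relative to a."""
    n = len(a)
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0, 0  # at least one frame is silent: nothing to correlate
    best, best_lag = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(a[i] * b[i + lag] for i in range(n) if 0 <= i + lag < n)
        c = s / norm
        if c > best:
            best, best_lag = c, lag
    return best, best_lag


def suppress_crosstalk(frames_a, frames_b, active_a, active_b,
                       threshold=0.5, max_lag=5):
    """Hypothetical postprocessing rule: when both channels are marked
    active in a frame and the frames are strongly correlated, assume one
    channel only picked up the other speaker and keep the louder one."""
    out_a, out_b = list(active_a), list(active_b)
    for t, (fa, fb) in enumerate(zip(frames_a, frames_b)):
        if not (active_a[t] and active_b[t]):
            continue
        peak, _ = xcorr_peak(fa, fb, max_lag)
        if peak > threshold:
            energy_a = sum(x * x for x in fa)
            energy_b = sum(x * x for x in fb)
            if energy_a >= energy_b:
                out_b[t] = False
            else:
                out_a[t] = False
    return out_a, out_b


def leak_into(frame, gain, delay):
    """Simulate crosstalk: an attenuated, delayed copy of a frame."""
    return [0.0] * delay + [gain * x for x in frame[:len(frame) - delay]]


if __name__ == "__main__":
    rng = random.Random(0)
    speech = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # stand-in signal
    leak = leak_into(speech, gain=0.3, delay=2)
    print(xcorr_peak(speech, leak, max_lag=5))
    print(suppress_crosstalk([speech], [leak], [True], [True]))
```

Normalizing the correlation by both frame energies makes the peak insensitive to the gain difference between channels, which is the property that lets a single threshold be shared across microphones in this sketch. A real system would choose the threshold and lag range from held-out data.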