We present an integrated system with structural Gaussian mixture models (SGMMs) and a neural network to achieve both computational efficiency and high accuracy in text-independent speaker verification. A structural background model (SBM) is first constructed by hierarchically clustering all Gaussian mixture components in a universal background model (UBM). In this way, the acoustic space is partitioned into multiple regions at different levels of resolution. For each target speaker, an SGMM is generated through multilevel maximum a posteriori (MAP) adaptation from the SBM. During testing, only a small subset of Gaussian mixture components is scored for each feature vector, which significantly reduces the computational cost. Furthermore, the scores obtained in different layers of the tree-structured models are combined via a neural network for the final decision. Different configurations are compared in experiments conducted on the telephony speech data used in the NIST speaker verification evaluation. The experimental results show that a computational reduction by a factor of 17 can be achieved with a 5% relative reduction in equal error rate (EER) compared with the baseline. The SGMM-SBM also shows some advantages over the recently proposed hash GMM, including higher speed and better verification performance.

EDICS: 1-SPEA

Index Terms: Gaussian clustering, neural network, speaker verification, structural Gaussian mixture model.

I. INTRODUCTION

RESEARCH on speaker recognition [1], including identification and verification, has been an active area for several decades. The goal is to have a machine automatically identify a particular person, or verify a person's claimed identity, from his/her voice. As one of the techniques in biometrics, speaker recognition can be used in many access control applications, such as network security, phone transactions, and room access.
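To make the pruned scoring summarized in the abstract more concrete, the following is a minimal sketch rather than the system evaluated in this paper. It assumes a two-level tree only, uses plain k-means on the component means as a stand-in for the hierarchical Gaussian clustering, summarizes each group by a moment-matched Gaussian, and follows only the single best-scoring parent node. All function names and parameters are hypothetical, and MAP adaptation and the neural network score combination are omitted.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of vector x under each diagonal-covariance Gaussian (one per row)."""
    d = means.shape[1]
    diff = x - means
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff * diff / variances, axis=1))

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def build_two_level_tree(weights, means, variances, n_groups, n_iter=10, seed=0):
    """Group leaf Gaussians and summarize each group by one parent Gaussian.

    Assumption: k-means on the component means stands in for the
    hierarchical clustering; the parent is obtained by moment matching.
    """
    rng = np.random.default_rng(seed)
    centers = means[rng.choice(len(means), n_groups, replace=False)]
    for _ in range(n_iter):
        dist = ((means[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for g in range(n_groups):
            if np.any(assign == g):
                centers[g] = means[assign == g].mean(axis=0)
    groups = [np.flatnonzero(assign == g) for g in range(n_groups)]
    groups = [g for g in groups if len(g) > 0]  # drop empty clusters
    p_w, p_mu, p_var = [], [], []
    for g in groups:
        w = weights[g][:, None]
        total = w.sum()
        mu = (w * means[g]).sum(axis=0) / total
        second = (w * (variances[g] + means[g] ** 2)).sum(axis=0) / total
        p_w.append(total)
        p_mu.append(mu)
        p_var.append(second - mu ** 2)
    return {"groups": groups, "w": np.array(p_w),
            "mu": np.array(p_mu), "var": np.array(p_var)}

def full_log_likelihood(x, weights, means, variances):
    """Baseline: score every leaf Gaussian."""
    return logsumexp(np.log(weights) + log_gauss_diag(x, means, variances))

def pruned_log_likelihood(x, tree, weights, means, variances):
    """Score all parents, then only the leaves under the best parent.

    Returns the pruned log-likelihood and the number of Gaussians evaluated.
    """
    parent_ll = np.log(tree["w"]) + log_gauss_diag(x, tree["mu"], tree["var"])
    idx = tree["groups"][int(np.argmax(parent_ll))]
    leaf_ll = np.log(weights[idx]) + log_gauss_diag(x, means[idx], variances[idx])
    return logsumexp(leaf_ll), len(tree["groups"]) + len(idx)

# Toy usage with synthetic parameters (not speech data).
demo_rng = np.random.default_rng(0)
n_leaves, dim = 64, 8
demo_w = demo_rng.random(n_leaves)
demo_w /= demo_w.sum()
demo_mu = 3.0 * demo_rng.normal(size=(n_leaves, dim))
demo_var = demo_rng.random((n_leaves, dim)) + 0.5
demo_tree = build_two_level_tree(demo_w, demo_mu, demo_var, n_groups=8)
demo_x = 3.0 * demo_rng.normal(size=dim)
demo_ll, demo_count = pruned_log_likelihood(demo_x, demo_tree, demo_w, demo_mu, demo_var)
print("Gaussians evaluated:", demo_count, "of", n_leaves)
```

Because the pruned score sums over a subset of the weighted leaf densities, it can never exceed the full score. The saving depends on how evenly the clustering balances the groups, so this toy example should not be read as reproducing the speed-up reported above.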
Speakers are divided into two groups: the enrolled target speakers and the nontarget (background) speakers. Both identification and verification can be classified as text-independent or text-dependent applications based on whether or Manuscript