Single-speaker and multispeaker recognition results are presented for the voice-stop consonants /b,d,g/ using time-delay neural networks (TDNNs) with a number of enhancements, including a new objective function for training these networks. The new objective function, called the classification figure of merit (CFM), differs markedly from the traditional mean-squared-error (MSE) objective function and the related cross entropy (CE) objective function. Where the MSE and CE objective functions seek to minimize the difference between each output node and its ideal activation, the CFM function seeks to maximize the difference between the output activation of the node representing incorrect classifications. A simple arbitration mechanism is used with all three objective functions to achieve a median 30% reduction in the number of misclassifications when compared to TDNNs trained with the traditional MSE back-propagation objective function alone.
This paper presents a number of proofs that equate the outputs of a Multi-Layer Perceptron (MLP) classifier and the optimal Bayesian discriminant function for asymptotically large sets of statistically independent training samples. Two broad classes of objective functions are shown to yield Bayesian discriminant performance. The first class are "reasonable error measures," which achieve Bayesian discriminant performance by engendering classifier outputs that asymptotically equate to a posteriori probabilities. This class includes the mean-squared error (MSE) objective function as well as a number of information theoretic objective functions. The second class are classification figures of merit (CFM mono), which yield a qualified approximation to Bayesian discriminant performance by engendering classifier outputs that asymptotically identify the maximum a posteriori probability for a given input. Conditions and relationships for Bayesian discriminant functional equivalence are given for both classes of objective functions. Differences between the two classes are then discussed very briefly in the context of how they might affect MLP classifier generalization, given relatively small training sets.
We begin by considering current shortfalls with conventional surveillance systems and discuss the potential advantages of distributed, collaborative surveillance systems. Distributed surveillance systems oaeer the capability t o monitor activity from multiple locations over time thereby increasing the likelihood of obtaining discriminating data necessary for interpretation of the activity. Yet the multiplicity of sensors magniaees the volumes of data that must be processed. We present our vision of a system which generates timely interpretations of activities in the scene automatically through the use of mechanisms for collaboration among sensing systems and eaecient perception methods which complement the sensing paradigm. Then we review our recent eaeorts toward achieving this goal and present initial results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.