Recently, DNN model compression based on network architecture design, e.g., SqueezeNet, has attracted considerable attention. Compared to well-known models, these extremely compact networks show no accuracy drop on image classification. An emerging question, however, is whether such compression techniques hurt a DNN's learning ability beyond classifying images on a single dataset. Our preliminary experiment shows that these compression methods can degrade domain adaptation (DA) ability even though classification performance is preserved. In this work, we propose a new compact network architecture and an unsupervised DA method. The DNN is built on a new basic module, Conv-M, which provides more diverse feature extractors without significantly increasing parameters. The unified framework of our DA method simultaneously learns invariance across domains, reduces the divergence of feature representations, and adapts label prediction. Our DNN has 4.1M parameters, only 6.7% of AlexNet's or 59% of GoogLeNet's. Experiments show that our DNN obtains GoogLeNet-level accuracy on both classification and DA, and that our DA method slightly outperforms previous competitive methods. Taken together, our DA strategy based on our DNN achieves state-of-the-art results on sixteen of the eighteen DA tasks on the popular Office-31 and Office-Caltech datasets.
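The abstract does not spell out Conv-M's internals, so the following is only a minimal PyTorch sketch of the general idea it names: a multi-branch module whose parallel branches apply heterogeneous operators to diversify feature extractors at modest parameter cost. The class name DiverseConvBlock, the specific branch choices (1x1, dilated 3x3, transposed 3x3), and the channel split are our assumptions for illustration, not the paper's Conv-M design.

```python
import torch
import torch.nn as nn

class DiverseConvBlock(nn.Module):
    """Hypothetical multi-branch block in the spirit of Conv-M:
    parallel branches with different receptive fields, concatenated
    along the channel dimension (output channels = 3 * branch_ch)."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        # 1x1 bottleneck branch keeps the parameter count low.
        self.branch1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        # Dilated 3x3 branch enlarges the receptive field for free.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.Conv2d(branch_ch, branch_ch, kernel_size=3,
                      padding=2, dilation=2),
        )
        # Transposed-conv branch; padding chosen to preserve H x W.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            nn.ConvTranspose2d(branch_ch, branch_ch, kernel_size=3,
                               padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x)], dim=1))

# Quick shape check: all branches preserve spatial size.
out = DiverseConvBlock(64, 32)(torch.randn(1, 64, 28, 28))
print(out.shape)  # torch.Size([1, 96, 28, 28])
```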
In this manuscript, we present a new general family of optimal iterative methods for finding multiple roots of nonlinear equations with known multiplicity, constructed using weight functions. An extensive convergence analysis verifies the optimal eighth-order convergence of the new family. Some special cases of the family are also presented; they require only three function evaluations and one derivative evaluation per iteration to reach optimal eighth-order convergence. A variety of numerical test functions, along with real-world problems such as a beam design model and the Van der Waals equation of state, demonstrate that the newly developed family competes efficiently with existing methods. A dynamical analysis of the proposed methods, using the graphical tool known as basins of attraction, further validates the theoretical results.
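The eighth-order family itself is defined by the paper's weight functions, so it is not reproduced here. As a grounded baseline that illustrates why the multiplicity m must be known, the classical modified Newton iteration x_{n+1} = x_n - m f(x_n)/f'(x_n) restores quadratic convergence at a root of multiplicity m, whereas plain Newton (m = 1) degrades to linear convergence there. A minimal Python sketch of that baseline, not of the proposed family:

```python
from typing import Callable

def modified_newton(f: Callable[[float], float],
                    df: Callable[[float], float],
                    x0: float, m: int,
                    tol: float = 1e-12, max_iter: int = 100) -> float:
    """Classical modified Newton step x <- x - m*f(x)/f'(x):
    second-order convergent to a root of known multiplicity m.
    (Baseline only; the paper's family reaches optimal eighth order.)"""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:  # residual small enough, accept x
            break
        x -= m * fx / df(x)
    return x

# Example: (x - 1)^3 has a root of multiplicity 3 at x = 1.
root = modified_newton(lambda x: (x - 1)**3,
                       lambda x: 3 * (x - 1)**2,
                       x0=2.0, m=3)
print(root)  # 1.0 (exact after one step for this polynomial)
```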
Machine learning model weights and activations are represented in full precision during training. This leads to performance degradation at runtime when the model is deployed on neural network accelerator (NNA) chips, which leverage highly parallelized fixed-point arithmetic to improve runtime memory and latency. In this work, we replicate the NNA operators during the training phase, so that back-propagation accounts for the degradation caused by low-precision inference on the NNA. Our proposed method efficiently emulates NNA operations, thus foregoing the need to transfer quantization-error-prone data to the Central Processing Unit (CPU) and ultimately reducing the user-perceived latency (UPL). We apply our approach to the Recurrent Neural Network-Transducer (RNN-T), an attractive architecture for on-device streaming speech recognition tasks. We train and evaluate models on 270K hours of English data and show a 5-7% improvement in engine latency while avoiding up to 10% relative degradation in WER.
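The paper's operator emulation is specific to the NNA's fixed-point pipeline, so the sketch below shows only the generic underlying idea: quantize-dequantize in the forward pass while letting gradients flow through unchanged (a straight-through estimator), so training sees the same rounding the accelerator will apply. The FakeQuantize name, the fixed scale of 0.05, and the 8-bit width are illustrative assumptions, not the authors' implementation.

```python
import torch

class FakeQuantize(torch.autograd.Function):
    """Simulate fixed-point quantization in the forward pass and use a
    straight-through estimator in the backward pass (generic QAT sketch;
    the paper's NNA emulation is hardware-specific)."""

    @staticmethod
    def forward(ctx, x, scale, bits):
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
        q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return q * scale  # dequantize so downstream ops stay in float

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat round/clamp as identity for gradients.
        return grad_output, None, None

x = torch.randn(4, requires_grad=True)
y = FakeQuantize.apply(x, 0.05, 8)  # emulate 8-bit fixed-point inference
y.sum().backward()                  # gradients flow as if unquantized
print(x.grad)                       # tensor of ones
```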