Matrix factorization methods are widely used to extract latent factors in low-rank matrix completion and rating prediction problems arising in the recommender systems of on-line retailers. Most existing models are based on L2 fidelity (quadratic functions of the factorization error). In this work, a coordinate descent (CD) method is developed for matrix factorization under L1 fidelity, so that the minimization proceeds one variable at a time and the factorization error is sparsely distributed. On low-rank random matrix completion and rating prediction on the MovieLens-100k dataset, the CDL1 method shows remarkable stability and accuracy under gross corruption of the training (observation) data, while L2-fidelity-based methods rapidly deteriorate. A closed-form analytical solution is found for the one-dimensional L1-fidelity subproblem and serves as the building block of the CDL1 algorithm, whose convergence is analyzed. A connection with the well-known convex method, robust principal component analysis (RPCA), is made. A comparison with RPCA on recovering low-rank Gaussian matrices under sparse and independent Gaussian noise shows that CDL1 maintains accuracy at much lower sampling ratios (i.e., from far fewer observed entries) than RPCA.
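As a rough illustration of the one-dimensional L1-fidelity subproblem (not the paper's implementation; function names and the exact subproblem form are assumptions): minimizing sum_j |r_j - u v_j| over a scalar u reduces to a weighted median of the ratios r_j / v_j with weights |v_j|, which admits the closed-form solution below.

```python
import numpy as np

def weighted_median(a, w):
    """Return a minimizer of sum_j w_j * |u - a_j| over scalar u (w_j >= 0)."""
    order = np.argsort(a)
    a, w = a[order], w[order]
    cw = np.cumsum(w)
    # first index where cumulative weight reaches half the total weight
    k = np.searchsorted(cw, 0.5 * cw[-1])
    return a[k]

def l1_subproblem(r, v):
    """Minimize sum_j |r_j - u * v_j| over scalar u.

    Entries with v_j = 0 contribute a constant and are dropped;
    assumes at least one v_j is nonzero."""
    mask = v != 0
    a = r[mask] / v[mask]      # breakpoints of the piecewise-linear objective
    w = np.abs(v[mask])        # slopes contributed by each term
    return weighted_median(a, w)
```

For example, with v all ones the subproblem is the plain median of r, recovering the classic fact that the L1 loss is minimized at a median rather than a mean.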
We propose and study a new projection formula for training binary-weight convolutional neural networks. The projection formula measures the error in approximating a full-precision (32-bit) vector by a 1-bit vector in the L1 norm instead of the standard L2 norm. The L1 projector has a closed analytical form and involves a median computation instead of the arithmetic average in the L2 projector. Experiments on a 10-keyword classification task show that the L1 (median) BinaryConnect (BC) method outperforms regular BC, regardless of cold or warm start. The binary network trained by median BC and a recent blending technique reaches 92.4% test accuracy, which is 1.1% below the full-precision network's 93.5%. In an Android phone app, the trained binary network doubles the speed of the full-precision network in spoken keyword recognition.
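A minimal sketch of the two projectors (function names are illustrative): projecting a real vector w onto the set {alpha * b : b in {+-1}^n} in the L2 norm gives alpha = mean(|w|), while the L1 projection replaces the mean with a median; in both cases the sign pattern is b = sign(w).

```python
import numpy as np

def proj_l2(w):
    """L2 projection onto {alpha * b : b in {+-1}^n}: scale is the mean of |w|."""
    return np.mean(np.abs(w)) * np.sign(w)

def proj_l1(w):
    """L1 projection: scale is the median of |w| instead of the mean.
    Note np.sign(0) = 0; zero entries are a measure-zero tie case."""
    return np.median(np.abs(w)) * np.sign(w)
```

The median's insensitivity to large outlier weights is what distinguishes the L1 projector from the mean-based L2 one.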
We study channel-number reduction in combination with weight binarization (1-bit weight precision) to trim a convolutional neural network for a keyword spotting (classification) task. We adopt a group-wise splitting method based on the group Lasso penalty to achieve over 50% channel sparsity while keeping the network within 0.25% of its original accuracy. We present an effective three-stage procedure to balance accuracy and sparsity in network training.
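As a rough illustration of the penalty involved (not the paper's training procedure; the tensor layout is an assumption), the group Lasso groups a convolutional layer's weights by output channel and sums the per-channel 2-norms, which drives entire channels to zero and enables channel pruning:

```python
import numpy as np

def group_lasso_penalty(W):
    """Group Lasso penalty over output channels.

    Assumes W has shape (out_channels, in_channels, kH, kW);
    returns sum over channels c of ||W[c]||_2 (Frobenius norm of each slice)."""
    return sum(np.linalg.norm(W[c]) for c in range(W.shape[0]))
```

Because the penalty is non-smooth only at whole-channel zeros, minimizing loss + lambda * penalty tends to zero out complete channels rather than individual weights.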