Since their recent introduction, curriculum learning (CL) and self-paced learning (SPL) have attracted increasing attention owing to their many successful applications. While the rationale for this learning regime is heuristically inspired by human cognitive principles, there is still no sound theory explaining the intrinsic mechanism behind its effectiveness, especially in its successful applications to big or noisy data. To address this issue, this paper presents theoretical results that reveal the insights underlying this learning scheme. Specifically, we first formulate a new learning problem that aims to learn a proper classifier from samples drawn from a training distribution deviating from the target distribution. We then show that the CL/SPL regime provides a feasible strategy for solving this problem. In particular, by first introducing high-confidence/easy samples and gradually involving low-confidence/complex ones, the CL/SPL process implicitly minimizes an upper bound of the expected risk under the target distribution, using only data from the deviated training distribution. We further construct a new SPL algorithm based on random sampling, which better complies with our theory, and substantiate its effectiveness through experiments on synthetic and real data.
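The easy-to-hard admission scheme described in the abstract can be sketched with a generic hard-weight SPL loop for least-squares regression. This is only an illustration of the general regime, not the paper's random-sampling algorithm; the variable names (`lam`, `growth`) and all constants are assumptions.

```python
import numpy as np

# Illustrative hard-weight self-paced learning loop for least squares:
# samples whose loss falls below the age parameter lam are admitted,
# and lam grows so that harder samples enter in later rounds.

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:20] += 5.0 * rng.normal(size=20)      # inject some noisy/"hard" samples

w = np.zeros(d)
lam, growth = 0.5, 1.5                   # SPL age parameter and its growth rate
for _ in range(10):
    losses = (X @ w - y) ** 2            # per-sample loss under the current model
    easy = losses < lam                  # hard SPL weights: v_i in {0, 1}
    if easy.any():                       # refit on the currently admitted samples
        w = np.linalg.lstsq(X[easy], y[easy], rcond=None)[0]
    lam *= growth                        # admit harder samples next round
```

Because the model is fit on low-loss samples first, the injected outliers influence the estimate only in late rounds, if at all, which mirrors the robustness-to-noise behavior the abstract attributes to CL/SPL.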
Learning with an l1-regularizer has motivated a great deal of research in the learning-theory community. Previously known results for learning with an l1-regularizer assume that samples are independent and identically distributed (i.i.d.), and the best known learning rate for l1-regularization-type algorithms is O(1/√m), where m is the sample size. This paper goes beyond the classic i.i.d. framework and investigates the generalization performance of least-squares regression with an l1-regularizer (l1-LSR) based on uniformly ergodic Markov chain (u.e.M.c.) samples. On the theoretical side, we prove that the learning rate of l1-LSR for u.e.M.c. samples, l1-LSR(M), is of order O(1/m), which is faster than the O(1/√m) rate of the i.i.d. counterpart. On the practical side, we propose a resampling-based algorithm to generate u.e.M.c. samples. We show that the proposed l1-LSR(M) improves on l1-LSR(i.i.d.) in generalization error at the low cost of u.e.M.c. resampling.
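The l1-LSR objective itself can be illustrated with a standard proximal-gradient (ISTA) solver for l1-regularized least squares. This is a generic sketch of the objective min_w (1/2m)‖Xw − y‖² + τ‖w‖₁, not the paper's method; the u.e.M.c. resampling scheme is not reproduced, and `tau`, the iteration count, and the demo data are assumptions.

```python
import numpy as np

# Sketch of l1-regularized least squares (the l1-LSR objective) solved by
# ISTA / proximal gradient descent:  min_w (1/2m)||Xw - y||^2 + tau*||w||_1

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_lsr(X, y, tau=0.1, iters=500):
    m, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / m    # Lipschitz constant of the smooth part
    w = np.zeros(d)
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / m     # gradient of (1/2m)||Xw - y||^2
        w = soft_threshold(w - grad / L, tau / L)
    return w

# Demo: recover a sparse coefficient vector from noisy linear measurements.
rng = np.random.default_rng(1)
m, d = 100, 10
w_true = np.zeros(d)
w_true[[0, 3, 7]] = [2.0, -1.5, 1.0]
X = rng.normal(size=(m, d))
y = X @ w_true + 0.05 * rng.normal(size=m)
w_hat = l1_lsr(X, y, tau=0.1)
```

The soft-thresholding step is what the l1 penalty contributes: it drives small coefficients exactly to zero, yielding the sparse estimates whose generalization error the abstract analyzes.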