A mutual-information-based upper bound on the generalization error of a supervised learning algorithm is derived in this paper. The bound is constructed in terms of the mutual information between each individual training sample and the output of the learning algorithm; it requires weaker conditions on the loss function than existing bounds, while providing a tighter characterization of the generalization error. Examples further demonstrate that the bound derived in this paper is tighter and has a broader range of applicability. Application to noisy and iterative algorithms, e.g., stochastic gradient Langevin dynamics (SGLD), is also studied, where the constructed bound again yields a tighter characterization of the generalization error than existing results.
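A bound of the advertised individual-sample form can be sketched as follows (the notation is assumed, not taken from the abstract: W is the algorithm output, S = (Z_1, ..., Z_n) the training set, and the loss is taken to be σ-sub-Gaussian under the data distribution):

```latex
% Sketch of an individual-sample mutual information bound (assumed notation):
% gen(W, S) denotes the generalization error of output W on training set S.
\left|\,\mathbb{E}\!\left[\mathrm{gen}(W,S)\right]\right|
  \;\le\; \frac{1}{n}\sum_{i=1}^{n}\sqrt{2\sigma^{2}\, I(W;Z_i)}
```

Under this form, the right-hand side is never larger than the classical full-sample bound \(\sqrt{2\sigma^{2} I(W;S)/n}\): for independent samples \(\sum_i I(W;Z_i)\le I(W;S)\), and concavity of the square root then gives \(\frac{1}{n}\sum_i\sqrt{I(W;Z_i)}\le\sqrt{I(W;S)/n}\), which is one way the tighter characterization claimed above can arise.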
The problem of estimating the Kullback-Leibler divergence D(P‖Q) between two unknown distributions P and Q is studied, under the assumption that the alphabet size k of the distributions can scale to infinity. The estimation is based on m independent samples drawn from P and n independent samples drawn from Q. It is first shown that no consistent estimator can guarantee asymptotically small worst-case quadratic risk over the set of all pairs of distributions. A restricted set containing pairs of distributions with density ratio bounded by a function f(k) is therefore considered. An augmented plug-in estimator is proposed, and its worst-case quadratic risk is shown to be within a constant factor of (k/m + kf(k)/n)^2 + (log^2 f(k))/m + f(k)/n, if m and n exceed a constant factor of k and kf(k), respectively. Moreover, the minimax quadratic risk is characterized to be within a constant factor of (k/(m log k) + kf(k)/(n log k))^2 + (log^2 f(k))/m + f(k)/n, if m and n exceed a constant factor of k/log k and kf(k)/log k, respectively. The lower bound on the minimax quadratic risk is established by employing a generalized Le Cam's method. A minimax optimal estimator is then constructed by combining the polynomial approximation and plug-in approaches.
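A minimal sketch of a smoothed plug-in estimator illustrates the approach: estimate P by its empirical distribution and smooth the empirical distribution of Q so the log-ratio stays finite. The add-c smoothing below is an illustrative stand-in; the augmented plug-in estimator described above may use a different augmentation.

```python
import numpy as np

def plugin_kl(samples_p, samples_q, k, c=1.0):
    """Smoothed plug-in estimate of D(P||Q) over the alphabet {0, ..., k-1}.

    Add-c smoothing on Q's empirical distribution keeps every q_hat[i] > 0,
    so the log-ratio is finite even for symbols unseen among the Q samples.
    """
    m, n = len(samples_p), len(samples_q)
    p_hat = np.bincount(samples_p, minlength=k) / m
    q_hat = (np.bincount(samples_q, minlength=k) + c) / (n + c * k)
    mask = p_hat > 0  # 0 * log(0 / q) = 0 by convention
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / q_hat[mask])))
```

With identical sample sets the estimate is zero, and it grows as the empirical distributions separate; its bias for rarely observed symbols is what the polynomial-approximation step of the minimax optimal estimator is designed to correct.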
The shocks that underlie China's comparatively rapid growth include gains in productivity, factor accumulation and policy reforms that increase allocative efficiency. The well-known Balassa-Samuelson hypothesis links productivity growth in tradable industries with real appreciations. Yet it relies heavily on the law of one price applying for tradable goods, against which there is now considerable evidence. In its absence, other growth shocks also affect the real exchange rate by influencing relative supply or demand for home product varieties. This paper investigates the preconditions for the Balassa-Samuelson hypothesis to predict a real appreciation in the Chinese case. It then quantifies the links between all growth shocks and the Chinese real exchange rate using a dynamic model of the global economy with open capital accounts and full demographic underpinnings to labour supply. The results suggest that financial capital inflows most affect the real exchange rate in the short term, while differential productivity growth dominates in the medium term. Contrary to expectation, in the long term demographic forces prove to be weak relative to changes in the skill composition of the labour force, which enhance services sector performance and depreciate the real exchange rate.
Aspiculuris tianjinensis sp. nov., recovered from the intestine of Clethrionomys rufocanus from Tianjin, China, is described and illustrated using light microscopy and scanning electron microscopy. The new species differs from congeners in the shape of the cervical alae, and in the number and arrangement of caudal papillae.
The problem of universal outlying sequence detection is studied, where the goal is to detect outlying sequences among M sequences of samples. A sequence is considered outlying if the observations therein are generated by a distribution different from those generating the observations in the majority of the sequences. In the universal setting, we are interested in identifying all the outlying sequences without knowing the underlying generating distributions. In this paper, a class of tests based on distribution clustering is proposed. These tests are shown to be exponentially consistent with linear time complexity in M. Numerical results demonstrate that our clustering-based tests achieve performance similar to existing tests, while being considerably more computationally efficient.
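The distribution-clustering idea can be illustrated with a deliberately simplified sketch (hypothetical, not the paper's test): form each sequence's empirical distribution, take a robust center as a proxy for the majority's generating distribution, and flag sequences whose distributions sit far from it.

```python
import numpy as np

def flag_outliers(sequences, k, threshold=0.3):
    """Flag outlying sequences by clustering empirical distributions.

    Hypothetical sketch: each sequence's empirical pmf over {0, ..., k-1}
    is compared, in total-variation distance, to the coordinate-wise
    median of all pmfs, which tracks the majority's distribution as long
    as outliers are a minority. Sequences farther than `threshold` are
    declared outlying. Runs in time linear in the number of sequences.
    """
    pmfs = np.array([np.bincount(s, minlength=k) / len(s) for s in sequences])
    center = np.median(pmfs, axis=0)               # robust proxy for majority pmf
    tv = 0.5 * np.abs(pmfs - center).sum(axis=1)   # total-variation distances
    return [i for i, d in enumerate(tv) if d > threshold]
```

The tests proposed above are more refined (and come with exponential-consistency guarantees), but they share this structure: cluster the empirical distributions and declare the minority cluster outlying.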
A framework previously introduced in [3] for solving a sequence of stochastic optimization problems with bounded changes in the minimizers is extended and applied to machine learning problems such as regression and classification. The stochastic optimization problems arising in these machine learning problems are solved using algorithms such as stochastic gradient descent (SGD). A method based on estimates of the change in the minimizers and properties of the optimization algorithm is introduced for adaptively selecting the number of samples at each time step, to ensure that the excess risk, i.e., the expected gap between the loss achieved by the approximate minimizer produced by the optimization algorithm and that of the exact minimizer, does not exceed a target level. A bound is developed to show that the estimate of the change in the minimizers is nontrivial provided that the excess risk is small enough. Extensions relevant to the machine learning setting are considered, including a cost-based approach to selecting the number of samples under a cost budget over a fixed horizon, and an approach to applying cross-validation for model selection. Finally, experiments with synthetic and real data are used to validate the algorithms.
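The sample-size selection step can be sketched under a simple assumption (hypothetical; the paper's rule is based on its own estimates and bounds): if the optimization error of SGD after processing n samples is bounded by C/sqrt(n), and the minimizer may have drifted by a known amount since the last step, then the smallest n meeting an excess-risk target follows by inverting the bound.

```python
import math

def samples_needed(target_risk, risk_constant=1.0, drift=0.0):
    """Hypothetical per-step sample-size rule.

    Assumes the excess risk after n samples is bounded by
    risk_constant / sqrt(n) plus an estimated minimizer drift `drift`
    carried over from the previous time step. Returns the smallest n
    for which this (assumed) bound stays below target_risk.
    """
    budget = target_risk - drift
    if budget <= 0:
        raise ValueError("target risk must exceed the estimated drift")
    return math.ceil((risk_constant / budget) ** 2)
```

For example, halving the available budget (because of drift) quadruples the number of samples required under this 1/sqrt(n) assumption, which is the qualitative trade-off the adaptive scheme described above manages over time.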