Estimation of mutual information between random variables has become
crucial in a range of fields, from physics to neuroscience to finance.
Estimating information accurately over a wide range of conditions relies on the
development of flexible methods to describe statistical dependencies among
variables, without imposing potentially invalid assumptions on the data. Such
methods are particularly needed when the statistical properties of the data are
not known in advance and when the number of samples is limited. Here we propose a powerful and
generally applicable information estimator based on non-parametric copulas. This
estimator, called the non-parametric copula-based estimator (NPC), is tailored
to take into account detailed stochastic relationships in the data independently
of the data’s marginal distributions. The NPC estimator can be used for both
continuous and discrete numerical variables and thus provides a single
framework for the mutual information estimation of both continuous and discrete
data. By extensive validation on artificial samples drawn from various
statistical distributions, we found that the NPC estimator compares well against
commonly used alternatives. Unlike methods not based on copulas, it yields
information estimates that are robust to changes in the
marginal distributions. Unlike parametric copula methods, it remains accurate
regardless of the precise form of the interactions between the variables. In
addition, compared with alternative estimators, the NPC estimator remained
accurate even at low sample numbers. The NPC estimator therefore combines
general applicability to arbitrarily shaped statistical dependencies in the data
with accurate and robust performance at small sample sizes. We anticipate that the
non-parametric copula information estimator will be a powerful tool for
estimating mutual information in a broad range of data.
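The following sketch illustrates the general idea behind copula-based information estimation. It is not the NPC estimator itself. It relies on the standard identity I(X;Y) = ∫∫ c(u,v) log c(u,v) du dv, where c is the copula density of (X,Y). Mutual information therefore depends only on the copula and not on the marginals. The sketch makes several assumptions of its own. The names `pseudo_observations` and `copula_mi` are hypothetical. The copula density is estimated with a Gaussian kernel on the normal-score scale, using Scott's rule bandwidth and leave-one-out averaging. This "transformation kernel" approach is only one of several possible non-parametric choices. Ties, as in discrete data, are broken by order of appearance, which is a simplification and not necessarily how NPC treats discrete variables.

```python
import numpy as np
from statistics import NormalDist

# Standard-normal quantile function, applied element-wise.
_INV_CDF = np.vectorize(NormalDist().inv_cdf)


def pseudo_observations(x):
    """Rank-transform samples to (0, 1), discarding the marginal distribution.

    Assumption of this sketch: ties (e.g. in discrete data) are broken by
    order of appearance, because argsort is stable.
    """
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable") + 1
    return ranks / (len(x) + 1.0)


def copula_mi(x, y, bandwidth=None):
    """Hypothetical non-parametric copula estimate of I(X;Y), in nats.

    Assumption: the copula density is estimated with a product Gaussian
    kernel on normal scores (a transformation kernel estimator).
    """
    z = np.column_stack([_INV_CDF(pseudo_observations(x)),
                         _INV_CDF(pseudo_observations(y))])  # shape (n, 2)
    n = len(z)
    h = n ** (-1.0 / 6.0) if bandwidth is None else bandwidth  # Scott's rule, d = 2
    # Product Gaussian kernel evaluated between every pair of points.
    diff = (z[:, None, :] - z[None, :, :]) / h
    k = np.exp(-0.5 * np.sum(diff ** 2, axis=-1)) / (2.0 * np.pi * h ** 2)
    np.fill_diagonal(k, 0.0)            # leave-one-out reduces upward bias
    f_z = k.sum(axis=1) / (n - 1)       # joint density of the normal scores
    log_phi = -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)  # log N(0, 1) marginals
    # c(u, v) = f_Z(z1, z2) / (phi(z1) * phi(z2)), and I(X;Y) = E[log c(U, V)].
    log_c = np.log(f_z) - log_phi.sum(axis=1)
    return float(np.mean(log_c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    a = rng.standard_normal(n)
    b = 0.8 * a + 0.6 * rng.standard_normal(n)  # Gaussian pair, correlation 0.8
    print("estimate:", copula_mi(a, b))
    print("exact:   ", -0.5 * np.log(1 - 0.8 ** 2))
    # Strictly increasing changes of the marginals leave the estimate unchanged.
    print("warped:  ", copula_mi(np.exp(a), b ** 3))
```

The estimate depends on the data only through ranks. Replacing `a` with `exp(a)` and `b` with `b ** 3` therefore returns exactly the same value, which is the marginal invariance described in the abstract. An equal-width histogram estimator applied to the raw values would generally not have this property. Kernel smoothing tends to bias this sketch slightly downward for strongly dependent data. The bandwidth is the main knob controlling that bias.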