Falsification and Future Performance

Balduzzi, David

doi:10.1007/978-3-642-44958-1_5

Cited by 4 publications

(6 citation statements)

References 18 publications

(24 reference statements)


“…More specifically, effective information is the mutual information following an experimenter intervening to set a system or part of a system to maximum entropy. It quantifies the number of YES/NO questions required to produce an output from an input, thus measuring the ‘work’ the system does in selecting that output [9].…”

Section: Introduction (mentioning)

confidence: 99%

Emergence as the conversion of information: a unifying theory

Varley, Hoel

2022

Phil. Trans. R. Soc. A.


Is reduction always a good scientific strategy? The existence of the special sciences above physics suggests not. Previous research has shown that dimensionality reduction (macroscales) can increase the dependency between elements of a system (a phenomenon called ‘causal emergence’). Here, we provide an umbrella mathematical framework for emergence based on information conversion. We show evidence that coarse-graining can convert information from one ‘type’ to another. We demonstrate this using the well-understood mutual information measure applied to Boolean networks. Using partial information decomposition, the mutual information can be decomposed into redundant, unique and synergistic information atoms. Then by introducing a novel measure of the synergy bias of a given decomposition, we are able to show that the synergy component of a Boolean network’s mutual information can increase at macroscales. This can occur even when there is no difference in the total mutual information between a macroscale and its underlying microscale, proving information conversion. We relate this broad framework to previous work, compare it to other theories, and argue it complexifies any notion of universal reduction in the sciences, since such reduction would likely lead to a loss of synergistic information in scientific models. This article is part of the theme issue ‘Emergent phenomena in complex physical and socio-technical systems: from cells to societies’.
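The citation statement above describes effective information as the mutual information obtained after the input is set to maximum entropy. A minimal sketch of that quantity for a small deterministic Boolean function follows. The function names `effective_information` and `ei_of_output` and the example gates are illustrative assumptions, not taken from either paper, and the sketch assumes a deterministic map with a uniform distribution over inputs.

```python
from collections import Counter
from itertools import product
from math import log2


def effective_information(f, inputs):
    """Mutual information I(X; Y) in bits for Y = f(X), with X uniform over inputs.

    Assumes f is deterministic, so I(X; Y) reduces to the entropy H(Y).
    """
    inputs = list(inputs)
    n = len(inputs)
    counts = Counter(f(x) for x in inputs)
    # H(Y) = sum over outputs y of p(y) * log2(1 / p(y)), with p(y) = count / n.
    return sum((c / n) * log2(n / c) for c in counts.values())


def ei_of_output(f, inputs, y):
    """Bits generated by one output y: log2(|X| / |preimage of y|)."""
    inputs = list(inputs)
    preimage = sum(1 for x in inputs if f(x) == y)
    if preimage == 0:
        raise ValueError("output y is never produced by f on these inputs")
    return log2(len(inputs) / preimage)


# Hypothetical two-input Boolean gates used only for illustration.
bits = list(product([0, 1], repeat=2))
and_gate = lambda x: x[0] & x[1]
xor_gate = lambda x: x[0] ^ x[1]

print(effective_information(and_gate, bits))
print(effective_information(xor_gate, bits))
print(ei_of_output(and_gate, bits, 1))
```

Under these assumptions, the per-output value counts the YES/NO questions needed to narrow all inputs down to those that produce the observed output, and averaging it over outputs gives the mutual information.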


“…For example, the meaning of P_blue(that) = "that is blue" is the subset v A simple extension of possible world semantics from propositions to arbitrary functions is as follows (Balduzzi, 2011): Definition 1 (semantics). Given function f: X → Y, the semantics or meaning of output y ∈ Y is the ordered pair of sets:…”

Section: Semantics and Representations (mentioning)

confidence: 99%

Grammars for Games: A Gradient-Based, Game-Theoretic Framework for Optimization in Deep Learning

Balduzzi

2016

Front. Robot. AI

Self Cite


Deep learning is currently the subject of intensive study. However, fundamental concepts such as representations are not formally defined (researchers "know them when they see them") and there is no common language for describing and analyzing algorithms. This essay proposes an abstract framework that identifies the essential features of current practice and may provide a foundation for future developments. The backbone of almost all deep learning algorithms is backpropagation, which is simply a gradient computation distributed over a neural network. The main ingredients of the framework are, thus, unsurprisingly: (i) game theory, to formalize distributed optimization; and (ii) communication protocols, to track the flow of zeroth and first-order information. The framework allows natural definitions of semantics (as the meaning encoded in functions), representations (as functions whose semantics is chosen to optimize a criterion), and grammars (as communication protocols equipped with first-order convergence guarantees). Much of the essay is spent discussing examples taken from the literature. The ultimate aim is to develop a graphical language for describing the structure of deep learning algorithms that backgrounds the details of the optimization procedure and foregrounds how the components interact. Inspiration is taken from probabilistic graphical models and factor graphs, which capture the essential structural features of multivariate distributions.


“…We show this holds for the well-studied special case of empirical risk minimization (ERM). Results are taken from [4], which should be consulted for details.…”

Section: Learning (mentioning)

confidence: 99%

“…It is easy to show that $ei(E_{F,D}, 0) = \ell - VC_F(D)$, where $VC_F(D)$ is the empirical VC-entropy [4,12]. It follows with high probability that…”

Section: Learning (mentioning)

confidence: 99%
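The identity quoted above, that ei equals ℓ minus the empirical VC-entropy, can be checked numerically on a toy problem. The sketch below rests on two assumptions that the excerpt does not state: that the empirical VC-entropy is log2 of the number of distinct labelings the classifier family induces on the ℓ data points, and that the ei of a zero empirical-risk outcome is log2 of the ratio of all 2^ℓ labelings to those the family can fit exactly. The threshold classifiers, data points and function names are hypothetical.

```python
from math import log2


def labelings(classifiers, data):
    """Distinct label vectors that the classifier family induces on the data."""
    return {tuple(f(x) for x in data) for f in classifiers}


def empirical_vc_entropy(classifiers, data):
    """Assumed definition: log2 of the number of distinct labelings on the data."""
    return log2(len(labelings(classifiers, data)))


def ei_zero_risk(classifiers, data):
    """Assumed definition: bits generated when the minimum empirical risk is zero.

    With a uniform prior over all 2**len(data) binary labelings, the labelings
    compatible with zero error are exactly those the family can realize.
    """
    total = 2 ** len(data)
    realizable = len(labelings(classifiers, data))
    return log2(total / realizable)


# Hypothetical toy setup: three points on a line, four threshold classifiers.
data = [0.5, 1.5, 2.5]
thresholds = [0.0, 1.0, 2.0, 3.0]
classifiers = [lambda x, t=t: int(x > t) for t in thresholds]

vc = empirical_vc_entropy(classifiers, data)
ei = ei_zero_risk(classifiers, data)
print(len(data), vc, ei)
```

In this toy case the four thresholds realize four of the eight possible labelings, so the two quantities sum to ℓ = 3, consistent with the quoted relation under the stated assumptions.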

“…On the basis of these conflicting observations, it has been argued that useful biases in the form of "generic mechanisms for representation" must be hardwired into cortex [2]. This note describes a useful bias that encourages cooperative learning which is both biologically plausible and rigorously justified [3][4][5][6][7][8][9]. Let us outline the problem. Neurons learn inductively.…”

mentioning

confidence: 99%


Regulating the information in spikes: a useful bias

Balduzzi

2012

Preprint

Self Cite


The bias/variance tradeoff is fundamental to learning: increasing a model's complexity can improve its fit on training data, but potentially worsens performance on future samples [1]. Remarkably, however, the human brain effortlessly handles a wide range of complex pattern recognition tasks. On the basis of these conflicting observations, it has been argued that useful biases in the form of "generic mechanisms for representation" must be hardwired into cortex [2]. This note describes a useful bias that encourages cooperative learning which is both biologically plausible and rigorously justified [3][4][5][6][7][8][9].

Let us outline the problem. Neurons learn inductively. They generalize from finite samples and encode estimates of future outcomes (for example, rewards) into their spiketrains [10]. Results from learning theory imply that generalizing successfully requires strong biases [1] or, in other words, specialization. Thus, at any given time some neurons' specialties are more relevant than others. Since most of the data neurons receive are other neurons' outputs, it is essential that neurons indicate which of their outputs encode high quality estimates. Downstream neurons should then be biased to specialize on these outputs.

The proposed biasing mechanism is based on a constraint on the effective information, ei, generated by spikes, see Eq. (*) below. The motivation for using effective information comes from a connection to learning theory explained in §2. There, we show the ei generated by empirical risk minimization quantifies capacity: higher ei yields tighter generalization bounds.

Sections §3 and §4 consider implications of the constraint in two cases: abstractly and for a concrete model. In both cases we find that imposing constraint (*) implies: (i) essentially all information is carried by spikes; (ii) spikes encode reward estimates and (iii) the higher the effective information, the better the guarantees on estimates.

Although the proposal is inspired by cortical learning, the main ideas are information-theoretic, suggesting they may also apply to other examples of interacting populations of adaptive agents.
