The biological plausibility of the backpropagation algorithm has long been doubted by neuroscientists. Two major reasons are that neurons would need to send two different types of signal in the forward and backward phases, and that pairs of neurons would need to communicate through symmetric bidirectional connections. We present a simple two-phase learning procedure for fixed-point recurrent networks that addresses both of these issues. In our model, neurons perform leaky integration and synaptic weights are updated through a local mechanism. Our learning method generalizes Equilibrium Propagation to vector field dynamics, relaxing the requirement of an energy function. As a consequence of this generalization, the algorithm does not compute the true gradient of the objective function, but rather approximates it with a precision that is proven to be directly related to the degree of symmetry between the feedforward and feedback weights. We show experimentally that our algorithm optimizes the objective function.
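The two-phase procedure described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the tanh activation, the Euler integration, the form of the nudging term, the specific weight update, and all names (`relax`, `two_phase_update`, `beta`, `out_idx`) are assumptions chosen for illustration. The weight matrix `W` is deliberately not symmetric, so the dynamics follow a vector field rather than an energy function.

```python
import numpy as np

def rho(s):
    # Assumed activation function; the abstract does not specify one.
    return np.tanh(s)

def relax(W, x, s_init, beta=0.0, target=None, out_idx=None, dt=0.1, steps=500):
    """Euler-integrate the leaky dynamics ds/dt = -s + W @ rho(s) + x.

    When beta is nonzero, the output units are additionally nudged
    toward `target` (second phase).
    """
    s = s_init.copy()
    for _ in range(steps):
        ds = -s + W @ rho(s) + x
        if beta != 0.0:
            ds[out_idx] += beta * (target - s[out_idx])
        s = s + dt * ds
    return s

def two_phase_update(W, x, target, out_idx, beta=0.5, lr=0.1):
    """Return a local weight update computed from two fixed points."""
    # Phase 1: free relaxation to a fixed point.
    s_free = relax(W, x, np.zeros_like(x))
    # Phase 2: relaxation with outputs weakly nudged toward the target.
    s_nudged = relax(W, x, s_free, beta=beta, target=target, out_idx=out_idx)
    # Assumed local rule: dW[i, j] uses only the presynaptic activity
    # rho(s_j) and the change in the postsynaptic state s_i.
    dW = (lr / beta) * np.outer(s_nudged - s_free, rho(s_free))
    return dW, s_free, s_nudged
```

In this sketch each entry `dW[i, j]` depends only on quantities available at the two neurons it connects, which is the sense in which the update is local. Small weights are assumed so that both phases settle to a fixed point.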
Three-dimensional printing (3DP) of thermoplastic polyurethane (TPU) is gaining interest in the medical industry thanks to the combination of tunable properties that TPU exhibits and the possibilities that 3DP processes offer concerning precision, time, and cost of fabrication. We investigated the processing of a medical-grade TPU by fused deposition modelling (FDM) for the manufacturing of an implantable medical device, from the raw pellets to the gamma (γ) sterilized 3DP constructs. To the authors’ knowledge, no guide or study has yet combined TPU, FDM 3D printing, and gamma sterilization. Thermal properties analyzed by differential scanning calorimetry (DSC) and molecular weights measured by size exclusion chromatography (SEC) were used as monitoring indicators throughout the fabrication process. After gamma sterilization, surface chemistry was assessed by water contact angle (WCA) measurement and infrared spectroscopy (ATR-FTIR). Mechanical properties were investigated by tensile testing. Biocompatibility was assessed by means of cytotoxicity (ISO 10993-5) and hemocompatibility (ISO 10993-4) assays. Results showed that TPU underwent degradation during the fabrication process, as both the number-average (Mn) and weight-average (Mw) molecular weights decreased (7% Mn loss, 30% Mw loss, p < 0.05). After gamma sterilization, Mw increased by 8% (p < 0.05), indicating that crosslinking may have occurred. However, tensile properties were not impacted by irradiation. Cytotoxicity (ISO 10993-5) and hemocompatibility (ISO 10993-4) assessments after sterilization showed cell viability (132% ± 3%, p < 0.05) and no red blood cell lysis. We concluded that gamma sterilization does not substantially affect TPU for our application.
Our study demonstrates the processability of TPU by FDM followed by gamma sterilization and can be used as a guide for the preliminary evaluation of a polymeric raw material in the manufacturing of a blood contacting implantable medical device.
Credit assignment in reinforcement learning is the problem of measuring an action's influence on future rewards. In particular, this requires separating skill from luck, i.e., disentangling the effect of an action on rewards from that of external factors and subsequent actions. To achieve this, we adapt the notion of counterfactuals from causality theory to a model-free RL setup. The key idea is to condition value functions on future events, by learning to extract relevant information from a trajectory. We then propose to use these as future-conditional baselines and critics in policy gradient algorithms. We develop a valid, practical variant with provably lower variance, which remains unbiased by constraining the hindsight information not to contain information about the agent's actions. We demonstrate the efficacy and validity of our algorithm on a number of illustrative problems.
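The role of the constraint on hindsight information can be seen in a toy example. The sketch below is not the paper's algorithm: it assumes a one-step, two-action bandit in which the reward is the action plus independent external noise ("luck"), and the function name `gradient_estimates` and all parameters are hypothetical. It compares a score-function gradient estimator with no baseline, with a baseline equal to the noise (hindsight information that is independent of the action), and with a baseline equal to the reward itself (hindsight information that leaks the action).

```python
import numpy as np

def gradient_estimates(n=200_000, sigma=3.0, theta=0.0, seed=0):
    """Per-sample policy gradient estimates for a toy one-step bandit."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-theta))        # probability of action a = 1
    a = (rng.random(n) < p).astype(float)   # sampled actions (skill)
    eps = rng.normal(0.0, sigma, size=n)    # external noise (luck)
    r = a + eps                             # reward = skill + luck
    score = a - p                           # d log pi(a) / d theta
    return {
        # Since E[r] = p, the exact gradient is dp/dtheta = p * (1 - p).
        "true_gradient": p * (1.0 - p),
        # Plain estimator: unbiased, but the noise inflates its variance.
        "no_baseline": score * r,
        # Baseline uses hindsight information independent of the action:
        # still unbiased, with the luck term removed.
        "hindsight_baseline": score * (r - eps),
        # Baseline leaks the action through the reward: the estimate
        # collapses to zero and is therefore biased.
        "leaky_baseline": score * (r - r),
    }
```

Comparing the sample mean and variance of each array against `true_gradient` (which is 0.25 at `theta=0.0`) shows the trade-off: the action-independent baseline keeps the mean while reducing variance, whereas the leaky baseline removes the learning signal altogether.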
We consider the problem of efficient credit assignment in reinforcement learning. In order to efficiently and meaningfully utilize new data, we propose to explicitly assign credit to past decisions based on the likelihood of them having led to the observed outcome. This approach uses new information in hindsight, rather than employing foresight. Somewhat surprisingly, we show that value functions can be rewritten through this lens, yielding a new family of algorithms. We study the properties of these algorithms, and empirically show that they successfully address important credit assignment challenges, through a set of illustrative tasks.