We present two novel algorithms for learning formulas in Linear Temporal Logic (LTL) from examples. The first learning algorithm reduces the learning task to a series of satisfiability problems in propositional Boolean logic and produces a minimal LTL formula (in terms of the number of subformulas) that is consistent with the given data. Our second learning algorithm, on the other hand, combines the SAT-based learning algorithm with classical algorithms for learning decision trees. The result is a learning algorithm that scales to real-world scenarios with hundreds of examples but no longer guarantees that the learned formulas are minimal. We compare both learning algorithms and demonstrate their performance on a wide range of synthetic benchmarks. Additionally, we illustrate their usefulness on the task of understanding executions of a leader election protocol.
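Although the abstract only summarizes the two learners, the consistency check they both revolve around is easy to make concrete. The following sketch (with an assumed tuple-based formula representation and finite-trace semantics; it is not the paper's encoding) evaluates an LTL formula on example traces and tests consistency with a labeled sample:

```python
# Minimal sketch: checking that an LTL formula is consistent with labeled traces.
# Formulas are nested tuples, e.g. ("G", ("->", "p", ("F", "q"))); a trace is a
# list of sets of atomic propositions.  Finite-trace semantics is assumed.

def holds(phi, trace, i=0):
    """Evaluate formula phi on trace starting at position i."""
    if isinstance(phi, str):                      # atomic proposition
        return i < len(trace) and phi in trace[i]
    op = phi[0]
    if op == "!":
        return not holds(phi[1], trace, i)
    if op == "&":
        return holds(phi[1], trace, i) and holds(phi[2], trace, i)
    if op == "|":
        return holds(phi[1], trace, i) or holds(phi[2], trace, i)
    if op == "->":
        return (not holds(phi[1], trace, i)) or holds(phi[2], trace, i)
    if op == "X":                                 # next
        return i + 1 < len(trace) and holds(phi[1], trace, i + 1)
    if op == "F":                                 # eventually
        return any(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "G":                                 # always
        return all(holds(phi[1], trace, j) for j in range(i, len(trace)))
    if op == "U":                                 # until
        return any(holds(phi[2], trace, j) and
                   all(holds(phi[1], trace, k) for k in range(i, j))
                   for j in range(i, len(trace)))
    raise ValueError(f"unknown operator {op!r}")

def consistent(phi, positive, negative):
    """True iff phi holds on every positive trace and on no negative trace."""
    return (all(holds(phi, t) for t in positive)
            and not any(holds(phi, t) for t in negative))
```

Roughly speaking, the SAT-based learner asks, for increasing sizes n, whether some formula with n subformulas passes exactly this check, encoding that question in propositional logic; the decision-tree variant instead combines smaller SAT-learned formulas at the nodes of a tree.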
Incorporating high-level knowledge is an effective way to expedite reinforcement learning (RL), especially for complex tasks with sparse rewards. We investigate an RL problem where the high-level knowledge is in the form of reward machines, a type of Mealy machine that encodes non-Markovian reward functions. We focus on a setting in which this knowledge is not available to the learning agent a priori. We develop an iterative algorithm that performs joint inference of reward machines and policies for RL (more specifically, q-learning). In each iteration, the algorithm maintains a hypothesis reward machine and a sample of RL episodes. It uses a separate q-function defined for each state of the current hypothesis reward machine to determine the policy and performs RL to update the q-functions. While performing RL, the algorithm updates the sample by adding RL episodes along which the obtained rewards are inconsistent with the rewards predicted by the current hypothesis reward machine. In the next iteration, the algorithm infers a new hypothesis reward machine from the updated sample. Based on an equivalence relation between states of reward machines, we transfer the q-functions between the hypothesis reward machines of consecutive iterations. We prove that the proposed algorithm converges almost surely to an optimal policy in the limit. The experiments show that learning high-level knowledge in the form of reward machines leads to fast convergence to optimal policies in RL, whereas the baseline RL methods fail to converge to optimal policies even after a substantial number of training steps.
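The interplay of per-state q-functions and counterexample collection described above can be sketched as one episode of the loop. The interfaces below (an `env` with `reset`/`step` and an `actions` list, a hypothesis reward machine `rm` with `initial_state`, `delta`, and `reward` maps) are assumptions made for illustration, not the paper's implementation:

```python
import random

def run_episode(env, rm, Q, alpha=0.1, gamma=0.9, eps=0.1, horizon=200):
    """One q-learning episode guided by a hypothesis reward machine rm.

    Q maps every RM state u to its own q-table Q[u][(s, a)] -> value
    (e.g. a defaultdict of defaultdicts of floats).  Returns the episode's
    (label, reward) trace if the observed rewards contradict rm, else None."""
    s, u = env.reset(), rm.initial_state
    trace = []
    for _ in range(horizon):
        a = (random.choice(env.actions) if random.random() < eps
             else max(env.actions, key=lambda b: Q[u][(s, b)]))
        s_next, reward, label, done = env.step(a)    # label: high-level events observed
        u_next = rm.delta[(u, label)]                # advance the hypothesis RM
        best_next = 0.0 if done else max(Q[u_next][(s_next, b)] for b in env.actions)
        Q[u][(s, a)] += alpha * (reward + gamma * best_next - Q[u][(s, a)])
        trace.append((label, reward))
        if reward != rm.reward[(u, label)]:          # inconsistent with the hypothesis
            return trace                             # add to the sample, re-infer the RM
        s, u = s_next, u_next
        if done:
            break
    return None
```

Episodes returned as counterexamples are added to the sample, a new hypothesis reward machine is inferred from it, and the q-functions are carried over between equivalent states of the old and new machines, as described above.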
We study a class of reinforcement learning tasks in which the agent receives its reward for complex, temporally extended behaviors sparsely. For such tasks, the problem is how to augment the state space efficiently so as to make the reward function Markovian. While some existing solutions assume that the reward function is explicitly provided to the learning algorithm (e.g., in the form of a reward machine), others learn the reward function from interactions with the environment, assuming no prior knowledge provided by the user. In this paper, we generalize both approaches and enable the user to give advice to the agent, representing the user's best knowledge about the reward function, which may be fragmented, partial, or even incorrect. We formalize advice as a set of DFAs and present a reinforcement learning algorithm that takes advantage of such advice, with an optimal convergence guarantee. The experiments show that using well-chosen advice can reduce the number of training steps needed for convergence to an optimal policy, and can decrease the computation time to learn the reward function by up to two orders of magnitude.
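A single piece of advice can be represented directly as a DFA over high-level events that the agent advances alongside its interaction with the environment. The sketch below shows only this bookkeeping under an assumed event alphabet (the self-loop treatment of missing transitions is a simplification, and the learning algorithm built on top of the advice is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class AdviceDFA:
    """A DFA over high-level events, used as (possibly partial or incorrect) advice."""
    initial: int
    accepting: set
    delta: dict                      # (state, event) -> state
    state: int = field(init=False)

    def reset(self):
        self.state = self.initial

    def step(self, event):
        # Missing transitions are treated as self-loops just to keep the sketch
        # total; the paper's semantics may differ.
        self.state = self.delta.get((self.state, event), self.state)

    def accepts_so_far(self):
        return self.state in self.accepting

# Example advice: "first pick up the key, then reach the door".
advice = AdviceDFA(initial=0, accepting={2},
                   delta={(0, "key"): 1, (1, "door"): 2})
advice.reset()
for event in ["key", "door"]:
    advice.step(event)
print(advice.accepts_so_far())       # True
```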
The goal of the research was to determine the biomass yield and fuel properties of ten different poplar clones. The research was conducted in an experimental plot established in Forest Administration Osijek, Forest Office Darda, in the spring of 2014. The layout of the plot consisted of three repetitions per clone with 40 plants per repetition at a spacing of 3 × 1 m. Based on the DBH distribution, in the early spring of 2018, one sample tree of average DBH per repetition was selected, thus forming a sample of 30 trees. The average survival rate of the investigated trees after four vegetation periods was 74.54 ± 13.85%, ranging from 52.08% (Koreana) to 91.67% (SV885 and SV490). The average DBH of the sample trees was 8.2 ± 1.9 cm, height 9.3 ± 1.8 m, and root collar diameter 10.7 ± 1.9 cm. Moisture content in the fresh state (just after felling) ranged from 51.6% (Hybride 275) to 55.9% (SV885). Bark content averaged 18.4%, ranging from 15.4% (Baldo) to 21.1% (V 609). The average nominal density of the sampled trees amounted to 383.5 ± 35.9 kg/m3. Bark ash content was on average ten times higher (6.44 ± 0.65%) than wood ash content (0.64 ± 0.07%), resulting in an average whole-tree ash content of 1.7 ± 0.1% when the bark content is taken into account. The clone SV490 showed the highest biomass yield with 15.8 t/ha/year, while the lowest biomass yield was recorded for the clone Hybride 275 with 2.8 t/ha/year. The high inter-clonal variation in productivity stresses the importance of selection work to find the most appropriate clones, with the highest productivity potential, for the area where poplar SRC plantations are to be established. Due to the high initial moisture content, if direct chipping harvesting systems are preferred, the wood chips could be used efficiently in CHP (Combined Heat and Power) plants that operate on the principle of biomass gasification, where a gasifier is coupled to a gas engine to produce electric power and heat. In several CHP gasification plants operating in Croatia, wood chips with high initial moisture content (from traditional poplar plantations) are used as a feedstock that has to be pre-dried using surplus heat. In this respect, SRC poplar wood chips could make an ideal feedstock supplement.
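As a quick consistency check on the reported figures, the whole-tree ash content follows from the bark share as a weighted average of the bark and wood ash contents (using the average bark share of 18.4% quoted above):

```latex
\mathrm{ash}_{\mathrm{tree}}
  = w_{\mathrm{bark}}\,\mathrm{ash}_{\mathrm{bark}} + (1 - w_{\mathrm{bark}})\,\mathrm{ash}_{\mathrm{wood}}
  = 0.184 \cdot 6.44\% + 0.816 \cdot 0.64\% \approx 1.7\%
```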
Reward machines are an established tool for dealing with reinforcement learning problems in which rewards are sparse and depend on complex sequences of actions. However, existing algorithms for learning reward machines assume an overly idealized setting where rewards must be free of noise. To overcome this practical limitation, we introduce a novel type of reward machine, called stochastic reward machines, and an algorithm for learning them. Our algorithm, based on constraint solving, learns minimal stochastic reward machines from the explorations of a reinforcement learning agent. This algorithm can easily be paired with existing reinforcement learning algorithms for reward machines and guarantees convergence to an optimal policy in the limit. We demonstrate the effectiveness of our algorithm in two case studies and show that it outperforms both existing methods and a naive approach for handling noisy reward functions.
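The essential difference from the noise-free setting is that a single transition of a stochastic reward machine has to explain many noisy observations at once. The sketch below illustrates one simple form such a consistency condition can take, assuming rewards observed on the same transition must all lie within a known noise bound epsilon of a common mean; the constraint-solving search for a minimal machine is omitted, and the grouping by transition is taken as given:

```python
from collections import defaultdict

def explainable(observations, epsilon):
    """Check whether noisy rewards admit one output mean per RM transition.

    observations: iterable of ((rm_state, label), reward) pairs.  A group of
    rewards is explainable iff it fits in an interval of width 2 * epsilon."""
    by_transition = defaultdict(list)
    for transition, reward in observations:
        by_transition[transition].append(reward)
    means = {}
    for transition, rewards in by_transition.items():
        if max(rewards) - min(rewards) > 2 * epsilon:
            return None                      # no single mean explains this group
        means[transition] = (max(rewards) + min(rewards)) / 2
    return means                             # candidate output mean per transition

# Two noisy observations of the same transition, explained by a bound of 0.5.
obs = [((0, "goal"), 1.1), ((0, "goal"), 0.8), ((0, "step"), 0.0)]
print(explainable(obs, epsilon=0.5))         # {(0, 'goal'): ~0.95, (0, 'step'): 0.0}
```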