To enable a reinforcement learning agent to acquire symbolic knowledge, we augment it with a high-level knowledge representation. This representation consists of ordinal conditional functions (OCF), which allow the agent to rank world models. In this way, the agent can complement the self-organizing capabilities of the low-level reinforcement learning sub-system with the reasoning capabilities of a high-level learning component. We briefly summarize the state-of-the-art method for incorporating new information into the OCF. To improve the emergence of plausible behavior, we then introduce a modification of this method. The viability of this modification is examined first for the inclusion of conditional information with negated consequents and second for the generalization of belief in the context of unobserved variables. Besides providing a theoretical justification for this modification, we also demonstrate the advantages of our approach over the state-of-the-art revision method in a reinforcement learning application.
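To make the core idea concrete, the following is a minimal illustrative sketch (with a hypothetical example, not taken from the paper) of an ordinal conditional function: a mapping that assigns each possible world a non-negative integer rank, where a lower rank means the world is more plausible and at least one world must have rank 0.

```python
# Sketch of an ordinal conditional function (OCF), also called a
# ranking function. Worlds and ranks here are illustrative assumptions.
kappa = {
    ("bird", "flies"): 0,       # maximally plausible world
    ("bird", "not_flies"): 1,   # less plausible
    ("penguin", "flies"): 2,    # implausible
    ("penguin", "not_flies"): 0,
}

def rank(worlds, ocf):
    """Rank of a set of worlds = the minimum rank of its members."""
    return min(ocf[w] for w in worlds)

# Normalization condition: some world must have rank 0.
assert min(kappa.values()) == 0

# The set of penguin-worlds as a whole is considered plausible,
# because its most plausible member has rank 0.
print(rank({("penguin", "flies"), ("penguin", "not_flies")}, kappa))
```

Belief revision methods such as the one discussed in the paper then operate by shifting these ranks when new (conditional) information arrives.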
NOTATION AND TERMINOLOGY

A variable a can represent a value from its domain D_a. Such a domain consists of discrete values. One such realization of a variable is called a literal. We write literals by denoting the variable as a subscript of its value (e.g., 3_a or t_a). A formula consists of literals and logical operators such as ∧, ∨, ⇒, etc. It is
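The notation above can be mirrored in code. The following is a small sketch (domain values and the `literal` helper are illustrative assumptions, not part of the paper) representing variables, their discrete domains, and literals as plain data:

```python
# Each variable maps to its domain, a set of discrete values.
domains = {
    "a": {1, 2, 3, "t"},
}

def literal(value, variable):
    """A literal fixes a variable to one value from its domain.
    Mirrors the paper's subscript notation: 3_a -> (3, 'a')."""
    assert value in domains[variable], "value must come from the variable's domain"
    return (value, variable)

lit = literal(3, "a")
print(lit)  # → (3, 'a')
```

A formula would then combine such literals with logical connectives; here only the literal level is sketched.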