In the last few years, Artificial Intelligence (AI) has reached the public consciousness through high-profile applications such as chatbots, image generators, and speech synthesis and transcription. These successes are all due to deep learning: Machine learning algorithms that learn tasks from massive amounts of data. The neural network models used in deep learning involve many parameters, often on the order of billions. Yet these models often fail at tasks computers are traditionally very good at, such as calculating arithmetic expressions, reasoning over many different pieces of information, planning and scheduling complex systems, and retrieving information from a database. Such tasks are traditionally solved by symbolic AI methods based on logic and formal reasoning.
Neurosymbolic AI instead aims to integrate deep learning with symbolic AI. This integration holds many promises, such as decreasing the amount of data required to train neural networks, improving the explainability and interpretability of the answers models give, and verifying the correctness of trained systems. We mainly study neurosymbolic learning, where, in addition to data, we have background knowledge expressed in symbolic languages. How do we connect the symbolic and neural components to communicate this knowledge to the neural networks?
We consider two answers: Fuzzy and probabilistic reasoning. Fuzzy reasoning studies degrees of truth. A person can be very tall or somewhat tall: Tallness is not a binary concept. In contrast, probabilistic reasoning studies the probability that something is true or will happen. A coin has a 0.5 probability of landing heads, but we never say it landed on "somewhat heads". What happens when we use fuzzy (part I) or probabilistic (part II) approaches to neurosymbolic learning? Moreover, do these approaches use the background knowledge we expect them to?
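The distinction can be made concrete in a few lines of code. This is an illustrative sketch, not taken from the dissertation: the Gödel t-norm is one of several fuzzy conjunctions, and the independence assumption is ours.

```python
def fuzzy_and(a, b):
    # Goedel t-norm: a conjunction is only as true as its weakest conjunct
    return min(a, b)

def prob_both(p, q):
    # Probability that two independent binary events both occur
    return p * q

tallness = 0.7   # a degree of truth: this person is "somewhat tall"
p_heads = 0.5    # uncertainty about a binary outcome: heads or tails

print(fuzzy_and(tallness, 0.9))     # 0.7: partial truth of a conjunction
print(prob_both(p_heads, p_heads))  # 0.25: chance of two heads in a row
```

Both functions map a pair of numbers in [0, 1] to a number in [0, 1], but they mean different things: the fuzzy value measures how true a statement is, while the probability measures how likely a crisp, binary statement is to be true.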
Our first research question studies how different forms of fuzzy reasoning combine with learning. We find surprising results, such as a connection to the Raven paradox, which states that we confirm "ravens are black" when we observe a green apple. In this study, we give our neural network a training objective created from the background knowledge. However, we do not use the background knowledge when we deploy our models after training. In our second research question, we study how to use background knowledge in deployed models. To this end, we develop a new neural network layer based on fuzzy reasoning.
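To see how background knowledge can become a training objective, and where the Raven paradox enters, consider this minimal sketch. The Reichenbach implication is one common choice of fuzzy operator; the specific truth degrees are hypothetical and only for illustration.

```python
def implies_reichenbach(antecedent, consequent):
    # Reichenbach fuzzy implication: truth degree of "antecedent -> consequent"
    return 1.0 - antecedent + antecedent * consequent

# Suppose a network assigns these truth degrees to an image of a green apple:
raven = 0.05   # "this object is a raven"
black = 0.10   # "this object is black"

truth = implies_reichenbach(raven, black)
loss = 1.0 - truth   # minimised by gradient descent during training

print(round(truth, 3))  # 0.955: the rule is almost fully true here
```

This is the Raven paradox in action: because `raven` is low for a green apple, the implication "ravens are black" is already close to fully true, so the green apple effectively confirms the rule during training.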
The remaining research questions study probabilistic approaches to neurosymbolic learning. Probabilistic reasoning is a natural fit for neural networks, which we usually train to be probabilistic. However, probabilistic approaches come at a cost: They are expensive to compute and do not scale well to large tasks. In our third research question, we study how to connect probabilistic reasoning with neural networks by sampling to estimate averages. Sampling circumvents computing reasoning outcomes for all input combinations. In the fourth and final research question, we study scaling probabilistic neurosymbolic learning to much larger problems than was previously possible. Our insight is to train a neural network to predict the result of probabilistic reasoning. We perform this training process using only the background knowledge: We do not need to collect data.
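The idea of sampling to estimate averages can be sketched as a simple Monte Carlo estimator. The constraint, the probabilities, and the function names below are hypothetical, and the independence of the network's outputs is an assumption of this sketch, not a claim about the dissertation's methods.

```python
import random

def estimate_knowledge_prob(probs, constraint, n_samples=10_000):
    # Sample possible worlds from independent Bernoulli outputs of a
    # network and average how often the symbolic constraint holds.
    hits = 0
    for _ in range(n_samples):
        world = [random.random() < p for p in probs]
        hits += constraint(world)
    return hits / n_samples

# Hypothetical knowledge: "at least one of three detected digits is even"
probs = [0.9, 0.3, 0.5]              # P(digit_i is even), from a network
at_least_one = lambda world: any(world)

random.seed(0)
est = estimate_knowledge_prob(probs, at_least_one)
# Exact answer: 1 - 0.1 * 0.7 * 0.5 = 0.965; the estimate is close
```

Enumerating all input combinations grows exponentially with the number of variables, whereas the cost of sampling is controlled by `n_samples`, which is why sampling circumvents computing reasoning outcomes for every combination.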
How is this related to optimisation? All our research questions are connected to optimisation problems. Within neurosymbolic learning, popular optimisation methods like gradient descent undertake a form of reasoning. There is ample opportunity to study how this optimisation perspective improves our neurosymbolic learning methods. We hope this dissertation provides some of the answers needed to make practical neurosymbolic learning a reality: Where practitioners provide both data and knowledge that the neurosymbolic learning methods use as efficiently as possible to train the next generation of neural networks.