Stochastic Tasks: Difficulty and Levin Search

Hernández-Orallo, José

doi:10.1007/978-3-319-21365-1_10

Cited by 3 publications

(4 citation statements)

References 11 publications

(12 reference statements)

“…The key idea was defining intelligence test items using algorithmic information theory (Hernández-Orallo and Minaya-Collado, 1998), an approach that was followed by many other proposals in the next two decades, from the very influential "universal intelligence" (Legg and Hutter, 2007) to the recent "measure of intelligence" (Chollet, 2019). However, while some of these proposals have had an important impact on the understanding of what intelligence is, its relation to compression (Dowe et al, 2011), difficulty (Hernández-Orallo, 2015) and generality (Martinez-Plumed and Hernandez-Orallo, 2018), the adoption of some of these tests (or associated definitions) in practice has been very limited.…”

Section: Any (mentioning)

confidence: 99%

“…Whereas the development of measurement instruments that follow the adversarial testing is still incipient, and has not progressed significantly since (Hernández-Orallo and Dowe, 2010;, it adapts according to one or more dimensions, as per the transitional and universal cases in Figure 2. Assuming each dimension is defined by a difficulty metric (Mishra et al, 2013;Hernandez-Orallo, 2015;Martinez-Plumed and Hernandez-Orallo, 2018;Martínez-Plumed et al, 2019;Hernández-Orallo, 2020), we have a multidimensional space for which the adversarial testing can derive the location of the testee in this space. By doing this, similarities and clustering are calculated in this space, with no need of exploring all the n(n−1)/2 combinations when n agents are being analysed.…”

Section: Building Behavioural Taxonomies (mentioning)

confidence: 99%

Twenty Years Beyond the Turing Test: Moving Beyond the Human Judges Too

Hernández-Orallo

2020

Minds & Machines

Self Cite

In the last twenty years the Turing test has been left further behind by new developments in artificial intelligence. At the same time, however, these developments have revived some key elements of the Turing test: imitation and adversarialness. On the one hand, many generative models, such as generative adversarial networks (GAN), build imitators under an adversarial setting that strongly resembles the Turing test (with the judge being a learnt discriminative model). The term "Turing learning" has been used for this kind of setting. On the other hand, AI benchmarks are suffering an adversarial situation too, with a 'challenge-solve-and-replace' evaluation dynamics whenever human performance is 'imitated'. The particular AI community rushes to replace the old benchmark by a more challenging benchmark, one for which human performance would still be beyond AI. These two phenomena related to the Turing test are sufficiently distinctive, important and general for a detailed analysis. This is the main goal of this paper. After recognising the abyss that appears beyond superhuman performance, we build on Turing learning to identify two different evaluation schemas: Turing testing and adversarial testing. We revisit some of the key questions surrounding the Turing test, such as 'understanding', commonsense reasoning and extracting meaning from the world, and explore how the new testing paradigms should work to unmask the limitations of current and future AI. Finally, we discuss how behavioural similarity metrics could be used to create taxonomies for artificial and natural intelligence. Both testing schemas should complete a transition in which humans should give way to machines (not only as references to be imitated but also as judges) when pursuing and measuring machine intelligence.

“…We evaluate the more able agents with more difficult tasks. In order to do this, we calculate difficulty of a task as the complexity of the simplest policy that is successful for the task (Hernández-Orallo, 2015b). Complexity/simplicity is measured as a combination of the size of the policy and its execution time.…”

Section: Analysis Of Subpopulations Binned By Abilities and Difficulties (mentioning)

confidence: 99%
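
The difficulty notion quoted in the statement above can be illustrated with a small sketch. The quote only says that complexity is "a combination" of policy size and execution time; the specific form used here (size in bits plus the base-2 logarithm of the number of steps, in the style of Levin's Kt) is an assumption made for illustration, and the `Policy` record, its fields, and the candidate values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical record describing one candidate policy for a task."""
    name: str
    size_bits: int   # length of the policy description, in bits
    steps: int       # execution time, in computational steps (>= 1)
    succeeds: bool   # whether the policy is successful for the task


def complexity(policy: Policy) -> float:
    # Assumed combination of size and time: size + log2(steps).
    return policy.size_bits + math.log2(policy.steps)


def difficulty(policies: list[Policy]) -> float:
    # Difficulty = complexity of the simplest successful policy.
    successful = [p for p in policies if p.succeeds]
    if not successful:
        return math.inf  # no known successful policy
    return min(complexity(p) for p in successful)


# Illustrative, made-up candidates.
candidates = [
    Policy("lookup", size_bits=40, steps=16, succeeds=True),
    Policy("search", size_bits=12, steps=1024, succeeds=True),
    Policy("random", size_bits=4, steps=2, succeeds=False),
]

print(difficulty(candidates))  # 22.0 (the "search" policy: 12 + log2(1024))
```

Under this assumed form, a short policy that runs for a long time can still count as simpler than a long policy that runs quickly, because time is only charged logarithmically.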

AI Generality and Spearman’s Law of Diminishing Returns

Hernández-Orallo

2019

JAIR

Many areas of AI today use benchmarks and competitions with larger and wider sets of tasks. This tries to deter AI systems (and research effort) from specialising to a single task, and encourage them to be prepared to solve previously unseen tasks. It is unclear, however, whether the methods with best performance are actually those that are most general and, in perspective, whether the trend moves towards more general AI systems. This question has a striking similarity with the analysis of the so-called positive manifold and general factors in the area of human intelligence. In this paper, we first show how the existence of a manifold (positive average pairwise task correlation) can also be analysed in AI, and how this relates to the notion of agent generality, from the individual and the populational points of view. From the populational perspective, we analyse the following question: is this manifold correlation higher for the most or for the least able group of agents? We contrast this analysis with one of the most controversial issues in human intelligence research, the so-called Spearman's Law of Diminishing Returns (SLODR), which basically states that the relevance of a general factor diminishes for most able human groups. We perform two empirical studies on these issues in AI. We analyse the results of the 2015 general video game AI (GVGAI) competition, with games as tasks and "controllers" as agents, and the results of a synthetic setting, with modified elementary cellular automata (ECA) rules as tasks and simple interactive programs as agents. In both cases, we see that SLODR does not appear. The data, and the use of just two scenarios, does not clearly support the reverse either, a Universal Law of Augmenting Returns (ULOAR), but calls for more experiments on this question.
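
As a rough illustration of the "positive average pairwise task correlation" mentioned in the abstract above, the sketch below averages the Pearson correlation over every pair of tasks in an agents-by-tasks score matrix. The choice of Pearson correlation, the function names, and the scores are assumptions made for this example; the abstract does not specify the estimator used in the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length score sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_pairwise_task_correlation(results):
    """results[i][j] is the score of agent i on task j."""
    tasks = list(zip(*results))           # one score column per task
    pairs = list(combinations(tasks, 2))  # n(n-1)/2 pairs for n tasks
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Made-up scores: 3 agents (rows) on 3 tasks (columns).
results = [
    [0.2, 0.3, 0.1],
    [0.5, 0.6, 0.4],
    [0.9, 0.8, 0.7],
]

manifold = average_pairwise_task_correlation(results)
print(manifold > 0)  # True: agents that do well on one task tend to do well on the others
```

A positive value is what the abstract calls a manifold; comparing this value between more able and less able groups of agents is the kind of analysis the abstract relates to SLODR.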

“…general (video) game playing of a handful of 1 Classical planning has hierarchical task networks [4], but subtask decomposition is almost always done manually and there is no real analysis of tasks on a general level. Some people working on AI evaluation (one of task theory's primary applications) attempt to analyze some properties of task-environments, but they don't go beyond complexity and difficulty-related measures [6].…”

Section: Introduction (mentioning)

confidence: 99%

Why Artificial Intelligence Needs a Task Theory

Þórisson

Bieger

Thorarensen

et al. 2016

Lecture Notes in Computer Science

The concept of "task" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial general intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A task theory would enable addressing tasks at the class level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks.
