“…For the present framework we generate a comprehensive repository of AI benchmarks (Martínez-Plumed et al., 2020a,b) based on our previous compilation, analysis and annotation of AI papers and benchmarking results (Hernández-Orallo, 2017a; Martínez-Plumed et al., 2018; Martínez-Plumed and Hernández-Orallo, 2018; Martínez-Plumed et al., 2020a,b), as well as open resources such as Papers With Code (the largest up-to-date, free and open repository of machine learning code and results), which includes data from several AI-related repositories (e.g., EFF, NLP-progress, SQuAD, RedditSota, etc.). All these repositories draw on data from multiple (verified) sources, including academic literature, review articles and code platforms focused on machine learning and AI. For the purposes of this study, from the aforementioned sources we track the reported evaluation results (when available or when sufficient data is provided) on different metrics of AI performance across separate AI benchmarks (e.g., datasets, competitions, challenges, awards, etc.)…”
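As a rough illustration of this collection step, the sketch below queries the public Papers With Code REST API for evaluation tables (leaderboards) and prints the reported metric values per benchmark. The base URL and the endpoint and field names (/api/v1/, evaluations, results, metrics) follow the API's public documentation but should be read as assumptions of this sketch, not as the tooling actually used in the study.

```python
# Hedged sketch: pull reported benchmark results from Papers With Code.
# Endpoint paths and field names are assumptions based on the publicly
# documented REST API (https://paperswithcode.com/api/v1/), not the
# original study's pipeline.
import requests

BASE = "https://paperswithcode.com/api/v1"


def fetch_all(endpoint: str) -> list[dict]:
    """Follow the API's page-based pagination and collect every record."""
    records, url = [], f"{BASE}/{endpoint}/"
    while url:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload.get("results", []))
        url = payload.get("next")  # None on the last page
    return records


# Hypothetical example: walk a few evaluation tables and report, for each,
# the task, the dataset, and the metric values each paper achieved on it.
for table in fetch_all("evaluations")[:5]:
    rows = fetch_all(f"evaluations/{table['id']}/results")
    for row in rows[:3]:
        print(table.get("task"), table.get("dataset"), row.get("metrics"))
```

Grouping results by (task, dataset, metric) in this way is one plausible route to the per-benchmark performance series the passage describes; the slicing to a handful of tables and rows is only to keep the example's output short.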