Due to the low-scoring nature of football (soccer), shots are often used as a proxy to evaluate team and player performances. However, not all shots are created equal, and their quality differs significantly depending on the situation. The aim of this study is to objectively quantify the quality of any given shot by introducing a so-called expected goals (xG) model. This model is validated statistically and with professional match analysts. The best-performing model uses an extreme gradient boosting algorithm and is based on hand-crafted features from synchronized positional and event data of 105,627 shots in the German Bundesliga. With a ranked probability score (RPS) of 0.197, it is more accurate than any previously published expected goals model. This approach allows us to assess team and player performances far more accurately than is possible with traditional metrics by focusing on process rather than results.
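The abstract above reports a ranked probability score without stating how it is computed. As a minimal sketch, assuming the standard RPS definition over ordered outcome categories, the score can be written as below. The function names `ranked_probability_score` and `mean_rps_binary` are hypothetical, and treating each shot as a two-category forecast (no goal, goal) is an assumption, under which the RPS reduces to the squared error between the xG value and the outcome.

```python
from typing import Sequence


def ranked_probability_score(probs: Sequence[float], outcome: int) -> float:
    """RPS of one forecast over ordered categories.

    probs   -- forecast probability for each category, in order
    outcome -- index of the category that actually occurred
    """
    r = len(probs)
    if r < 2:
        raise ValueError("need at least two categories")
    cum_forecast = 0.0
    cum_observed = 0.0
    total = 0.0
    # Sum squared differences of the cumulative distributions
    # over the first r - 1 categories, then normalise.
    for i in range(r - 1):
        cum_forecast += probs[i]
        cum_observed += 1.0 if i == outcome else 0.0
        total += (cum_forecast - cum_observed) ** 2
    return total / (r - 1)


def mean_rps_binary(xg_values: Sequence[float], goals: Sequence[int]) -> float:
    """Average RPS over shots, each treated as a (no goal, goal) forecast."""
    scores = [
        ranked_probability_score([1.0 - p, p], g)
        for p, g in zip(xg_values, goals)
    ]
    return sum(scores) / len(scores)
```

Lower values indicate better-calibrated, sharper forecasts; a score of 0 would mean every shot was predicted with certainty and correctly.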
A possible objective in analyzing trajectories of multiple simultaneously moving objects, such as football players during a game, is to extract and understand the general patterns of coordinated movement in different classes of situations as they develop. For achieving this objective, we propose an approach that includes a combination of query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of data from selected groups of episodes, and a data structure for representing the aggregates that enables their exploration and use in further analysis. The aggregation, which is meant to abstract general movement patterns, involves construction of new time-homomorphic reference systems owing to iterative application of aggregation operators to a sequence of data selections. As similar patterns may occur at different spatial locations, we also propose constructing new spatial reference systems for aligning and matching movements irrespective of their absolute locations. The approach was tested in application to tracking data from two Bundesliga games of the 2018/2019 season. It enabled detection of interesting and meaningful general patterns of team behaviors in three classes of situations defined by football experts. The experts found the approach and the underlying concepts worth implementing in tools for football analysts.
This study explores the influence of corona-specific training and playing conditions, especially empty stadiums, on match performance, contact behavior, and home advantage in the Bundesliga (BL) and Bundesliga 2 (BL2). We analyzed the 2017/18, 2018/19, and 2019/20 seasons and compared matches in rounds 26–34 before shutdown with “ghost” matches after restart. Results show increased running activity for high-intensity distance (+6.1%) and total distance covered (+4.3%). In BL2 in particular there were also changes in tactical aspects of the game (time in last third: –6.3%, pressure on pass receiver: –8.6%, success of attacking duels: –7.9%, share of long passes completed: +15.6%, outplayed opponents per pass: –14.7%). Contact time to other players (< 2 m distance) was 15:35 min per match. After restart, contact was reduced, especially when the ball was not in the last third (–11.2%). Away wins increased by 44.2% in BL, and the home-away difference in yellow cards changed in favor of the away team (+31.2%) in BL2. We conclude that empty stadiums have reduced home advantage and decreased referee bias when awarding yellow cards. Player behavior might have been affected by tactical demands and/or conscious or unconscious self-protection.
Detecting counterpressing is an important task for any professional match analyst in football (soccer), but it is currently done exclusively manually by observing video footage. The purpose of this paper is not only to automatically identify this strategy, but also to derive metrics that support coaches with the analysis of transition situations. Additionally, we want to infer objective influence factors for its success and assess the validity of peer-created rules of thumb established by practitioners. Based on a combination of positional and event data, we detect counterpressing situations as a supervised machine learning task. Together with professional match-analysis experts, we discussed and consolidated a consistent definition, extracted 134 features, and manually labeled more than 20,000 defensive transition situations from 97 professional football matches. The extreme gradient boosting model, with an area under the curve of 87.4% on the labeled test data, enabled us to judge how quickly teams can win the ball back with counterpressing strategies, how many shots they create or allow immediately afterwards, and to determine what the most important success drivers are. We applied this automatic detection to all matches from six full seasons of the German Bundesliga and quantified the defensive and offensive consequences of applying counterpressing for each team. Automating the task saves analysts a tremendous amount of time, standardizes the otherwise subjective task, and allows trends to be identified within larger datasets. We present how the detection and the lessons learned from this investigation can be integrated effectively into common match-analysis processes.
Passes are by far football’s (soccer) most frequent event, yet surprisingly little meaningful research has been devoted to quantifying them. With the increase in availability of so-called positional data, describing the positioning of players and ball at every moment of the game, our work aims to determine the difficulty of every pass by calculating its success probability based on its surrounding circumstances. As most experts will agree, not all passes are of equal difficulty; however, most traditional metrics count them as such. With our work we can quantify how well players can execute passes, assess their risk profile, and even compute completion probabilities for hypothetical passes by combining physical and machine learning models. Our model uses the first 0.4 seconds of a ball trajectory and the movement vectors of all players to predict the intended target of a pass with an accuracy of 93.0% for successful and 72.0% for unsuccessful passes, much higher than any previously published work. Our extreme gradient boosting model can then quantify the likelihood of a successful pass completion towards the identified target with an area under the curve (AUC) of 93.4%. Finally, we discuss several potential applications, like player scouting or evaluating pass decisions.
We study the automatic annotation of situations in soccer games. At first sight, this translates nicely into a standard supervised learning problem. However, in a fully supervised setting, predictive accuracies are supposed to correlate positively with the number of labeled situations: more labeled training data simply promise better performance. Unfortunately, non-trivially annotated situations in soccer games are scarce, expensive and almost always require human experts; a fully supervised approach appears infeasible. Hence, we split the problem into two parts and learn (i) a meaningful feature representation using variational autoencoders on unlabeled data at large scales and (ii) a large-margin classifier acting in this feature space but utilizing only a few (manually) annotated examples of the situation of interest. We propose four different architectures of the variational autoencoder and empirically study the detection of corner kicks, crosses and counterattacks. We observe high predictive accuracies above 90% AUC irrespective of the task.
Choosing the right formation is one of the coach’s most important decisions in football. Teams change formation dynamically throughout matches to achieve their immediate objective: to retain possession, progress the ball up-field and create (or prevent) goal-scoring opportunities. In this work we identify the unique formations used by teams in distinct phases of play in a large sample of tracking data. This we achieve in two steps: first, we train a convolutional neural network to decompose each game into non-overlapping segments and classify these segments into phases with an average F1-score of 0.76. We then measure and contextualize unique formations used in each distinct phase of play. While conventional discussion tends to reduce team formations over an entire match to a single three-digit code (e.g. 4-4-2; 4 defenders, 4 midfielders, 2 strikers), we provide an objective representation of team formations per phase of play. Using the most frequently occurring phase of play, the mid-block, we identify and contextualize six unique formations. A long-term analysis in the German Bundesliga allows us to quantify the efficiency of each formation, and to present a helpful scouting tool to identify how well a coach’s preferred playing style is suited to a potential club.
The global SARS-CoV-2 pandemic led to a lockdown in team sports in March 2020. Because the risk of virus transmission seems to correlate with the duration of close contacts, data on contact times are necessary to assess the risk of virus transmission in sports. In this study, an optical tracking system was used to determine contact times between players of the two highest men's professional football leagues in Germany in the 2019-20 season and in the first half of the 2020-21 season. Contacts between players were defined as being within a two-metre radius during matches and were differentiated as either match-specific or non-match-specific. In total, 918 matches with 197,087 contacts were analysed. The mean overall contact time of one-to-one situations of 36 s (SD: ± 66) before the lockdown was reduced to 30 s after the lockdown (SD: ± 60) (p < 0.0001). In professional football, contacts between two players infrequently occur within a two-metre radius, averaging less than 35 s. Only 36 player pair contacts lasted for more than 15 min (0.00018%). The mean accumulated contact time per player with all others was 10.6 ± 6.9 min per match, with a decrease from 11.6 ± 7.0 min before the lockdown to 10.0 ± 6.6 min (p < 0.0001) after lockdown in the season 2019-20. The SARS-CoV-2 pandemic has resulted in a reduction in match-specific contacts of 25%. It seems questionable if such short contacts in open-air sports may lead to considerable virus transmission.
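The contact definition in the abstract above (two players within a two-metre radius) can be illustrated with a short sketch. This is not the authors' pipeline: the function name `contact_seconds`, the frame layout (one dict per tracking frame mapping a player id to an `(x, y)` position in metres), the 25 Hz default frame rate, and the inclusive threshold are all assumptions made for illustration. The study's distinction between match-specific and non-match-specific contacts is not modelled.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Frame = Dict[str, Tuple[float, float]]  # player id -> (x, y) in metres


def contact_seconds(
    frames: List[Frame],
    fps: float = 25.0,       # assumed tracking frame rate
    radius_m: float = 2.0,   # contact radius from the abstract
) -> Dict[Tuple[str, str], float]:
    """Accumulated contact time in seconds for each player pair."""
    frame_counts: Dict[Tuple[str, str], int] = {}
    for frame in frames:
        # Sort ids so each pair is always keyed in the same order.
        for a, b in combinations(sorted(frame), 2):
            xa, ya = frame[a]
            xb, yb = frame[b]
            if math.hypot(xa - xb, ya - yb) <= radius_m:
                frame_counts[(a, b)] = frame_counts.get((a, b), 0) + 1
    # Convert the number of frames in contact into seconds.
    return {pair: n / fps for pair, n in frame_counts.items()}
```

Summing the values for all pairs that include a given player would give that player's accumulated contact time with all others, the second quantity reported in the abstract.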