2020
DOI: 10.24251/hicss.2020.031

Dissecting Moneyball: Improving Classification Model Interpretability in Baseball Pitch Prediction

Abstract: Data science, where technical expertise meets domain knowledge, is collaborative by nature. Complex machine learning models have achieved human-level performance in many areas, yet they face adoption challenges in practice due to limited interpretability of model outputs, particularly for users who lack specialized technical knowledge. One key question is how to unpack complex classification models by enhancing their interpretability to facilitate collaboration in data science research and application. In this…

Cited by 2 publications (3 citation statements) | References 13 publications

“…to predict different aspects of baseball using sabermetrics. For instance, Lee et al and Hickey et al used ML models to predict a thrown pitch's outcome (2,3). Furthermore, another study by Bock used sabermetrics and ML models to predict pitchers' short-term and long-term efficacy on their particular teams (4).…”
Section: Article (mentioning)
confidence: 99%
“…Interpretable machine learning techniques can be characterized in three dimensions: model‐specificity (specific vs. agnostic), generalizability/scope (local vs. global) (Rai, 2020), and stage of data generation (ranging from data to results), as shown in Figure 1. Obtaining model‐specific global explanations (see upper‐left quadrant in Figure 1) is the most developed subarea in interpretable machine learning (Hickey et al, 2020). For instance, feature importance scoring is widely employed to understand an individual feature's contribution to the model.…”
Section: The KTA Perspective of Complex Machine Learning Models (mentioning)
confidence: 99%
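
The passage above singles out feature importance scoring as the typical model-specific, global explanation. As a rough illustration only (not drawn from the cited papers), the sketch below fits a scikit-learn random forest on hypothetical pitch-context features and prints its impurity-based importances; the feature names and synthetic data are assumptions made for demonstration.

```python
# Minimal sketch: a model-specific *global* explanation via feature
# importance scoring. The pitch-level features and target below are
# hypothetical stand-ins, not the paper's actual data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["prev_pitch_speed", "count_balls", "count_strikes", "inning"]
X = rng.normal(size=(500, len(feature_names)))                    # fake pitch context
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=500) > 0).astype(int)  # fake label

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances summarize each feature's contribution to the
# fitted model as a whole (a global, model-specific view).
for name, score in sorted(zip(feature_names, model.feature_importances_),
                          key=lambda t: -t[1]):
    print(f"{name:18s} {score:.3f}")
```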
“…To generate interpretations at a user-defined level (e.g., a few instances), one solution is to first enumerate the interpretations on all instances, and then aggregate them (Hickey et al, 2020). To improve the scalability of interpretation aggregation, we propose an approach that identifies the most indicative features in the following steps.…”
Section: Instance Aggregation (mentioning)
confidence: 99%
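
To make the quoted enumerate-then-aggregate idea concrete, here is a minimal, hypothetical sketch: it computes a simple occlusion-style local attribution for every instance, then aggregates those attributions over a user-defined subset by mean absolute value to surface the most indicative features. The attribution method, feature names, and synthetic data are illustrative assumptions, not the approach described by Hickey et al. or the citing authors.

```python
# Sketch of "enumerate per-instance interpretations, then aggregate them".
# Local attributions use a simple occlusion-style perturbation (replace one
# feature with its training mean and measure the change in predicted
# probability); the cited work's own interpretation method may differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
feature_names = ["prev_pitch_speed", "count_balls", "count_strikes", "inning"]
X = rng.normal(size=(500, len(feature_names)))                    # fake pitch context
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=500) > 0).astype(int)  # fake label
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def local_attributions(model, X, baseline):
    """Per-instance, per-feature attribution: drop in P(class 1) when a
    feature is replaced by its baseline (training-mean) value."""
    p_full = model.predict_proba(X)[:, 1]
    attrib = np.zeros_like(X)
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] = baseline[j]
        attrib[:, j] = p_full - model.predict_proba(X_pert)[:, 1]
    return attrib

# Step 1: enumerate interpretations on all instances.
all_attribs = local_attributions(model, X, X.mean(axis=0))

# Step 2: aggregate over a user-defined subset (here, the first 25 pitches)
# by mean absolute attribution, surfacing the most indicative features.
agg = np.abs(all_attribs[:25]).mean(axis=0)
for name, score in sorted(zip(feature_names, agg), key=lambda t: -t[1]):
    print(f"{name:18s} {score:.3f}")
```

Mean absolute attribution is only one plausible aggregation in this sketch; medians or sign-aware averages over the same per-instance attributions would slot in the same way.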