NFL Tracking Data Project

Summary of Project Presented at CASSIS

In Spring 2024, I worked on a project using NFL player tracking data provided by the Kaggle NFL Big Data Bowl. Although I was late to the party and, as a result, didn’t get to submit anything, I did do some work along the way and received valuable feedback on it from Professor Greg Matthews.

I was inspired by the submission and subsequent paper by a CMU team of Quang Nguyen, Ruitong Jiang, Meg Ellingwood, and Ron Yurko on fractional tackles. A common measure of a defender’s ability is tackles, which are assigned at the end of a play if a defender is deemed to have tackled the ball carrier; if multiple players were involved, each is assigned half a tackle. However, this is a very subjective measure. To improve on it, they propose a model-free method for assigning fractional tackle credit. At each frame, a contact window is defined as the 1.5-yard bubble around the ball carrier. Defenders are credited with decreasing the carrier’s momentum toward the end zone: credit is distributed equally across contact-window frames, and within each frame it is divided evenly among the defenders with a contact opportunity in the window.
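
To make the division of credit concrete, here is a minimal sketch of the fractional-credit idea, assuming we already have per-frame positions for the carrier and the defenders. It divides a single unit of tackle credit (rather than the momentum decrease the paper actually apportions) equally across contact-window frames and, within each frame, evenly among the defenders inside the 1.5-yard bubble; the function name and array layout are my own.

```python
import numpy as np

CONTACT_RADIUS = 1.5  # yards; the bubble around the ball carrier

def fractional_tackles(carrier_xy, defender_xy, defender_ids):
    """Distribute one tackle's worth of credit across contact-window frames.

    carrier_xy:   (n_frames, 2) ball-carrier positions
    defender_xy:  (n_frames, n_def, 2) defender positions
    defender_ids: length-n_def list of defender identifiers
    """
    # Distance of every defender to the carrier at every frame
    dists = np.linalg.norm(defender_xy - carrier_xy[:, None, :], axis=2)
    in_window = dists <= CONTACT_RADIUS          # (n_frames, n_def)
    contact_frames = in_window.any(axis=1)       # frames with a contact window
    n_contact = contact_frames.sum()

    credit = {pid: 0.0 for pid in defender_ids}
    if n_contact == 0:
        return credit
    frame_share = 1.0 / n_contact                # equal credit per contact frame
    for f in np.flatnonzero(contact_frames):
        nearby = np.flatnonzero(in_window[f])
        for j in nearby:                         # split evenly within the frame
            credit[defender_ids[j]] += frame_share / len(nearby)
    return credit
```

The per-play credits sum to one whenever any contact occurred, so season totals are comparable to raw tackle counts.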

To build on this, I wanted to score players based on yards and points saved rather than momentum decreased. Although it captures a similar notion, I think that putting it in terms of yards and points provides a more intuitive metric by which to evaluate players. To build intuition on the value of both of these metrics, see the GIF below. In week 2 of the 2022 season, the Jaguars thrashed the Colts 24-0. In the fourth quarter, Jonathan Taylor broke off a run for about 25 yards. On the play, number 26 is credited with making the tackle. However, the most valuable defender on the play was clearly number 2, who functioned as a speed bump, slowing down Taylor enough for the Jaguars defenders to catch up and make a play. With typical tackle-tallying metrics, number 2 would not have gotten credit.

With this motivating example, we set off to value a defender’s contribution in more fitting terms. I’ve included the code for this project in a repo on my GitHub.

Valuing Offensive Players

Preparing the Data

The data tracks every player at 30 fps for every play of every game of the first nine weeks of the 2022 NFL season, recording each player’s position, orientation, and speed. Only handoff plays in which the rusher is a running back (RB) are considered, and plays with a penalty are removed. This still leaves 3.8 million frames to work with. These frames are obviously not independent, but I believe there is still enough data to produce insights.
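
As a rough sketch of this filtering step, assuming Big Data Bowl-style tables (the column names `nflId`, `position`, `ballCarrierId`, `passResult`, and `playNullifiedByPenalty` are assumptions to be checked against the real files):

```python
import pandas as pd

def filter_rb_handoffs(plays: pd.DataFrame, players: pd.DataFrame) -> pd.DataFrame:
    """Keep only penalty-free handoff plays where the ball carrier is an RB.

    Column names follow the Kaggle Big Data Bowl layout (an assumption):
    players has nflId/position; plays has ballCarrierId, passResult,
    and playNullifiedByPenalty.
    """
    rb_ids = set(players.loc[players["position"] == "RB", "nflId"])
    keep = (
        plays["ballCarrierId"].isin(rb_ids)          # rusher is a running back
        & plays["passResult"].isna()                 # handoff, not a pass attempt
        & (plays["playNullifiedByPenalty"] == "N")   # drop penalty plays
    )
    return plays.loc[keep]
```

The surviving `gameId`/`playId` pairs are then used to subset the frame-level tracking files.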

Estimating Yards Gained

The approach taken in this project is inspired by other work by Yurko, Nguyen, and other CMU contributors, specifically their work on evaluating pass defense, which was presented at NESSIS in 2023 and subsequently published as a paper. At each frame, they estimate the density function for the yard line at the end of the play, conditional on player positions and velocities. To account for the inherently multimodal nature of this distribution, they use a non-parametric random forest approach (RFCDE) that fits trees to minimize a conditional density estimation loss and generates predictions by weighting the “nearby” points that share a leaf. Another benefit of this tool is that it cuts through the dependent nature of the data to some extent: within-play frames are highly dependent, but the bagging procedure makes it less likely that frames from the same play are selected together. It is not perfect, however. With this foundational estimation tool in hand, they evaluate the quality of a defender by their ability to limit yards after catch (YAC). We will use the same foundational estimation tool but focus on different plays and different outcomes.

We are interested in estimating the yards that an RB will gain. More formally, let $Y$ be a random variable mapping each frame to the yards from the end zone at the end of the play. For each frame, we are interested in the conditional expectation of $Y$, conditional on features of the play relating to the RB’s position, distances between players, and the RB’s speed. We will use the RFCDE method to estimate

\[\mathbb{E}[Y | \textbf{X}_1] = \sum_{y = 0}^{100} P(Y = y | \textbf{X}_1)y.\]
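
Once we have an estimated density over end-of-play yard lines (e.g., RFCDE output evaluated on the grid $0, 1, \dots, 100$), the expectation above is just a weighted sum. A minimal sketch, with the function name my own:

```python
import numpy as np

def expected_yards(density, grid=None):
    """Turn an estimated conditional density over end-of-play yard lines
    into the conditional expectation E[Y | X].

    density: nonnegative weights over yard lines 0..100 (e.g., RFCDE output);
    they are renormalized to sum to one before taking the expectation.
    """
    density = np.asarray(density, dtype=float)
    if grid is None:
        grid = np.arange(density.size)   # yard lines 0, 1, ..., 100
    pmf = density / density.sum()        # normalize to a proper pmf
    return float((pmf * grid).sum())     # E[Y | X] = sum_y y * P(Y = y | X)
```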

After some tuning, I ended up using only the speed of the RB and the distances from the eight closest defenders to the RB. Additionally, we actually fit multiple models through a leave-one-week-out cross-fitting approach, in which each week in turn is left out of the training process, yielding nine models. When estimating the density for a play, we then always use the model whose training data excluded the week in which that play occurred.
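
A sketch of the cross-fitting loop, with scikit-learn’s `RandomForestRegressor` standing in for the RFCDE forest (the function names and hyperparameters are illustrative, not the project’s actual code):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_leave_one_week_out(X, y, week, n_weeks=9):
    """Fit one model per held-out week. Frames from week w are always
    scored by the model that never saw week w during training."""
    models = {}
    for w in range(1, n_weeks + 1):
        train = week != w                # drop the held-out week
        m = RandomForestRegressor(n_estimators=50, random_state=0)
        m.fit(X[train], y[train])
        models[w] = m
    return models

def predict_crossfit(models, X, week):
    """Score each frame with the model that left out that frame's week."""
    preds = np.empty(len(X))
    for w, m in models.items():
        mask = week == w
        if mask.any():
            preds[mask] = m.predict(X[mask])
    return preds
```

This keeps every prediction out-of-sample at the week level, which is coarser (and safer) than splitting by play, given how dependent within-week frames are.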

Estimating Instantaneous Value of RB

While the yards gained by an RB are useful, note that not all yards are created equal. For instance, in some cases a defender may decide to give up additional yards in order to increase the likelihood of preventing a first down. This is because the defender is really looking to save points rather than yards. Thus, in this section we will extend our model to evaluate defenders by points saved rather than yards saved.

We will be using pre-existing expected points models for this portion of the project, an area in which much work has been done. While I won’t go into an extensive review, some important works include Romer’s analysis of the valuation of game states, in which he also discusses fourth-down decision making. Burke extends this work by considering downs and distances other than first-and-ten, which were the easiest to study simply due to sample size. Further, Yurko et al. develop a multinomial logistic regression model for expected points. Black-box machine learning models such as XGBoost have also been used for expected points prediction. The latter two benefit from access to nicely formatted play-by-play data from nflscrapR, allowing them to train models on plays from all sorts of downs and distances.
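
To illustrate the multinomial approach on toy data: classify the next scoring event, then take expected points as the probability-weighted sum of event values. Everything below (the feature set, the synthetic labels, the point values) is an illustrative assumption, not a reconstruction of any of the cited models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Point values for the next scoring event, in the spirit of Yurko et al.
EVENT_POINTS = {"td": 7.0, "fg": 3.0, "none": 0.0, "opp_fg": -3.0, "opp_td": -7.0}

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: yards from opponent end zone, down, yards to go
X = np.column_stack([
    rng.integers(1, 100, n),
    rng.integers(1, 5, n),
    rng.integers(1, 15, n),
])
# Synthetic labels: closer to the end zone -> more likely the next score is a TD
labels = np.array(list(EVENT_POINTS))
p_td = np.clip(1 - X[:, 0] / 100, 0.05, 0.95)
y = np.where(rng.random(n) < p_td, "td", rng.choice(labels, size=n))

clf = LogisticRegression(max_iter=1000).fit(X, y)  # softmax over events

def expected_points(state):
    """EP = sum over scoring events of P(event | state) * point value."""
    probs = clf.predict_proba(np.asarray(state).reshape(1, -1))[0]
    return sum(p * EVENT_POINTS[c] for p, c in zip(probs, clf.classes_))
```

On the synthetic data, a state near the opponent’s end zone should yield a higher EP than one backed up deep in a team’s own territory.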

The XGBoost model developed by Ben Baldwin is publicly available in the nflfastR package, and the one we use here is trained on similar data and features to theirs, with some slight tweaks.

Before diving into the modeling, I want to note some shortcomings common to all of these proposed methods. As detailed by Ryan Brill in an outstanding talk at NESSIS 2023, these models (1) fail to adjust for team quality, (2) are trained on data with selection bias, and (3) have an effective sample size much smaller than it would appear. The first issue is difficult to solve because the features have complex interactions and nonlinearities. The second issue is a little sneakier: in the tracking and play-by-play data, good teams end up running more plays and scoring more points on those plays, so estimates from expected points models will overestimate the true expected points. Third, frames of the same play and plays on the same drive share the same outcomes in yards and points, respectively, so the observations exhibit high autocorrelation. The reduced effective sample size and the biased data exacerbate the first issue, making it even harder to adjust for team quality. Ryan explains and demonstrates these issues in much greater detail in his work. He proposes an XGBoost model trained only on first-down plays in that paper, and also discusses applying a catalytic prior, generating synthetic data that acts as a sort of prior. The effect is to pull the flexible XGBoost model toward the simpler model used to generate the synthetic data, say a multinomial logistic regression.
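
A rough sketch of that catalytic-prior idea on toy data: fit a simple model, generate synthetic covariates labeled by draws from the simple model’s predictive distribution, and train the flexible model on the combined data, shrinking it toward the simple one. `GradientBoostingClassifier` stands in for XGBoost here, and all details (the synthetic-covariate scheme, the sample sizes) are my assumptions rather than Brill’s actual construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

def catalytic_fit(X, y, n_synth=500, seed=0):
    """Shrink a flexible classifier toward a simple one via synthetic data,
    a rough sketch of the catalytic-prior idea."""
    rng = np.random.default_rng(seed)
    simple = LogisticRegression(max_iter=1000).fit(X, y)

    # Synthetic covariates drawn from the observed feature ranges,
    # labeled by sampling from the simple model's predictive distribution.
    X_synth = rng.uniform(X.min(axis=0), X.max(axis=0),
                          size=(n_synth, X.shape[1]))
    probs = simple.predict_proba(X_synth)
    y_synth = np.array([rng.choice(simple.classes_, p=p) for p in probs])

    # The flexible model sees real rows plus simple-model-consistent rows,
    # pulling it toward the simpler fit where real data are sparse.
    flexible = GradientBoostingClassifier(random_state=seed)
    flexible.fit(np.vstack([X, X_synth]), np.concatenate([y, y_synth]))
    return flexible
```

Raising `n_synth` strengthens the pull toward the simple model, playing the role of the prior’s weight.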

With all of that said, I will be sticking with an XGBoost model that is slightly altered from the one made publicly available by nflfastR.

Ranking Defensive Players

Wrapping Up