Ready for Kickoff: AWS & NFL’s New Special Teams Next Gen Stat
The newest advanced metric tackles the hidden dynamics of punt and kickoff returns. The NFL Next Gen Stats team is here to give us the inside scoop.
We have all witnessed a returner getting tackled a nanosecond after receiving a punt or kickoff. Holding onto the ball, let alone gaining a chunk of yardage, is a huge win. And the odds of a return touchdown are, let’s say, on more of the miraculous side. Only 0.6% of NFL kickoffs (6 out of 1,013) and 0.3% of punts (3 out of 952) were returned for touchdowns during the 2022 regular season. But that’s exactly why it’s a thrill to see a returner beat the odds, weaving through wave after wave of improbability. It’s an art. An extreme outlier in reality. And it can sway the game in a blink of an eye.
Unless things go extremely right or extremely wrong, however, a blink of an eye is about how much attention special teams often gets. “The intricacies of this battle for ball control and field position remain hidden from advanced analysis,” says Mike Band of NFL Next Gen Stats. “Yet, kickoffs and punts make up roughly 1/5 of the game, and can often have a major impact on field position and game flow.”
To address this gap, AWS machine learning (ML) engineers and the NFL’s Next Gen Stats group have co-developed Expected Return Yards, the first-ever set of advanced stat models focused on kickoff and punt returns. Over the last five years, this partnership has cooked up analytics that dig deeper into the offensive and defensive sides of the ball. Now they are applying those learnings and ML techniques to special teams.
This is all part of a larger effort to help fans experience and understand every aspect of the game as it happens, says Band. “If we can tell stories about each component within the game, it allows us to tell whatever story is transpiring on the field. Not just for the sake of it, but to bring the fans closer to the sideline.”
Expected Return Yards predicts the yardage a returner will gain if they field a kickoff or punt. Take a look at the above examples. Imagine freezing time right when a returner receives the ball. “This stat model estimates how many yards the returner is expected to gain once they receive the ball based on the x and y speed, acceleration, and orientation direction of every player on the field at that timestamp…” [those numbers are derived from player tracking data sent from chips in the players’ pads, by the way] “…and from that timestamp, the model makes an estimate of the probability distribution of how many yards that player would gain if they were to return the punt or kickoff.”
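To make the "freeze the frame, get a distribution" idea concrete, here is a minimal Python sketch. The `PlayerState` fields loosely mirror the inputs Band lists, but the schema, the helper names, and the toy probabilities are all assumptions made for illustration; none of this is the production model.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    """One player's tracking snapshot when the ball is fielded (hypothetical schema)."""
    x: float             # yards along the field
    y: float             # yards across the field
    speed: float         # yards per second
    acceleration: float  # yards per second squared
    orientation: float   # degrees the player is facing
    direction: float     # degrees the player is moving


def frame_features(players: list[PlayerState]) -> list[float]:
    """Flatten every player's snapshot into one input vector for a model."""
    return [
        value
        for p in players
        for value in (p.x, p.y, p.speed, p.acceleration, p.orientation, p.direction)
    ]


def expected_yards(distribution: dict[int, float]) -> float:
    """Mean of a discrete yardage distribution given as {yards: probability}."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(yards * p for yards, p in distribution.items())


# Toy output a model might produce for one frozen frame (made-up numbers).
toy_distribution = {0: 0.10, 5: 0.30, 10: 0.35, 20: 0.20, 60: 0.05}
print(expected_yards(toy_distribution))
```

The single "expected" number is just the mean of the predicted distribution. Keeping the whole distribution around is what later makes a touchdown probability possible.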
When the team started experimenting, they explored combining punt and kickoff data to train one single model. But it performed poorly. It turns out that while punts and kickoffs use similar parameter sets, the data behind them differs considerably. Player location on the field, proximity of defenders when the returner catches the ball, returner speed, players’ real-time positions in relation to one another, and acceleration were dissimilar enough between punts and kickoffs that the model ran into problems, says Band.
“In order to generate the best distribution of predictions, you have to create a model specific to that situation. It's like comparing apples to only apples in a model and then you create a similar model to compare oranges to oranges.” So, the stat was split into two distinct models, which performed much better.
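A toy Python comparison shows why pooling can hurt. The "model" here is deliberately trivial (it always predicts the training mean), and the yardages are invented for illustration. The real models are neural networks trained on tracking data.

```python
from statistics import mean

# Made-up return yardages for illustration only; not NFL data.
plays = [("kickoff", 22.0), ("kickoff", 26.0), ("punt", 8.0), ("punt", 10.0)]


def fit_mean(yards: list[float]) -> float:
    """A deliberately trivial 'model': always predict the training mean."""
    return mean(yards)


def mean_abs_error(predict, plays) -> float:
    """Average absolute gap between predicted and actual yards."""
    return mean(abs(predict(play_type) - yards) for play_type, yards in plays)


# One pooled model for every play.
pooled_model = fit_mean([yards for _, yards in plays])

# One model per play type (apples to apples, oranges to oranges).
split_models = {
    play_type: fit_mean([y for t, y in plays if t == play_type])
    for play_type in ("kickoff", "punt")
}

pooled_error = mean_abs_error(lambda play_type: pooled_model, plays)
split_error = mean_abs_error(lambda play_type: split_models[play_type], plays)
print(pooled_error, split_error)
```

On these invented numbers the pooled prediction lands between the two play types and fits neither, while the per-type models track each one closely.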
Another key challenge came when trying to generate the probability of a return touchdown. Punts and kickoffs happen all the time in a game, but rarely result in scores. Only six kickoffs were returned for touchdowns during the 2022 regular season, and only three punts (out of approximately 1,000 returns each). In the typical ML modeling process, in which algorithms help a system automatically find patterns in data and make decisions on its own, events this rare are treated as outliers and are often devalued. This set up a crucial question for the Next Gen Stats team: with such a small data set, “how do you predict the outlier events in football like a return touchdown? Can we capture the true touchdown probability on a punt return or a kick return that aligns with reality?”
It took a lot of borrowing, experimenting, adapting—and yes, even ML models training other ML models—to find the best solution. Expected Return Yards began with the foundational architecture of existing expected yards stat models. Engineers then tinkered with different techniques to correctly model the unusual event of a return touchdown. They finally found a breakthrough using a novel ML method originally designed for time series forecasting called Spliced Binned-Pareto (SBP) distribution. In a nutshell, SBP models data in a way that accounts for rare events by extending both ends of a distribution. This same method could also be applied to account for extreme rainfall in flood predictions, where an uncommon event has a huge impact on the overall performance of the model.
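As the name suggests, SBP splices a binned distribution for the bulk of the data onto generalized Pareto tails for the extremes. Here is a simplified, one-sided Python sketch of that idea. It models only the upper tail (the method described above extends both ends), and the bin edges, bin probabilities, tail mass, and Pareto parameters are made-up values, not fitted ones. In practice these would be learned by a model rather than typed in by hand.

```python
def gpd_survival(excess: float, xi: float, beta: float) -> float:
    """P(Z > excess) for a generalized Pareto tail (shape xi > 0, scale beta > 0)."""
    return (1.0 + xi * excess / beta) ** (-1.0 / xi)


def spliced_survival(x: float, edges: list[float], probs: list[float],
                     tail_mass: float, xi: float, beta: float) -> float:
    """P(X > x) for a binned body on [edges[0], edges[-1]] plus a Pareto upper tail.

    `probs` are the body's bin probabilities (summing to 1). The body is scaled
    to carry 1 - tail_mass of the total probability; the tail carries tail_mass.
    """
    threshold = edges[-1]
    if x >= threshold:
        # Beyond the last bin, the heavy tail takes over.
        return tail_mass * gpd_survival(x - threshold, xi, beta)
    # Inside the body: add up bin mass above x, uniform density within a bin.
    mass_above = 0.0
    for lo, hi, p in zip(edges[:-1], edges[1:], probs):
        if x <= lo:
            mass_above += p
        elif x < hi:
            mass_above += p * (hi - x) / (hi - lo)
    return (1.0 - tail_mass) * mass_above + tail_mass


# Hypothetical parameters: 5-yard bins up to 20 yards, 5% of mass in the tail.
edges = [0.0, 5.0, 10.0, 15.0, 20.0]
probs = [0.2, 0.4, 0.3, 0.1]
for yards in (10.0, 20.0, 40.0):
    print(yards, spliced_survival(yards, edges, probs, tail_mass=0.05, xi=0.5, beta=10.0))
```

The point of the splice is that long gains never get a probability of exactly zero. The tail decays slowly instead of being cut off where the bins end.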
Let’s say the expected return yards distribution for a play is between 3 and 15 yards. As you move past 15 yards and get closer and closer to the end zone, jagged blips in the data pop up from the probability of bigger yardage gains. This has to do with the pivotal moment a returner makes it past each wave of defenders, says Band. “It is what we imagined from the football perspective—that the probability of scoring a touchdown has to do with whether or not you got past a big group of defenders and then the possibility that you ran past another defender further down the field. The SBP method better captured those blips of possibility which led to better touchdown probability estimates above our baseline model.”
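Once a play has a full yardage distribution, the touchdown probability is simply the share of that distribution at or beyond the distance to the end zone. A minimal Python sketch, using an invented distribution for a returner who fields the ball 75 yards from the end zone (the function name and all numbers are assumptions for illustration):

```python
def prob_at_least(distribution: dict[int, float], yards: int) -> float:
    """P(return gains at least `yards`) under a discrete {yards: probability} distribution."""
    return sum(p for gained, p in distribution.items() if gained >= yards)


# Made-up distribution: most of the mass sits in the 3 to 15 yard range,
# with small "blips" further out for returns that break past a wave of defenders.
yards_to_end_zone = 75
distribution = {3: 0.15, 8: 0.35, 15: 0.30, 25: 0.12, 40: 0.05, 75: 0.03}

p_beyond_typical = prob_at_least(distribution, 16)           # past the typical range
p_touchdown = prob_at_least(distribution, yards_to_end_zone)  # all the way home
print(p_beyond_typical, p_touchdown)
```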
A relatively new process called transfer learning also played a role in crafting the stat. A small data set usually doesn’t produce a well-performing model. The more examples you have to train the model on, the higher the accuracy tends to be. Transfer learning boosts performance by taking a model trained on one task and reusing it as the starting point for a model on a similar task. Leaning into this method, the team taught and tuned the new Expected Return Yards models using other expected yardage models (rushing and yards after catch) already in production.
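The head start that transfer learning provides can be seen even in a tiny example. The Python sketch below fits a straight line by gradient descent, first on a data-rich "source" task and then on a similar, data-poor "target" task. The tasks, data, and hyperparameters are invented for illustration. The actual work reused production neural networks, not two-parameter lines.

```python
def train(weights, data, lr=0.1, steps=500):
    """Fit y ~ w*x + b by gradient descent on mean squared error, starting from `weights`."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(weights, data):
    """Mean squared error of the line (w, b) on `data`."""
    w, b = weights
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# Made-up tasks: plenty of source data, only three target examples.
source_data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(20)]
target_train = [(x, 2.1 * x + 1.1) for x in (0.2, 1.0, 1.8)]
target_test = [(i / 10, 2.1 * (i / 10) + 1.1) for i in range(20)]

pretrained = train((0.0, 0.0), source_data)              # learn the source task
fine_tuned = train(pretrained, target_train, steps=5)    # short fine-tune on the target
from_scratch = train((0.0, 0.0), target_train, steps=5)  # same budget, no head start

print(mse(fine_tuned, target_test), mse(from_scratch, target_test))
```

With the same small training budget, the model that starts from the source task's weights ends up with a lower error on the target task than the one that starts from zero.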
An interesting development from this process is that the student will become the teacher. Discoveries found with this new model will be used to refine and enhance the very models that trained them. “Our next venture is to apply this newer architecture to our existing expected rushing yards model and our existing expected yards after catch models. We can refine them and look at the potential biases in them. And in this case, we've better accounted for outlier outcomes—the probability of an outlier that's more in line with reality. And so now we have work to do to improve our offensive-focused models with our latest learnings. Every modeling venture we’ve gone through, success or failure, we’ve learned, can apply to our existing work. It’s a constant feedback loop.”
“There are so many games within the game, so many insights to find, and so many stories left to be told.”
Mike Band, Next Gen Stats
So there you have it—a reliable prediction of expected punt or kickoff return yardage along with the probability of a touchdown. While exciting, this is just a sliver of the special teams story this stat can begin to tell. Which returners are the most consistent in creating yards? Is a punt returner too conservative or aggressive in signaling for a fair catch? Which gunners are most effective at limiting the space for a punt return? These are all insights provided in the new model that can peel back the layers of this largely ignored battle of position.
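One plausible way to turn the stat into a returner comparison is to look at actual yards minus expected yards on each return, then summarize the average and the spread per player. The Python sketch below does that with invented returners and yardages. The "yards over expected" framing and the use of standard deviation as a consistency measure are assumptions for illustration, not a description of how Next Gen Stats reports it.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize_returners(returns):
    """Per returner: (average yards over expected, spread). Lower spread = more consistent."""
    diffs = defaultdict(list)
    for returner, actual, expected in returns:
        diffs[returner].append(actual - expected)
    return {name: (mean(d), pstdev(d)) for name, d in diffs.items()}


# (returner, actual yards, expected yards); all values are made up.
returns = [
    ("Returner A", 12.0, 10.0),
    ("Returner A", 11.0, 9.0),
    ("Returner B", 30.0, 10.0),
    ("Returner B", 4.0, 20.0),
]

summary = summarize_returners(returns)
for name, (average, spread) in summary.items():
    print(name, average, spread)
```

In this toy data both returners average the same yards over expected, but Returner A gets there steadily while Returner B swings between a big gain and a big loss.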
Expected Return Yards is the next step in giving fans fresh looks at everything unfolding on the field, and in the minds of coaches. Since 2018, the NFL and AWS have engineered a suite of advanced analytics digging into every side of football. With each new model, understanding of machine learning and neural networks as they relate to the game grows. And as learning and technology grow, so does the confidence in the partnership to tackle never-seen-before stats. “It's led us to knowing that when we go into a project of such big magnitude, that we have a high probability of success, and then we have a high probability of coming out with a good and usable model.”
So, what can we expect next? The trajectory of AWS and Next Gen Stats is “really about bringing newer insights derived from newer tech throughout the entire fan experience,” says Band. “It’s about giving fans a live look at the heartbeat of the game and following the rollercoaster of what happens during any given play.” When it comes to the advanced sports analytics world, we’re only in the first seconds of the first quarter. Existing stat models will refine and grow. New models will reveal new stories on every side of the ball. Innovative applications of the stats will further transform the fan experience on and off the field. As the technology evolves, so too will our ability to analyze the game second by second, or rather, millisecond by millisecond, to be more exact.
Want a deeper dive into the data science behind Expected Return Yards? Check out this Q&A with AWS ML Solution Labs engineers.