Football Analytics – Way Over-Hyped

Sports decision-making.  Once dominated by coaching judgments, but now increasingly the captive of mathematicians.  Teams of self-styled math gurus scrub data of every conceivable sort searching for heretofore obscured “insights”.  Subjective decision-making replaced by rules-based decision-making.  The goal of analytics?  Extract insights from prior events to guide future decisions and increase the probability of desired future outcomes.  A tool that, wherever it generates “insights”, is assumed to enhance outcome probabilities.  Mathematical analysis will improve outcomes.  But is this assumption in sports, particularly football, correct?  How useful are football analytics?

Are Football Analytics Productive?

It’s all about the number of variables and their relevance to calculating probabilities of outcomes of future decisions.  Baseball may be reasonably suited for mathematically analyzing prior events to increase probabilities of future events.  Many baseball decisions involve choices to account for a few, well-defined variables.  But that’s simply not the case in football.

New and creative ways of football statistical analysis are in vogue.  Websites such as Pro Football Focus (https://www.pff.com/nfl) provide fabulous analytical detail, much of it self-created, going where football analysis has never gone before.  It’s football analytics on steroids.

But mathematical analyses that produce numerical outcomes create a false sense of decision-making certainty.  Humans have an instinctive need for certainty, especially with respect to future possibilities.  Mathematical analytics have the luxury of “quantifying” outcomes.  They associate a “number” with a choice.  That fills an emotional need to make the unknowable future more certain.  But is it really an effective aid to football decision-making?

In football the argument for using analytics that incorporate a multitude of factors is poorly conceived and misunderstood.  We argue that with football the utility of analytics-based judgments is suspect.  It may even be detrimental in most instances to reaching better probability judgments.   Here’s why.

Football and Binary Choices

All sports analytics are generated from backward-dependent variables.  Those variables consist of both independent variables and dependent variables.  Analytics proponents believe that the data output from these backward-dependent variables makes the calculation of future probabilities more accurate.  But the value of that statistical output to enhancing probabilities, and thereby the likely outcome of a future scenario, is dependent on four things.  First, the quality of the variables from which that output is derived.  Second, the relevance of that data output to a future scenario.  Third, the breadth of the data set.  And fourth, human judgments, which are inherently subjective.

The Field Goal Example

In football, basic analytics may aid in making decisions concerning simple binary choices, such as whether to attempt a field goal or whether to go for a two-point conversion.  In instances involving these simple binary choices the variables are few and well defined.  The relevance of the data is very high.  The breadth of the data set is usually adequate if not deep.  And there is little to no application of human judgment in selecting the data or its degree of relevancy.

Consider a decision whether to attempt a field goal.  The single most important variable is player-dependent: the skill of the place kicker.  There is one primary data set.  All prior field goal attempts by that kicker.  Subsets of data are distilled from the primary data set.

Field Goal Data

There are several distinct data subsets, and we list a few.  Distance of each kick attempt.  Stadium in which the attempt occurred.  Point in the game at which each attempt occurred.  Wind conditions for each kick.  Any trend regarding recent kick attempts by that kicker at the relevant distance.

Several other variables are involved in making the kick decision.  Many of those variables are unrelated to the players on the field.  These include time left in the game, current score, and wind and field conditions.  Player variables, other than the kicker, include the skill of the long snapper and the holder.  Those two factors are typically assumed away unless the player in either position is a substitute.

Lastly, there are two additional game-specific factors important to the play decision.  First, the opponent in the game.  Second, the ability of the kicking team to generate subsequent scoring opportunities.  Other minor considerations can be identified.

Few would argue that the most important variable in this decision matrix is the skill of the kicker.  The simple kicking data list (above) is likely no different than what has been used throughout football history.  This is not difficult stuff.  The other player-dependent variables are typically not relevant to the kicking decision.  The non-player dependent variables, several of which are readily susceptible to mathematical calculation (score of the game, time remaining, etc.), are easily placed in a decision matrix tree.

The binary choice – attempt a field goal or not – is an easy one.  The major considerations (the non-player dependent variables) are easily applied to enhance outcome probabilities.  There is only one key player-dependent variable – the identity of the kicker.

Most Football Choices – Too Many Variables

By contrast, the selection of plays in most football situations does not involve simple binary choices.  Almost all plays involve multiple choices.  Because there are so many choices, the utility of output analytics from prior plays, as a tool for increasing outcome probabilities, is suspect.

Player Variables

Let’s start with player variables.  In football, each play involves 22 athletes with a variety of skill sets.  These variables move on the playing field at varying speeds and angles.  These 22 independent player variables consist of 11 on offense and 11 on defense.  Typically, an NFL roster contains between 24 and 26 players on each of offense and defense.

The probability of a specific offensive player being on the field for a play is not equal for each player.  Substitutions for the five offensive linemen and the quarterback are generally unusual.  The probability of each of those six players being on the field for a play is high.  And in that combination.  But probabilities decline for the receivers, tight ends and running backs.

The probabilities for individual defensive players being present for a play are more difficult to calculate.  Generally said, starting cornerbacks and safeties may have high frequency of play appearances.  It is generally less so for defensive linemen and linebackers.  On defense the rotation of players based upon down and distance parameters is prominent.

The offensive coach’s decision is wholly within his control.  He simply selects which 11 people will be on the field for the given play.  He may select a play that can be run with different player groupings.  To that extent, having selected the play, he may also have to select which grouping to use.  But the player selection choice starts with him.

The defensive coach is in a different position.  He typically selects his complement of 11 players after observing who takes the field for the offense.  His decision is therefore reactive to (and dependent upon) the personnel choice of his opponent at the time of the given play.  His decision incorporates a level of variable-dependency.

Because of these realities, the certainty of any specific group of 11 players being on the field for any given play, particularly on defense, cannot be determined with great precision.  The number of player group combinations is very large.  But for the sake of simplicity, we will assume there are “only” 22 individual player variables.
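How large is “very large”?  A rough sketch, assuming (as stated above) 24 to 26 players on one side of the ball and, unrealistically, that any 11 of them could take the field together.  Position rules would shrink these counts a great deal, so treat them as a loose upper bound and not as a real lineup count.  The function name is ours, for illustration only.

```python
import math

# Assumption: 24 to 26 players on one side of the ball, per the text above.
# Assumption: any 11 may take the field together, ignoring position rules,
# so the result is only a loose upper bound on real personnel groupings.
def lineup_count(unit_size: int, on_field: int = 11) -> int:
    """Number of distinct 11-player groups that can be drawn from one unit."""
    return math.comb(unit_size, on_field)

for unit_size in (24, 25, 26):
    print(unit_size, lineup_count(unit_size))
```

Even for a single side of the ball the count runs into the millions, which is why we fall back on the simplification of “only” 22 individual player variables.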

More Variables

Other variables affect play selection decisions.  For example, field conditions, wind conditions, the physical condition of each player, the fatigue of each player, lighting conditions, current score and time of the game, and performance in the current game, among others, are relevant variables.

There are many offensive formation choices involved for any play.  These decisions are dependent on two primary variables: down and distance to gain.  On defense, there are similarly numerous possible formations on any given play.  These decisions are dependent on three variables: down, distance to gain, and offensive personnel deployed.

The aggregate number of relevant variables to consider in choosing any given play is so large as to render the output from their interplay non-instructive as to future probabilities at a similar down and distance.  Let’s get more specific.

Football Analytics in Real-Time

Assume Baltimore is playing Kansas City on a very cold and wind-blown day.  Rain is falling intermittently throughout the game.  The football is generally wet throughout the game.  There are nine minutes left in the fourth quarter.  Kansas City is leading 28-21.  After a slow start, the Chiefs have scored touchdowns on each of their last three possessions.  Those touchdowns resulted from drives of 75, 78 and 85 yards.  The Ravens have been unable to stop the Chiefs passing attack.  Baltimore’s defense is “gassed.”

The Ravens are facing a fourth down and three yards to gain on the Kansas City 37-yard line.  The wind is blowing at Baltimore’s back.  Baltimore initially has a situational decision.  Should it elect to attempt a 54-yard field goal?  Or, should it attempt to gain a first down?

Field Goal Decision: Completely Game Dependent

Baltimore has the best kicker in NFL history, Justin Tucker.  Tucker has made his last 10 field goal attempts over 50 yards.  He has made 18 of his last 20 tries over 50 yards.  Four of those 20 kicks were in similar field and wind conditions and he made all four.

Baltimore’s kicking decision is almost entirely situationally game-dependent.  Field goal “analytics” are straightforward.  Deep analytics provide no “extra” insight regarding Tucker’s probability of success on a 54-yard field goal attempt.  His probability of success, given his past and most recent kick history at this distance and in these conditions, is very high; indeed, that probability is the highest of any player in the league based on the identified key variables.
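To show just how simple the field goal arithmetic is, here is a minimal sketch.  The kick log is hypothetical: we built it only so that its totals match the figures above (20 attempts over 50 yards, 18 made, four of them in similar conditions and all four made).  The individual distances, the class, and the field names are our own assumptions.

```python
from dataclasses import dataclass

@dataclass
class Kick:
    distance: int             # yards
    similar_conditions: bool  # hypothetical flag for cold, wet, windy days
    made: bool

def make_rate(kicks):
    """Fraction of kicks made, or None if there are no kicks to count."""
    if not kicks:
        return None
    return sum(k.made for k in kicks) / len(kicks)

# Hypothetical log: only the totals come from the article's example.
history = (
    [Kick(52, True, True)] * 4       # similar conditions, all made
    + [Kick(53, False, True)] * 14   # other conditions, made
    + [Kick(55, False, False)] * 2   # other conditions, missed
)

long_kicks = [k for k in history if k.distance > 50]
similar = [k for k in long_kicks if k.similar_conditions]

print("over 50 yards:", make_rate(long_kicks))
print("similar conditions:", make_rate(similar))
```

Two filters and a division.  Nothing here goes beyond what a coach with a stat sheet has always been able to do.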

But the game-situational variables impact Baltimore’s coach’s decision.  Analytics of prior outcomes are not relevant to the key game-situational variable for Baltimore.  On this day the Ravens have been unable to stop the Chiefs passing attack.

Baltimore’s Coach Harbaugh judges the probability of the Ravens defense stopping the Chiefs is low.  He therefore believes the Ravens must score a touchdown now.  He cannot give the ball back to Kansas City with a four-point deficit (assuming Tucker makes the kick).  If the Chiefs score another touchdown, Baltimore will need to score twice more.  In that case, Coach Harbaugh believes there might not be time enough left to do so.  He also concludes he must try to rest his tired defense.  Thus, Baltimore eschews the field goal attempt.  For the same reasons, Baltimore never considers whether it should punt.  Complicated analytics provide no extra insight.

Football Analytics and Relevancy:  Data Selection

The Ravens focus on which play to run to gain the first down.  There are a very large number of variables to consider.  Baltimore’s key running back, Mark Ingram, who has gained 82 yards on 13 prior carries during the game, limped off the field two plays ago after twisting his knee, but comes back on the field for this play.  He is not, for this play, at full speed.  Marshal Yanda, Baltimore’s best interior offensive lineman, didn’t suit up.

To this point in the game, Baltimore has gained over 160 yards running the ball.  They have 23 rushing attempts, averaging almost eight yards per attempt.  The Chiefs have stopped only four running plays for three yards or less.  Quarterback Lamar Jackson has had trouble gripping the wet football and has completed only 9 of 19 passes, none of which traveled more than 12 yards in the air, with most of his errant throws sailing high.

Baltimore has faced four third downs with three yards or less to gain in this game.  Kansas City had different personnel on the field for three of those third downs, with a different formation on each of the three plays.

Earlier in this game, the Ravens attempted a fourth down and four yards to gain play.  The Ravens lined up in the pistol formation with three wide receivers (the slot receiver lined up outside the right tackle) and one tight end.  The Ravens called a run-pass option with Jackson handing the ball off to Ingram who ran between the right guard and the center for eight yards, gaining the first down.  Kansas City had deployed four defensive linemen, one linebacker, three cornerbacks and three safeties on the play, with the three safeties playing zone defense across the field at a depth of 15 yards.  The defensive linemen had lined up in wide gaps.

For the season to date prior to this game, Baltimore attempted seven other fourth down plays on offense.  It used the following personnel groupings with results listed:

Distance to Gain | Personnel Group | Result
7 | 4 WR; 1 RB | Incomplete pass to outside receiver – failure
5 | 3 WR; 1 TE; 1 RB | RB run for 5 yards – success
4 | 2 WR; 2 TE; 1 RB | Pass to TE #1 for 8 yards – success
2 | 1 WR; 3 TE; 1 RB | RB run for 7 yards – success
1 | 3 TE; 1 FB; 1 RB | RB run for 2 yards – success
1 | 3 TE; 1 FB; 1 RB | QB sneak for 3 yards – success
1 | 3 TE; 1 WR; 1 RB | QB sneak for 0 yards – failure

League-wide (excluding the Ravens), 12 plays have been attempted so far on fourth down with at least four yards to gain, and only three plays were successful.  Over the prior season there were 18 plays attempted on fourth down with at least four yards to gain and five were successful.  Two of those attempts were by Baltimore and neither succeeded.

Focus on Relevancy

How can/should Baltimore use analytics derived from prior results to determine which play to run in the current situation?  The above table doesn’t list many other applicable variables: individual personnel on the field, field conditions, fatigue, defensive formation, etc.

Indeed, there are an enormous number of variable inputs to a mathematical analysis of which play Baltimore should select.  Which variables to choose?  Which are more important?  Complicating matters, the frequency of any single combination of variables in all prior data sets of plays is low.  This is true regardless of which sets of plays are selected for analysis.  Many combinations of variables are not even relevant to the analysis.

What becomes clear is this.  Judgment plays a critical role in how analytics are deployed.  Different analysts will focus on different variables.  There cannot be a single rule of thumb regarding which variables are important, and to what degree.  Human judgment plays the critical role in setting the analytical rules for analysis.  Which plays are relevant?  How do we establish a system of relevancy?  Can we?

Relevancy, Sample Size and the Role of Human Judgment

Let’s look more deeply into Baltimore’s play selection decision.  As an initial consideration, should Baltimore look only at prior fourth down play results as a guide to enhance the likelihood that its fourth down play selection will succeed?  Which prior plays are relevant?  How can statistical analysis of prior results guide future decisions?

There are at least seven groups of prior play results to consider:

  1.  The Ravens’ fourth down attempt earlier in this game.
  2.  The Ravens’ other seven fourth down attempts earlier in the season.
  3.  The 12 other fourth down attempts league-wide this season.
  4.  The 18 league-wide attempts last year.
  5.  Baltimore’s two attempts last year.
  6.  Baltimore’s short-yardage third down attempts earlier in this game.
  7.  Baltimore’s short-yardage third down attempts earlier this season.

Which groups of attempts are relevant to the Ravens’ decision?

Mathematically, relevance is determined by weighting.  For example, the first category might be given a weight of 100%, the second group 80%, the fourth group 10%, etc.  Human judgment is the fulcrum behind all weighting decisions.  This means that the mathematical analysis is dependent on human judgment.  Obviously, whenever judgments are involved humans will likely disagree.  And there is no way to assure any consistency across human relevancy judgments.  All of this means that football analytics cannot be entirely rules-based.
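A minimal sketch of how much the weights drive the answer.  The attempt and success counts come from our example above (1 of 1 earlier in this game, 5 of 7 for the Ravens earlier this season, 3 of 12 league-wide this season, 5 of 18 league-wide last year).  Baltimore’s two attempts last year are already inside the 18, so we leave that group out to avoid double counting.  The 30% weight for the league-wide group, the three “analysts”, and the pooling formula itself are our assumptions, not an established method.  Note also that the league-wide counts cover only plays with at least four yards to gain, so the groups are not strictly comparable.

```python
# (attempts, successes) for four groups of prior fourth-down plays.
GROUPS = {
    "this_game": (1, 1),
    "ravens_this_season": (7, 5),
    "league_this_season": (12, 3),
    "league_last_season": (18, 5),
}

def weighted_rate(weights):
    """Pooled success rate after scaling each group's plays by its weight."""
    attempts = sum(w * GROUPS[g][0] for g, w in weights.items())
    successes = sum(w * GROUPS[g][1] for g, w in weights.items())
    return successes / attempts

analyst_a = {
    "this_game": 1.0,
    "ravens_this_season": 0.8,
    "league_this_season": 0.3,  # assumed; no figure is given in the text
    "league_last_season": 0.1,
}
analyst_b = {g: 1.0 for g in GROUPS}  # every prior play counts equally
analyst_c = {"this_game": 1.0}        # only today's single attempt counts

for name, weights in [("A", analyst_a), ("B", analyst_b), ("C", analyst_c)]:
    print(name, round(weighted_rate(weights), 3))
```

Same data, three sets of weights, three answers: roughly 53%, 37%, and 100%.  The arithmetic is trivial.  The judgment behind the weights is not.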

Human Judgment

So let’s provide a little human judgmental input to the example.  From a weighting perspective we might argue that the result of the fourth down play run by the Ravens earlier in this game should be weighted at or near 100%.  After all, we could claim, why is the result of a play from twelve weeks ago against a different team relevant to the current decision?  On the other hand, maybe we’d like to know those other results but adjust for the quality of the players on the field.  But isn’t it really fool’s gold to believe that we can do so with a mathematical precision that will improve our chances of calling the right play today?

Even if we elect just to focus on the earlier play in this game, can its result be incorporated into an analytical framework in real-time?  After all, the play clock is running and the coach may want to consider more than just the analytics.  For one, all of the prior Ravens fourth down plays utilized Mark Ingram at running back.  But at this critical moment in the Chiefs game Ingram isn’t at full strength.  Does/should that affect the Ravens play selection?  If so, can analytics of prior plays be a real guide?

There are other possibilities.  We might elect to consider any of the seven play groupings listed above.  But isn’t the sample size in each fourth-down grouping too small?  Is the sample in any grouping large enough to be statistically meaningful?  And isn’t that also the case even including the listed groupings as a whole?  This doesn’t even take into account any of the other variables attendant to the prior fourth-down plays.
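One way to put a number on the sample size worry is a confidence interval for a success rate.  The sketch below uses the Wilson score interval, a standard textbook formula.  The choice of that formula and of a 95% level (z = 1.96) is ours, and it treats every play in a grouping as an identical, independent trial, which is exactly the assumption this article questions.

```python
import math

def wilson_interval(successes: int, attempts: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a success probability."""
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = z * math.sqrt(
        p * (1 - p) / attempts + z**2 / (4 * attempts**2)
    ) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)

for label, s, n in [
    ("this game", 1, 1),
    ("Ravens this season", 5, 7),
    ("league this season", 3, 12),
]:
    low, high = wilson_interval(s, n)
    print(f"{label}: {s}/{n} -> {low:.2f} to {high:.2f}")
```

For the Ravens’ 5 successes in 7 attempts, the interval runs from roughly 36% to 92%.  That range is too wide to separate a good call from a bad one.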

So we have relevancy issues.  And we have sample size concerns.  And the more we try to solve our sample size issues, the larger our relevancy concerns become.  The role of human judgments grows.

Play Selection:  Many Disparate Factors

Because there are so many individual variables involved in each play, the output of those plays does not inform the current decision.

Now consider a second game and the dissimilarity of those variables from the Chiefs-Ravens game.  The Ravens trail Pittsburgh at home 23-17 with three minutes left in the game.  The ball is at the Steelers’ 35-yard line and it’s fourth down and four to go.  It’s a sunny day and winds are calm.  The Ravens have all three timeouts left.  Starter Lamar Jackson left the game early in the second quarter.  His backup, Robert Griffin III, engineered one scoring drive of 52 yards.  Griffin has thrown for 225 yards on 20 completions out of 28 throws, but one pass completion of 78 yards occurred when a Steelers’ cornerback fell down.  The Ravens ran for a total of 53 yards on 17 attempts after Jackson left the game.

The Ravens elect to go for it on fourth down.  They come out in a four wide receiver set with backup Justice Hill at running back.  They are in a shotgun formation.  The Steelers have four defensive linemen on the field, one middle linebacker, and six defensive backs.  The Steelers’ best pass rusher, T.J. Watt, is injured and unable to play.

The Ravens bring wide receiver Marquise Brown in motion in front of Griffin towards the wide side of the field.  Griffin flips him the ball.  Brown catches it.  The off-side cornerback, Joe Haden, is in perfect position to tackle Brown.  But Brown makes an incredible fake, Haden slips, and Brown gains 12 yards for a first down.

Is this play and result relevant to the Ravens play selection against the Chiefs at the same down and distance?  To what extent, if at all?  Down and distance are the same.  But not much else.

The skill sets of the defensive players on the field are different from those of the Chiefs.  The Steelers’ best pass rusher and edge defender did not play.  This increased the probability of successful blocks by the Ravens’ offensive tackles.  Lamar Jackson and Mark Ingram, the Ravens’ best two runners, could not play.  The skill sets of their replacements are different.  Jackson is both a superior passer and a superior runner to Griffin.  Hill is faster than Ingram but not nearly as good as an interior runner, blocker, or blitz protector.  And so on.

More on Variables and Relevancy

Variables are highly differentiated.  Across situations any single variable, or multiple variables, might not even attach.  Relevancy of each individual variable, and combination of variables, is suspect and subject to human judgment.  The frequency of prior variable combinations is oftentimes low.

What would the analytics framework look like going into an individual play decision?  Would it show increased probabilities for different play selections and different defenses across a grid?  Would that output be too voluminous such that it would be reduced to a simple choice among two possible plays?  Given real-time game limitations, how do the Ravens decide within 10 seconds or so which play to select?

The number of variables attendant to any given down and distance is very large.  Remember, these aren’t simple mathematical variables.  In many cases they are human variables.  Different individual talents and skills of 22 players in isolation and in combination with others.

Humans are biological beings whose individual performance under a multitude of conditions is, itself, highly variable.  The interplay of variables is so difficult to measure that past results from prior games between different teams are meaningless and non-informative.  Isn’t a lot of it just noise?

In baseball, analytics has the advantage of a much narrower universe of variables.  There are far fewer players and most interplay involves two-player interaction.  Not so in football.

Plays from prior Ravens games at least contain variables that the Ravens coaches can minimize or control.  They can do this by utilizing, for example, the same players, same formations, etc., as were used in prior games.  The Ravens can attempt to reduce the number of variables involved in a play as contrasted to prior plays.  To that extent, the results (outputs) of those plays may have more relevancy to the Ravens’ decision-making.  But how much more?

How precise can football analytics be in assigning probabilities for future outcomes, particularly since relevancy is human determined?  In our example, many of the applicable variables regarding the opposing defense differ from the prior Ravens’ fourth down attempts.

What to Make of Football Analytics?

All of this suggests that analytics are of doubtful utility.  And they are difficult to apply in real time during a game.  Are analytics really useful at all in non-binary choice decision-making?  The only relevant prior events are outputs involving the same Ravens personnel as applied in similar situations during recent games played by the Ravens.  And more so, the most recent, and therefore arguably the event most relevant to Baltimore’s decision in the Chiefs game, is the single fourth down play attempt in that game.  But even so, the data set (one prior play) is too small to be statistically meaningful.

Bill Belichick is right regarding football analytics:

“I just try to evaluate what I see. . . . [I]t’s just trying to evaluate where players are physically, mentally, emotionally in terms of playing football in their career and that’s really what I can go on.  Certainly there’s some other components but, in the end, those are the main things. . . . It’s an individual analysis based on the things that are pertinent to that game and that situation.  I don’t really care what happened in 1973 and what those teams did or didn’t do, I don’t really think that matters in this game.”  To summarize he puts “less than zero” into the role of analytics.

Do football analytics have much to add on most game decisions?  We think not.

Other Roles for Analytics in Football?

Are there better uses for football analytics than what we have considered so far?  Can analytics focused on team tendencies in different situations be helpful?  This is essentially our fourth down play analysis, applied more broadly.  Again, the number of variables is too large, and the relevancy of different variables (their relative weighting) is in any case human determined.

Individual Player Analysis and Tendencies

Can mathematical analysis of individual player performances measurably improve the probability of success for a player in a specific play?  For example, can an analysis of every hook route run by a wide receiver provide measurable insights to improve the likelihood that the next hook route run on a third down with eight yards to gain will result in a first down?

Consider hook routes run by Cooper Kupp of the Rams.  Say he ran 28 hook routes at an average depth of eight yards during the 2019 season.  Assume it’s third down and eight yards to go.

Improving Kupp’s Hooks?

What data would be relevant to enhancing the probability that a Kupp hook route will produce a first down?  Should we incorporate into the analysis:

  1. The type of defensive formation run on all 28 prior routes?  Was the defense in zone coverage?  Was it in man coverage?  Did they double-team Kupp?  Did they double another receiver?  Did the defense blitz?  If so, what type of blitz?  How much depth did linebackers get dropping into coverage?  And more.
  2. Defensive player personnel, including the identity of the cornerback assigned to cover Kupp in man coverage?
  3. The number of times Kupp ran the pattern against the cornerback he is now facing?
  4. The situation during each game in which Kupp ran the route, e.g., were the Rams ahead/behind and by how much, the amount of game time remaining, etc.?
  5. The side of the field on which Kupp ran the route?
  6. Field conditions?
  7. The average separation Kupp generated on each route, and the maximum separation, broken down with respect to the cornerback he is facing?
  8. And more.

The number of possible inputs into any mathematical analysis is substantial.  We’ve listed just a few.  Do 28 prior routes provide a large enough data pool?  And in any event the relative weighting of all the variables involves human judgment.  We wonder – wouldn’t it be just as valuable, or more so, to simply ask Kupp before the play: can you beat the coverage on a hook route?
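A back-of-the-envelope sketch of the data pool problem.  Assume, very generously, that each of the seven listed factors can take only two values (zone or man, ahead or behind, left or right, and so on).  That two-value simplification is ours and badly understates the real variety.  The function name is hypothetical.

```python
def routes_per_combination(total_routes: int, levels_per_factor):
    """Return (number of factor combinations, average prior routes in each)."""
    combinations = 1
    for levels in levels_per_factor:
        combinations *= levels
    return combinations, total_routes / combinations

# Assumption: seven factors, each collapsed to only two possible values.
combinations, per_combination = routes_per_combination(28, [2] * 7)
print(combinations, per_combination)
```

Even under that simplification there are 128 distinct situations and only 28 routes to spread across them, well under one prior route per situation.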

What Do You Think?

That’s our argument.  Football analytics is way over-hyped.  Do you agree?
