4th and Onwards!

A full analysis of 4th downs in today's NFL -- By Alex Ackler

It's a cold January evening somewhere in America. You're watching your favorite NFL team in a playoff game, tied at 20 with the clock slowly ticking. The other team has the ball, but their drive has stalled at midfield. "Iiiiit's 4th dooowwwwwn!", cries a familiar voice from above the stadium. They're going for it. Even from the TV, you know that the roar of the crowd is deafening. Their quarterback tries to tune out the noise, the pressure of the moment. Could it all come down to this one play...?

I've been a fan for as long as I can remember, and 4th down attempts might just be my favorite moments in all of sports. The risk is massive, but the rewards are so enticing that many teams just can't resist. Many legends have been born, many memories made, hearts broken and coaches fired, on the infamous 4th down conversion attempt.

Of course, this means coaches are usually okay with just punting it.

In this statistical analysis, I want to explore the decision-making of 2021 NFL teams on 4th down, and see what circumstances warrant a team's decision on that particular down: When it's best to go for it, kick a field goal, or just punt it away.

BACKGROUND INFO: Rules and Terms

For a full explanation of the rules of NFL football, here is an article that goes into detail. For this analysis, however, we only need to understand certain aspects of the game, so I'm going to explain those briefly here.

For our purposes, we only care about drives in football. A drive is when one team is on offense, trying to push the ball as far as they can down the field. The other team is on defense, trying to stop them. The offense can make it all the way to the other team's end zone for a touchdown (worth anywhere from 6-8 points), or come up short but still go far enough to kick a field goal (worth 3 points).

The offense has 4 chances, or downs, to make this happen. Obviously, the 100 yard football field is a long way to go with just 4 plays. So, to help the offense out, if they move the ball just 10 yards from where that set of downs started, they get a first down. Now, they have 4 more plays to make it the next 10 yards, for the next first down. This goes on until they score, or they give the ball up to the other team.

So now, it's hopefully clear why 4th down is so important. For the offense, it's their last chance to make something happen on the drive. For the defense, it's their chance to make a big stop and get some rest off the field.

BACKGROUND INFO: Choices on 4th down

The defense's goal is simple: stop the offense. On offense, all teams generally make one of 3 decisions on 4th down:

  1. FIELD GOAL: Try and kick the field goal for 3 points. If you make the field goal, you kick off to the other team (which usually starts their drive about 25 yards down the field). If you miss it, they get the ball right where you left it.
  2. GO FOR IT: Try and get to the first down marker or the end zone. If you make it, you get a first down (or a touchdown!) But if you don't, the other team gets the ball right where you left it, known as a turnover on downs.
  3. PUNT: Have a player drop the ball and kick it as far as they can down the field. Wherever the other team gets it is where they'll start their drive. NFL punters can usually punt the ball about 50-60 yards.

So as we can see, scoring points isn't the only thing that matters on 4th down. Positioning is also crucial.

For example, if you go for it at your own 1 yard line and fail, you've given your opponent the ball at the 1- an easy chance for them to score a touchdown. Even if you make the first down, you'll still have a long way to go before you get any points. So in this case, punting is your best option. You won't score points, but at least you'll push your opponent farther back, and give your defense a better chance to make a stop. 0 points is better than -6, after all.

This was an example of an obvious choice, but things can get very hazy in actual games. What if you're not deep in your own territory, but instead at midfield? What if it's 4th down at your opponent's 1 yard line? Do you really settle for the field goal when you're so close to the touchdown? How do the score, time on the clock, and general situation of the game impact your decision?

This is exactly the kind of question I hope to answer with my analysis.

Data Scraping

Let's start with getting our data. We will be using https://www.pro-football-reference.com/, a lovely website that has stats on everything imaginable in the NFL.

We also use numpy and pandas, two pretty standard libraries for data analysis in Python; requests and bs4 for obtaining and parsing the data from Pro Football Reference; and a few other libraries that could be handy.

Pro Football Reference is a well-structured website that has a separate URL for every game ever played and recorded. So in short, we will need to get the URL of each game in the 2021 season, get the play-by-play table of each of those games, and find the fourth down statistics we're interested in.

Here is an example of how we can process the play-by-play table of a game. This was a week 13 game between the Philadelphia Eagles and New York Jets.

Tips for how to scrape this website, which wrote its HTML slightly differently, were found here: https://stackoverflow.com/questions/55198103/scraping-difficult-table
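As a rough sketch of the kind of trick involved: one common reason a table is awkward to scrape is that the page ships it inside an HTML comment, so a normal parser never sees it as a table. Assuming that's the situation here, the snippet below pulls the comment text back out using only Python's standard library (a stand-in for bs4, not the code used in this project). The sample page and the table id "pbp" are made up for illustration.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the raw text of every HTML comment on a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


def find_commented_table(page_html, table_id):
    """Return the HTML hidden in the first comment that mentions table_id, else None."""
    collector = CommentCollector()
    collector.feed(page_html)
    needle = f'id="{table_id}"'
    for comment in collector.comments:
        if needle in comment:
            return comment.strip()
    return None


# Made-up miniature page; the real page layout and ids are assumptions.
sample_page = """
<html><body>
<div id="all_pbp">
<!--
<table id="pbp"><tr><th>Quarter</th><th>Down</th></tr>
<tr><td>1</td><td>4</td></tr></table>
-->
</div>
</body></html>
"""

table_html = find_commented_table(sample_page, "pbp")
```

The recovered string is ordinary HTML, so it can then be handed to bs4 or pandas.read_html for the actual parsing.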

It also should be noted that I am only using Pro Football Reference's database for personal use in an education-related project, which does not violate their terms of use.

It's here we notice a big problem. The table doesn't tell us what team is on offense! We could keep track of turnovers and kickoffs, but that would be long and painful. Instead, we can use another table present at the URL for each game: the drives table. (Recall a drive is whenever a team's offense possesses the ball.) This table tells us when each team started each of their drives. There's a drives table for both teams at every URL.

It will be particularly useful to us if we can combine the two tables into one, so we can see exactly when each team had the ball.

With this, we have all the data we need. The next step is to put it in a more usable form.

Data Tidying

If we can sort the above drive table based on "Quarter" and "Time", we could have the full time table we desire. But to make things a little easier, we'll create a new field, "Total Time" that tells us how much time in total is left on the game clock. 15:00 in the 1st quarter is actually 60 minutes total, for example.
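Here is a minimal sketch of what such a "Total Time" helper could look like. The function name, the "MM:SS" clock format, and the choice of a timedelta (rather than whatever time type the original notebook used) are my assumptions. Overtime is treated as quarter 5 and simply keeps its own clock.

```python
from datetime import timedelta

QUARTER_LENGTH = timedelta(minutes=15)


def total_time_left(quarter, clock):
    """Time left in regulation, given a quarter number and an 'MM:SS' clock string.

    Assumption: overtime is quarter 5 and returns just the time on its own clock.
    """
    minutes, seconds = (int(part) for part in clock.split(":"))
    on_clock = timedelta(minutes=minutes, seconds=seconds)
    quarters_after_this_one = max(4 - quarter, 0)
    return quarters_after_this_one * QUARTER_LENGTH + on_clock


start_of_game = total_time_left(1, "15:00")    # 60 minutes left
two_minute_drill = total_time_left(4, "2:00")  # 2 minutes left
```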

Let's use this function and do some sorting to get the drives in chronological order. In the end, we have to sort on both Quarter and Total Time anyway, since overtime is a challenging special case to handle. Total Time is still useful, however, since it is a "datetime" object we can easily sort.
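To show why overtime forces a two-key sort, here is a small sketch with made-up drives (the teams, times, and dictionary layout are illustrative only, and this particular game is invented). An overtime drive with 10:00 on the clock has more "time left" than a late 4th quarter drive, so sorting on time alone puts it in the wrong place.

```python
from datetime import timedelta

# Made-up drives; "total_time" is the time left when the drive started.
drives = [
    {"team": "NYJ", "quarter": 5, "total_time": timedelta(minutes=10)},
    {"team": "PHI", "quarter": 1, "total_time": timedelta(minutes=60)},
    {"team": "PHI", "quarter": 4, "total_time": timedelta(minutes=3)},
    {"team": "NYJ", "quarter": 1, "total_time": timedelta(minutes=52, seconds=30)},
]

# Earlier quarters first; within a quarter, more time left means earlier.
ordered = sorted(
    drives,
    key=lambda d: (d["quarter"], -d["total_time"].total_seconds()),
)

# Sorting on time alone misplaces the overtime drive ahead of the 4th quarter one.
by_time_only = sorted(drives, key=lambda d: d["total_time"], reverse=True)
```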

I would use the above method again for the 4th downs table, but we don't need to sort what's already sorted, so the code is shorter here.

Now that we have Total Time on both tables, we're ready to combine them. Note that "Total Time" represents the time the drive started in the drives table, and the time the play happened in the 4th downs table.

Process: For each 4th down, look at the drives table. Find drives whose "total time" field (representing when the drive started) comes BEFORE this play. We know that the last drive starting before this play must be the drive the play happened in. Carry over the relevant fields from the drive table to make one big table!
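The matching process above can be sketched in plain Python like this. The field names ("seconds_left", "drive_number", "offense") and the sample rows are hypothetical; the idea is just to find, for each play, the last drive that started at or before it.

```python
from bisect import bisect_right


def game_order_key(quarter, seconds_left):
    """Sort key that increases as the game goes on."""
    return (quarter, -seconds_left)


def attach_drives(plays, drives):
    """Copy the offense and drive number onto each play from the drive it belongs to."""
    drives = sorted(drives, key=lambda d: game_order_key(d["quarter"], d["seconds_left"]))
    starts = [game_order_key(d["quarter"], d["seconds_left"]) for d in drives]
    merged = []
    for play in plays:
        key = game_order_key(play["quarter"], play["seconds_left"])
        index = bisect_right(starts, key) - 1  # last drive starting at or before the play
        if index < 0:
            continue  # no drive recorded before this play; skip it
        drive = drives[index]
        merged.append({**play, "offense": drive["team"], "drive_number": drive["drive_number"]})
    return merged


# Made-up rows for illustration.
drives = [
    {"team": "PHI", "drive_number": 1, "quarter": 1, "seconds_left": 3600},
    {"team": "NYJ", "drive_number": 1, "quarter": 1, "seconds_left": 3300},
    {"team": "PHI", "drive_number": 2, "quarter": 2, "seconds_left": 2500},
]
plays = [
    {"quarter": 1, "seconds_left": 3400, "detail": "play A"},
    {"quarter": 1, "seconds_left": 3000, "detail": "play B"},
    {"quarter": 2, "seconds_left": 2000, "detail": "play C"},
]
merged = attach_drives(plays, drives)
```

With pandas tables, a similar "last row at or before" match could be done with pandas.merge_asof, but the loop above keeps the logic visible.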

Just two more improvements to make now. First, the "location" field can be simplified to how many yards are left to go to the end zone. "NYJ 20", for example, means 80 yards to go for the Jets and only 20 for the Eagles.
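A minimal sketch of that conversion, assuming the location is always a team code followed by a yard line (the function name is mine, and the handling of a bare number such as "50" at midfield is an assumption):

```python
def yards_to_end_zone(location, offense):
    """Yards the offense still needs to reach the opponent's end zone."""
    parts = location.split()
    if len(parts) == 1:
        # Assumption: a bare number only appears at midfield, where both sides agree.
        return int(parts[0])
    side, yard_line = parts[0], int(parts[1])
    # On your own side of the field you still have the rest of the field to cover.
    return 100 - yard_line if side == offense else yard_line


jets_view = yards_to_end_zone("NYJ 20", "NYJ")    # 80
eagles_view = yards_to_end_zone("NYJ 20", "PHI")  # 20
```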

Finally, we can simplify the two score columns into one column that shows how much the team with the ball is up or down by. -3, for example, indicates a team is down by 3.
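As a quick sketch (the argument names are hypothetical, and I'm assuming the table stores one home score and one away score per play):

```python
def score_margin(offense, home_team, home_score, away_score):
    """Points the offense leads by; negative means the offense is trailing."""
    if offense == home_team:
        return home_score - away_score
    return away_score - home_score


# Made-up score: the home team leads 20-17.
home_margin = score_margin("NYJ", "NYJ", 20, 17)  # 3
away_margin = score_margin("PHI", "NYJ", 20, 17)  # -3
```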

Great! Now we have to do that...for every game of the 2021 NFL season.

Every NFL game is stored under its own URL owned by pro football reference, involving an identifier with the date the game was played and the code of the home team. We are going to need to know how each team is encoded in the Pro Football Reference website. It was a bit painful to type this out, but I hope it will be worth it.

Now let's find those URLs. Each URL always has the same header and footer, with the aforementioned identifier in the middle. We need to know the date of every NFL game in 2021.

To do this, we access another webpage on Pro Football Reference, this one a full schedule of the 2021 NFL season. The code below gets us this table in the format of a Pandas dataframe. It's a little untidy, but we only care about the date of each game and who the home team was.

At last, we can get the URLs. The format of each URL is:

header + DATE + HOME TEAM + footer

We handle the header and footer in the next cell. First, we have to extract the date (in the format yyyymmdd0) and the home team's code.
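Here is a small sketch of the URL assembly. The header and footer are read off the Seattle box score URL that is linked later in this write-up; the function name and the lowercasing of the team code are my assumptions.

```python
from datetime import date

HEADER = "https://www.pro-football-reference.com/boxscores/"
FOOTER = ".htm"


def boxscore_url(game_date, home_code):
    """Build header + DATE + HOME TEAM + footer, with the date as yyyymmdd0."""
    date_part = game_date.strftime("%Y%m%d") + "0"
    return f"{HEADER}{date_part}{home_code.lower()}{FOOTER}"


seattle_week_2 = boxscore_url(date(2021, 9, 19), "SEA")
```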

The final step in data extraction is to visit every URL we've identified, get all the 4th downs played in each game (and the data associated with them), and aggregate them into a gigantic table.

NOTE: This step could take several minutes to run! If it is taking too long, please reduce the sample size at the "CHANGE ME" point in the cell above.
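For a sense of the shape of that loop, here is a hedged sketch. The fetch function is passed in as a parameter, so the example runs on a fake one without touching the network. The names, the "limit" argument (playing the role of the "CHANGE ME" slice), and the optional pause between requests are all assumptions, not the notebook's actual code.

```python
import time


def collect_fourth_downs(urls, fetch_game, limit=None, delay_seconds=0.0):
    """Visit up to `limit` URLs and pool the 4th down rows each one returns."""
    rows = []
    for url in urls[:limit]:  # limit=None means "all of them"
        rows.extend(fetch_game(url))
        if delay_seconds:
            time.sleep(delay_seconds)  # be polite to the site between requests
    return rows


def fake_fetch(url):
    """Stand-in for the real scraper: pretends each game had one 4th down."""
    return [{"game": url, "down": 4}]


sample_urls = [f"game-{n}" for n in range(1, 6)]
all_rows = collect_fourth_downs(sample_urls, fake_fetch, limit=3)
```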

We've finally done it! We can get the data on every 4th down ever played in 2021!

...If we have a supercomputer. It's here I must confess that the computations in this project have been quite heavy, and the data scraping section has been long enough, so instead of processing all 200+ games I will only go through 80 (the first 5 weeks of the season). If you think your computer can handle it, change the array slice indicated at "CHANGE ME" a few cells up.

Exploratory Data Analysis

Let's start with some simple summary statistics. Below, we graph how many 4th downs were played in each quarter.

Frequency bar graphs help us tremendously here. I'll be linking the stack overflow threads I found these on since I personally find formal tutorials to be useless for this kind of thing. This thread is on frequency bar graphs: https://stackoverflow.com/questions/26476668/frequency-plot-in-python-pandas-dataframe/26477354
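As a minimal sketch of the counting behind such a graph, assuming a pandas table with a "quarter" column (the column name and the ten sample rows are made up):

```python
import pandas as pd

# Made-up 4th downs; quarter 5 stands for overtime.
fourth_downs = pd.DataFrame({"quarter": [1, 2, 2, 3, 4, 4, 4, 2, 1, 5]})

# Count plays per quarter, then put the quarters back in game order.
per_quarter = fourth_downs["quarter"].value_counts().sort_index()
```

Calling per_quarter.plot(kind="bar") would then draw the bars, provided matplotlib is installed.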

So, 4th downs occur about equally often in each of the four quarters, with a slight uptick in the 2nd and 4th quarters.

This makes sense when we understand the rules of football. The game goes into halftime after the second quarter, and finishes completely after the fourth. In both cases, teams will try to score any last minute points they can and prevent the other from doing the same by "burning off clock." In other words, they will call simple, short plays that take up time, and willingly go to 4th down to kick a field goal or score a touchdown.

Lastly, we can also see just how rare overtime is (represented as the 5th quarter.)

Let's try another frequency bar graph, but for the number of plays in the drive. How many plays do offenses typically run before reaching 4th down?

We'll use seaborn to keep the data in sorted order, and also for its colors. Seaborn is a higher-level plotting library built on top of matplotlib, which we used above.

We can see that 4th downs are, by far, most common on the 4th play of the drive (when just 3 total plays have been run.) This means the offense gets the ball, fails to get 10 yards after 3 plays, and usually punts the ball right back to the other team.

This sad sequence of events is so common in football that it has a nickname: the "three and out".

As for the rest of the graph, we see that 4th downs occur with less frequency as time goes on. This is because drives typically don't last long in the NFL.

For our last component of EDA, let's find out which teams have come to the most 4th downs, and see what we can make of it! All teams played exactly 5 games in the data table we have, so this will be a fair comparison.

The Denver Broncos (DEN) played the most 4th downs, and the Kansas City Chiefs (KAN) played by far the fewest. One possible explanation for this is that, in terms of offense, the two teams are on opposite ends of the spectrum. NFL fans know that the Chiefs are one of the highest-scoring offenses in the league, rarely being held to 4th down, while the Broncos are ultra-conservative and like winning games with their defense.

Advanced Analysis

Our ultimate goal is to get a predictor for what a coach will do on a given 4th down, and for that we need to know what decisions were made on all the 4th downs we have so far.

We have a "result" column, but it tells us the result of the drive, not the play. For the play, we have to look through the "details" column, and this is where a rudimentary form of natural language processing comes in.

Here we simply look through the details column for key words that would tell us what the play is: "punt" would only appear in a punt play, and likewise for a field goal. Conversion attempts can be listed as many different things.

Note that even if the play goes wrong (punt is blocked, field goal misses, etc.), we only care about what the coach intended to happen.

Penalties are the hardest to classify because they can work with or against the offense, and they can even be accepted or declined by either team. Typically, when a penalty is accepted, the play doesn't count and the ball is moved up or down the field some number of yards. So for now, we will give all accepted penalties a special category, and mostly ignore them in our analysis.
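Here is a rough sketch of that keyword approach. The exact wording of the "details" text, the rule that "declined" marks a penalty that doesn't count, and the category names are all assumptions; the sample strings are made up rather than copied from the site.

```python
def classify_play(details):
    """Guess the coach's intended 4th down call from a play description."""
    text = details.lower()
    # Assumption: a penalty counts as accepted unless the text says "declined".
    if "penalty" in text and "declined" not in text:
        return "penalty"
    if "punt" in text:
        return "punt"
    if "field goal" in text:
        return "field goal"
    # Anything else (run, pass, sack, scramble...) is treated as going for it.
    return "go for it"


# Made-up descriptions for illustration.
samples = [
    "A. Punter punts 45 yards, fair catch",
    "K. Kicker 38 yard field goal no good",
    "Q. Back pass complete for 6 yards",
    "Penalty on offense: False Start, 5 yards (no play)",
]
labels = [classify_play(text) for text in samples]
```

Note that the missed field goal is still labeled "field goal", since the intent is what matters here.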

A simple question that we can easily answer is how often each decision is made on 4th down:

Punting is by far the most common, but why? What makes coaches so likely to give up? For that, it's time to get to the crux of the project! We will analyze all the factors that influence a coach's 4th down decision making.

Advanced Analysis, Part 2

We start by considering your position on the field during the 4th down play. Credit to each of these websites for tools on graphing: https://stackoverflow.com/questions/59204445/how-to-do-i-groupby-count-and-then-plot-a-bar-chart-in-pandas https://stackoverflow.com/questions/332289/how-do-you-change-the-size-of-figures-drawn-with-matplotlib
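As a sketch of how the numbers behind a stacked bar chart like this could be prepared, assuming columns named "yards_to_go" and "decision" and 20-yard zones (all of which are my choices, with made-up rows):

```python
import pandas as pd

# Made-up 4th downs for illustration.
plays = pd.DataFrame({
    "yards_to_go": [85, 90, 75, 55, 45, 30, 25, 8, 3, 62],
    "decision": ["punt", "punt", "punt", "punt", "go for it",
                 "field goal", "field goal", "field goal", "go for it", "punt"],
})

# Group field position into 20-yard zones: (0, 20], (20, 40], ... (80, 100].
zones = pd.cut(plays["yards_to_go"], bins=[0, 20, 40, 60, 80, 100])

# Share of each decision within each zone; every row sums to 1.
shares = pd.crosstab(zones, plays["decision"], normalize="index")
```

From there, shares.plot(kind="bar", stacked=True) would draw one stacked bar per zone, provided matplotlib is installed.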

By now you may have noticed that some errors exist in Pro Football Reference. One example is the 'punt' from the 1 yard line. In this link, see the play-by-play at 4 minutes in the 4th to see that a mislabelled game clock was the error that caused this: https://www.pro-football-reference.com/boxscores/202109190sea.htm

These errors are too complicated to root out in the limited scope of this project, so we will just have to live with them and remind ourselves that no one would ever punt 1 yard from a touchdown.

--

This chart is fascinating to people who love the strategy involved with football. Here are some big takeaways:

For the next section, let's consider the score. I would expect more punts for teams with the lead, field goals for close games, and conversion attempts for teams down big. Does this hold up?

It's here that we can see some more trends emerge:

The last metric we want to measure is time on the clock. This alone will not make or break a coach's decision; rather, we hypothesize it works together with the other two. Consider if you were down by 14 points. In the 1st quarter, you might still punt, since there's plenty of time to get your team together. Late in the 4th, however, you'd have no choice but to go for it.

Note that we will consider quarters here and not full game time for the sake of simplicity.

We'll use the same technique we've been using to visualize decision-making.

As expected, teams punt less and less as the game gets closer to the finish. Usually, it's because they don't have a choice.

Predictions

With all the data we've gathered, can we make a 4th down decision given the inputs of: (1) position on the field, (2) score, and (3) time on the clock? For this task we'll use Python's machine learning tools: specifically, a decision tree.

A decision tree looks at multiple rows of data (in our case, 4th downs) and tries to figure out what factors lead to a certain decision. To do this, it repeatedly splits the data on whichever attribute and cutoff best separates the decisions from one another.
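To give a feel for how a tree can pick a split, here is a small sketch using Gini impurity, a common measure of how mixed a group of labels is (and the default criterion in sklearn's DecisionTreeClassifier). It only searches one attribute, with made-up data, so treat it as an illustration of the idea rather than what sklearn does internally.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label is the same, higher when mixed."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Try each midpoint between distinct values; return (threshold, weighted impurity)."""
    best = (None, gini(labels))  # baseline: no split at all
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up plays: yards to the end zone and the decision taken.
yards = [5, 12, 30, 35, 60, 75, 88]
decision = ["field goal", "field goal", "field goal", "field goal", "punt", "punt", "punt"]
threshold, impurity = best_threshold(yards, decision)
```

In this toy data the cleanest cut falls between 35 and 60 yards, which separates the field goals from the punts perfectly.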

We're going to look only at the three attributes listed above, so we'll have to narrow down our table to just those attributes. If we wanted to, we could classify based on all of them, but these 3 seem the most important to me.

Now for the decision tree. For more info on Python Decision Trees, I found this tutorial helpful: https://www.datacamp.com/community/tutorials/decision-tree-classification-python
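Here is a minimal sketch of fitting a tree with sklearn on the three attributes. The column names, the twelve made-up rows, the 25% test split, and max_depth=3 are all assumptions for illustration, not the settings used in this project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up 4th downs for illustration.
plays = pd.DataFrame({
    "yards_to_go": [85, 78, 70, 64, 55, 48, 35, 28, 15, 6, 40, 90],
    "score_diff": [0, 3, -7, 7, -3, -14, 0, 3, -3, -10, -21, 14],
    "quarter": [1, 2, 1, 3, 2, 4, 3, 2, 4, 4, 4, 1],
    "decision": ["punt", "punt", "punt", "punt", "punt", "go for it",
                 "field goal", "field goal", "field goal", "go for it",
                 "go for it", "punt"],
})

features = plays[["yards_to_go", "score_diff", "quarter"]]
target = plays["decision"]

# Hold some plays back so the tree is judged on downs it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
predicted = tree.predict(X_test)
```

Fixing random_state makes the split and the tree repeatable, which helps when comparing runs; leaving it out gives a different test set each time.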

Now let's use our decision tree to classify our test data.

Note that resetting the index doesn't do any harm, since the order of rows is still the same as before.

For the last part of this project we should evaluate our decision tree and see how it did. Below is code to convert the table into a more readable form, so we can compare the actual decisions made on these downs to what our decision tree suggested- and maybe even our own suggestions.
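A bare-bones sketch of that comparison, using plain lists of made-up decisions (the function name and the row layout are hypothetical):

```python
def compare_decisions(actual, predicted):
    """Pair up real and predicted calls and report how often they agree."""
    rows = [
        {"actual": a, "predicted": p, "match": a == p}
        for a, p in zip(actual, predicted)
    ]
    rate = sum(row["match"] for row in rows) / len(rows)
    return rows, rate


# Made-up decisions for illustration.
actual = ["punt", "field goal", "go for it", "punt"]
predicted = ["punt", "field goal", "punt", "punt"]
rows, rate = compare_decisions(actual, predicted)
```

Here the tree would agree with the coach on 3 of the 4 plays, for a rate of 0.75.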

How did we do? The Decision Tree can change depending on the test data randomly chosen, but some things seem to be common:

Conclusion

In this project, we took a tour of the entire data science pipeline, and learned a bit about professional football too.

We started by scraping the website https://www.pro-football-reference.com/ for data on 2021 NFL games. It was by far the longest and most challenging part of the project, since we first had to understand how the site stores its data. We recognized that each game is stored under its own URL on Pro Football Reference, and we determined what those URLs must be by scraping a separate Pro Football Reference page that contained the 2021 NFL schedule.

From there, we tidied our data tables, which was also a difficult task. We wanted to know which team was on offense during each 4th down, so we had to match our table of 4th down plays to the table of drives for each game. From that we deduced what team was on offense, and also learned some other useful information, such as the length of the drive.

We combined both of these steps in scraping all the URLs (or at least, as many of them as my CPU could handle), and aggregating them into one gigantic database of 4th downs across the league this year.

Next, we performed exploratory data analysis. We wanted to answer some burning questions, such as: What teams face 4th down most and least often? When in drives do 4th downs usually happen? And, how often is each decision made in general?

From here we did our advanced analysis, and tried to observe some trends in 4th-down decision making. We created and tested hypotheses on how positioning, score, and game clock affected our decisions. We created stacked bar graphs to illustrate how frequently coaches punt, kick field goals, and go for it with each of these considerations individually taken into account.

Lastly, with the help of Python's sklearn we crafted a decision tree that could help us make the tough calls. We picked a decision tree because it is a natural, easy-to-read way to predict a decision from the attributes of position, score and time. Our decision tree performed fairly well, matching the coach's choice quite often.

I hope this tutorial/project hybrid has been insightful to you in learning about data and the NFL!

Go Birds!