---
title: "Football Recruiting Analysis"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: materia
      primary: "#F54242"
      secondary: "#2196f3"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{=html}
<style>
.chart-title { /* chart_title */
  font-size: 20px;
}
body { /* Normal */
  font-size: 20px;
}
</style>
<head>
  <base target="_blank">
</head>
```
```{css color tabs}
.nav-tabs-custom .nav-tabs > li > a {
  color: black;
}

.nav-tabs-custom .nav-tabs > li.active > a {
  color: #2196f3;
}

/* To set color on hover */
.nav-tabs-custom .nav-tabs > li.active > a:hover {
  color: grey;
}

/* Let the sidebar scroll when its content overflows */
.sidebar {
  overflow: auto;
}
```
```{r setup, include=FALSE}
library(flexdashboard)
```
```{r package_data}
library(tidyverse)
library(knitr)
library(dplyr)
library(readr)
library(ggplot2)
library(DT)
library(plotly)
library(viridis)
# Kaggle archive (see References), read from a local download
high_school_rankings <- read_csv("C:/Users/John Hannan/Downloads/archive (3).zip")

# Drop identifier and duplicate columns that the dashboard does not use
high_school_rankings <- subset(high_school_rankings,
                               select = -c(ident, college, id, team, season,
                                           career_avgPPA, career_totalPPA, position...24))

# Give the remaining columns readable names
high_school_rankings <- high_school_rankings %>%
  rename("recruit year" = "recruit_year",
         "recruit rating" = "comp_recruit_rating",
         "recruited position" = "position...7",
         "draft grade" = "draft_grade",
         "draft year" = "draft_year",
         "round" = "d_round",
         "drafted position" = "position_y",
         "draft success" = "draft_success",
         "countable plays" = "countablePlays",
         "average PPA" = "averagePPA.all",
         "total PPA" = "totalPPA.all",
         "pass PPA" = "totalPPA.pass",
         "rush PPA" = "totalPPA.rush")

# Treat round and star rating as categories for the bar charts
high_school_rankings$round <- as.factor(high_school_rankings$round)
high_school_rankings$stars <- as.factor(high_school_rankings$stars)

# Expand recruiting position codes into full names; both quarterback codes
# (DUAL, PRO) and both running back codes (APB, RB) are merged
high_school_rankings <- high_school_rankings %>%
  mutate(`recruited position` = recode(`recruited position`,
                                       APB  = "Running Back",
                                       ATH  = "Athlete",
                                       CB   = "Cornerback",
                                       DUAL = "Quarterback",
                                       FB   = "Fullback",
                                       ILB  = "Inside Linebacker",
                                       OLB  = "Outside Linebacker",
                                       PRO  = "Quarterback",
                                       RB   = "Running Back",
                                       S    = "Safety",
                                       SDE  = "Strong Defensive End",
                                       TE   = "Tight End",
                                       WDE  = "Weak Defensive End",
                                       WR   = "Wide Receiver"))
```
# Introduction
## Column {.tabset data-width="650"}
### Basic Info
<font size = 5> **An Analysis Of How Well High School Football Ratings Predict Future Success** </font>
The purpose of this study is to analyze ratings for high school football recruits alongside their success during their college careers. I will also study how early these players were selected by teams in the National Football League (NFL) draft. NFL teams start scouting players before they are eligible to be drafted, and the earlier they start watching them, the more useful information they can gather. This study examines whether high school rankings give a good indication of a player's future potential, which would allow NFL teams to identify top prospects earlier.
I will be using one dataset in this presentation. It includes 439 players from various positions who were recruited between 2009 and 2018.
This presentation looks to answer the following questions:
- Does a higher rating coming out of high school lead to more future success and higher draft status?
- Do certain positions develop more talent than others?
- Which schools tend to produce high draft picks?
I will answer these questions by finding trends in the data using visual exploratory analysis.
### Glimpse of Player Analysis
```{r glimpse}
kable(high_school_rankings[1:10,])
```
## Column {data-height="650"}
### Explanation of Variables
Below are descriptions of the variables used in this study:

- rank: player's national ranking in their recruiting class coming out of high school
- name: player's name
- recruit year: year the player started college football
- recruit rating: composite recruiting rating (0.0-1.0)
- stars: recruiting star rating (1-5)
- recruited position: position the player was recruited as
- school: college the player attended
- draft grade: assessment of NFL potential (0-100)
- draft year: year the player was drafted
- overall: overall pick number at which the player was selected
- round: draft round in which the player was selected (1-7)
- pick: pick number within that round
- drafted position: position the player was drafted as
- draft success: whether the player was drafted in round 1, 2, or 3 (0 = no, 1 = yes)
- countable plays: number of tracked plays the player was involved in
- average PPA: Predicted Points Added per play, i.e., how many points per play the player added for their team
- total PPA: total points the player added for their team over their college career
- pass PPA: points the player added on passing plays over their college career
- rush PPA: points the player added on rushing plays over their college career
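Two of these variables can be derived from other columns. The sketch below, in R using only base functions, recomputes `draft success` from `round` using the definition above. It also checks two relationships that the first rows of the dataset appear to follow but that are not documented here, so treat them as assumptions: average PPA equals total PPA divided by countable plays, and total PPA equals pass PPA plus rush PPA when a missing component is counted as zero. The helper names `derive_draft_success` and `check_ppa` are hypothetical.

```r
# Hypothetical helper: draft success is 1 when the player went in round 1, 2, or 3.
# `round` is a factor in the dashboard, so convert through character first.
derive_draft_success <- function(round) {
  as.integer(as.integer(as.character(round)) %in% 1:3)
}

# Hypothetical helper: checks the assumed PPA identities row by row.
# Dataset values are rounded to three decimals, hence the tolerance.
check_ppa <- function(total, plays, avg, pass, rush, tol = 0.005) {
  avg_ok <- abs(total / plays - avg) < tol
  # count a missing pass or rush component (NA) as zero
  parts <- ifelse(is.na(pass), 0, pass) + ifelse(is.na(rush), 0, rush)
  sum_ok <- abs(parts - total) < tol
  avg_ok & sum_ok
}

# Rounds of Andre Debose, Aaron Murray, and Carlos Hyde
derive_draft_success(factor(c(7, 5, 2)))
# [1] 0 0 1

# PPA values of Andre Debose, Aaron Murray, and Arthur Lynch
check_ppa(total = c(-1.163, 118.289, 32.036),
          plays = c(7, 352, 28),
          avg   = c(-0.166, 0.336, 1.144),
          pass  = c(-3.399, 107.423, 32.036),
          rush  = c(2.236, 10.866, NA))
# [1] TRUE TRUE TRUE
```

Running `check_ppa()` over every row of `high_school_rankings` would show whether the assumed identities hold across the whole dataset or only in these rows.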
### Abstract
From my analysis, I found that better high school ratings are positively associated with players becoming early draft picks. I also found that positional value matters: players at more valuable positions are drafted in the early rounds more often than players at less valuable positions, even when they are not necessarily more talented. Finally, the schools that bring in the most top recruits also tend to produce the most early-round draft picks.
# Player Overview
## Column {.tabset}
### High School Recruits Table
```{r recruits table}
# With rownames = FALSE, columnDefs targets are 0-based indices of the data columns
DT::datatable(high_school_rankings[, 1:6], rownames = FALSE,
              options = list(columnDefs = list(list(className = 'dt-center', targets = c(0, 2:5)))))
```
### NFL Draft Prospects Table
```{r prospects table}
DT::datatable(high_school_rankings[, c(2, 7:14)], rownames = FALSE,
              options = list(columnDefs = list(list(className = 'dt-center', targets = c(1, 8)))))
```
### Player Stats Table
```{r stats table}
DT::datatable(high_school_rankings[, c(2, 15:19)], rownames = FALSE,
              options = list(columnDefs = list(list(className = 'dt-center', targets = c(1:5)))))
```
Stars
===
Column {.tabset data-width=850 .no-padding}
-----
### Stars Breakdown
```{r stars}
# Shared hover-label and font styling for the plotly charts
font <- list(
  family = "Arial",
  size = 14,
  color = "white"
)
label <- list(
  bgcolor = "#232F34",
  bordercolor = "transparent",
  font = font
)

# geom_bar() counts players per star rating; it takes no binwidth argument
p1 <- ggplot(high_school_rankings, aes(x = stars)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "Number of Players per Star Rating", x = "Star Rating", y = "Number of Players")
ggplotly(p1) %>%
  style(hoverlabel = label) %>%
  layout(font = font)
```
### 2 Stars
```{r 2 stars}
stars2 <- high_school_rankings %>%
  filter(stars == 2)
p2 <- ggplot(stars2, aes(x = round)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "2 Stars Taken per Round", x = "Round", y = "Number of Players")
ggplotly(p2)
```
### 3 Stars
```{r 3 stars}
stars3 <- high_school_rankings %>%
  filter(stars == 3)
p3 <- ggplot(stars3, aes(x = round)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "3 Stars Taken per Round", x = "Round", y = "Number of Players")
ggplotly(p3)
```
### 4 Stars
```{r 4 stars}
stars4 <- high_school_rankings %>%
  filter(stars == 4)
p4 <- ggplot(stars4, aes(x = round)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "4 Stars Taken per Round", x = "Round", y = "Number of Players")
ggplotly(p4)
```
### 5 Stars
```{r 5 stars}
stars5 <- high_school_rankings %>%
  filter(stars == 5)
p5 <- ggplot(stars5, aes(x = round)) +
  geom_bar(fill = "skyblue", color = "black") +
  labs(title = "5 Stars Taken per Round", x = "Round", y = "Number of Players")
ggplotly(p5)
```
### Comparison
```{r conditional}
p_c <- ggplot(high_school_rankings, aes(x = stars, fill = round)) +
  geom_bar(width = 0.75, position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, by = .2),
                     labels = scales::percent) +
  labs(y = "Round Percentage") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1),
        text = element_text(size = 20))
ggplotly(p_c) %>%
  style(hoverlabel = label) %>%
  layout(font = font)
```
Column
-----------------------------------------------------------------------
### Analysis
I started by creating a bar chart showing how many players received each star rating. 3-star and 4-star recruits were the most common, with 189 and 167 players respectively, so 356 of the 439 players were rated 3 or 4 stars coming out of high school.
Then, I made a separate bar chart for each star rating showing the round in which its players were selected in the NFL draft. Most 2-star recruits were taken in the back half of the draft, with a mode in the 6th round. 3-star recruits were more evenly distributed, with less skew and two modes, in the 4th and 6th rounds. 4-star recruits were also evenly distributed, with the most selections coming in the 3rd round. Finally, the overwhelming majority of 5-star recruits were selected early in the draft, with a mode in the 2nd round.
Comparing all of the star ratings and the rounds in which their players were selected, players with higher ratings coming out of high school, while not always better, tend to be drafted earlier than those with lower ratings.
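The round-by-round pattern described above can also be checked numerically instead of read off the charts. The sketch below uses a hypothetical base R helper, `summarise_by_stars`, which reports for each star rating the number of players, the share drafted in rounds 1-3, and the modal round, joining tied modes with a slash (for example `4/6`). It assumes the `stars` and `round` factor columns created in the setup chunk; the toy data frame is invented for illustration.

```r
# Hypothetical helper: per star rating, count players, compute the share taken
# in rounds 1-3, and find the modal draft round (ties joined with "/").
summarise_by_stars <- function(df) {
  rounds <- as.integer(as.character(df$round))  # round is a factor in the dashboard
  stars <- as.character(df$stars)
  early_share <- tapply(rounds <= 3, stars, mean)
  modal_round <- tapply(rounds, stars, function(r) {
    counts <- table(r)
    paste(names(counts)[counts == max(counts)], collapse = "/")
  })
  data.frame(stars = names(early_share),
             players = as.vector(table(stars)[names(early_share)]),
             early_share = round(as.vector(early_share), 2),
             modal_round = as.vector(modal_round[names(early_share)]),
             stringsAsFactors = FALSE)
}

# Made-up example: three 2-star and three 5-star players
toy <- data.frame(stars = factor(c(2, 2, 2, 5, 5, 5)),
                  round = factor(c(6, 6, 4, 1, 2, 2)))
summarise_by_stars(toy)
```

Calling `summarise_by_stars(high_school_rankings)` on the dashboard data would give one row per star rating, turning the comparison chart into a small table.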
Positions
===
Column {.tabset data-width=650}
-----
### Stars Breakdown Per Drafted Position
```{r DP}
p_dp <- ggplot(high_school_rankings, aes(x = `drafted position`, fill = stars)) +
  geom_bar(width = 0.75, position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, by = .2),
                     labels = scales::percent) +
  labs(y = "Star Rating Percent") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1),
        text = element_text(size = 20))
ggplotly(p_dp) %>%
  style(hoverlabel = label) %>%
  layout(font = font)
```
### Draft Success Per Position
```{r DS}
# Store 0/1 as a factor so ggplot uses a discrete fill scale
high_school_rankings$`draft success` <- as.factor(high_school_rankings$`draft success`)
p_ds <- ggplot(high_school_rankings, aes(x = `drafted position`, fill = `draft success`)) +
  geom_bar(width = 0.75, position = "fill") +
  scale_y_continuous(breaks = seq(0, 1, by = .2),
                     labels = scales::percent) +
  labs(y = "Draft Success Percent") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1),
        text = element_text(size = 20))
ggplotly(p_ds) %>%
  style(hoverlabel = label) %>%
  layout(font = font)
```
Column
-----------------------------------------------------------------------
### Analysis
I began by generating a bar chart showing the share of each star rating at each drafted position. Then, I made another bar chart showing the percentage of players at each position who were taken in the first three rounds. Together, these show whether positions with a larger share of highly rated players also have more early draft picks, which would indicate that some positions make an easier transition from high school to the NFL draft than others.
When looking at draft success, quarterbacks have the highest percentage of early-round picks, at over 55%. That could have been predicted from their star ratings: apart from cornerbacks (the smallest sample), quarterbacks have the highest percentage of 5-star recruits, meaning they were ranked very highly coming out of high school.
However, running backs, which had the second-highest 5-star percentage apart from cornerbacks, had the lowest draft success rate, with just under 38% taken in the early rounds. This seems to defy the trend of highly rated players being drafted early. It suggests that while there may be quality players at that position, NFL teams weigh a player's position more heavily than raw talent, since a quarterback is much more valuable to a team than a running back.
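Both position charts use `position = "fill"`, so each bar shows shares rather than counts, and the sample size behind each bar is hidden. A small table can make comparisons like the quarterback and running back figures easier to read alongside the number of players. The sketch below uses a hypothetical base R helper, `position_summary`, which computes, per drafted position, the number of players, the share with `draft success` equal to 1, and the share of 5-star recruits. It assumes the dashboard's column names, and the toy rows are invented.

```r
# Hypothetical helper: per drafted position, the share of early-round picks
# (draft success == 1) and the share of 5-star recruits, sorted by early share.
position_summary <- function(df) {
  pos <- df[["drafted position"]]
  # `draft success` is converted to a factor in the DS chunk, so compare as text
  early <- as.character(df[["draft success"]]) == "1"
  five_star <- as.character(df$stars) == "5"
  out <- data.frame(
    position = sort(unique(pos)),
    players = as.vector(table(pos)),
    early_share = as.vector(tapply(early, pos, mean)),
    five_star_share = as.vector(tapply(five_star, pos, mean)),
    stringsAsFactors = FALSE
  )
  out <- out[order(-out$early_share), ]
  row.names(out) <- NULL
  out
}

# Made-up example with two quarterbacks and three running backs
toy <- data.frame(
  `drafted position` = c("Quarterback", "Quarterback",
                         "Running Back", "Running Back", "Running Back"),
  `draft success` = factor(c(1, 0, 0, 0, 1)),
  stars = factor(c(5, 4, 5, 3, 4)),
  check.names = FALSE, stringsAsFactors = FALSE
)
position_summary(toy)
```

Keeping the `players` column next to the shares makes it easier to spot small groups, such as the cornerbacks noted above, whose percentages rest on only a few players.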
Colleges
===
Column {.tabset data-width=650}
-----
### Colleges with Top Recruits
```{r college}
top_colleges <- high_school_rankings %>%
  filter(stars %in% c(4, 5)) %>%
  group_by(school) %>%
  # percent is the share of ALL players in the dataset, not of 4- and 5-star recruits
  summarise(count = n(), percent = round(n() / nrow(high_school_rankings) * 100, 2)) %>%
  arrange(desc(count))
datatable(top_colleges)
```
### Colleges with Top Draft Picks
```{r nfl}
top_picks <- high_school_rankings %>%
  filter(`draft success` == 1) %>%
  group_by(school) %>%
  # percent is the share of ALL players in the dataset, not of early-round picks
  summarise(count = n(), percent = round(n() / nrow(high_school_rankings) * 100, 2)) %>%
  arrange(desc(count))
datatable(top_picks)
```
### Comparison
```{r}
kable(cbind.data.frame(top_colleges$school[1:10], top_picks$school[1:10]),
      col.names = c("Colleges with Most Top Recruits", "Colleges to Produce Most Top Picks"))
```
Column
-----------------------------------------------------------------------
### Analysis
In this section, I focused on which colleges signed the most top recruits and which colleges produced the most early-round draft picks. I defined a top recruit as a 4- or 5-star player and an early-round pick as a 1st-, 2nd-, or 3rd-round selection.
After filtering the data to include only 4- and 5-star recruits, I found that Alabama signed the most top recruits, with 18 (about 4% of all 439 players in the dataset), followed by Clemson and Ohio State. After filtering the data to include only 1st-, 2nd-, and 3rd-round picks, Alabama again led with 17 selections (just under 4% of the dataset), followed by Ohio State and Penn State.
This suggests that Alabama has recently been not only the best at recruiting the top high school players in the nation, but also the best at developing that talent into top NFL prospects. Other schools, such as Ohio State, Clemson, and Notre Dame, also do this well, but Alabama is clearly the frontrunner.
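Two details of the college tables are worth making explicit. The `percent` column in both college chunks divides by `nrow(high_school_rankings)`, so it is a share of all players rather than of the filtered group; and comparing two top-10 lists is only a rough view of the recruiting and draft relationship. The sketch below uses a hypothetical base R helper, `college_summary`, to compute per-school counts with both denominators, then applies Spearman's rank correlation through `cor(..., method = "spearman")` as one possible way to quantify the association across all schools. The dashboard itself does not compute this correlation, and the toy schools are invented.

```r
# Hypothetical helper: per school, count top recruits (4 or 5 stars) and
# early-round picks (draft success == 1), with percentages on two denominators.
college_summary <- function(df) {
  schools <- sort(unique(df$school))
  top <- as.character(df$stars) %in% c("4", "5")
  early <- as.character(df[["draft success"]]) == "1"
  top_recruits <- as.vector(table(factor(df$school[top], levels = schools)))
  early_picks <- as.vector(table(factor(df$school[early], levels = schools)))
  data.frame(
    school = schools,
    top_recruits = top_recruits,
    # denominator used by the dashboard's college chunks: every player in the data
    pct_of_all_players = round(100 * top_recruits / nrow(df), 2),
    # alternative denominator: only the 4- and 5-star recruits
    pct_of_top_recruits = round(100 * top_recruits / sum(top_recruits), 2),
    early_picks = early_picks,
    pct_of_early_picks = round(100 * early_picks / sum(early_picks), 2),
    stringsAsFactors = FALSE
  )
}

# Made-up example with three schools
toy <- data.frame(
  school = c("A", "A", "A", "B", "B", "C"),
  stars = factor(c(5, 4, 4, 4, 3, 2)),
  `draft success` = factor(c(1, 1, 0, 1, 0, 0)),
  check.names = FALSE, stringsAsFactors = FALSE
)
s <- college_summary(toy)
s
# Spearman rank correlation between recruiting counts and early-pick counts
cor(s$top_recruits, s$early_picks, method = "spearman")
# [1] 1 for this toy data
```

A rank correlation is used here because school counts are small integers with many ties, where ranks are arguably more robust than raw values; with only a few schools, any such coefficient should be read cautiously.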
Conclusion
===
Column {data-width=650}
---
### Results
From my analysis, I was able to determine that high school rankings, while not perfect, do give a good indication of which players may become high draft picks. The higher a player's rating, the more likely they are to end up being an early draft pick. This can allow NFL teams to start scouting these players multiple years before they are eligible to be drafted.
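The statement that a higher rating makes an early-round pick more likely could also be put into numbers with a logistic regression of `draft success` on `recruit rating`. The dashboard does not fit such a model; the sketch below, built around a hypothetical helper `fit_rating_model` that calls base R `glm()` with a binomial family, is one possible approach, and the ten toy players are invented. Because the composite rating runs on a 0-1 scale, a one-unit change spans the whole scale, so predicted probabilities at a few ratings are easier to interpret than the raw slope.

```r
# Hypothetical helper: logistic regression of early-round selection on rating
fit_rating_model <- function(df) {
  d <- data.frame(
    success = as.integer(as.character(df[["draft success"]]) == "1"),
    rating = df[["recruit rating"]]
  )
  glm(success ~ rating, family = binomial, data = d)
}

# Made-up example: ten players whose outcomes overlap across ratings
toy <- data.frame(
  `recruit rating` = c(0.85, 0.86, 0.87, 0.88, 0.89, 0.90, 0.91, 0.92, 0.93, 0.94),
  `draft success` = factor(c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1)),
  check.names = FALSE
)
fit <- fit_rating_model(toy)
coef(fit)
# predicted probability of a round 1-3 selection at two ratings
predict(fit, newdata = data.frame(rating = c(0.85, 0.95)), type = "response")
```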
I also found that while the top-rated players can be some of the most talented, a high rating does not guarantee a high draft position. NFL teams weigh positional value, so players at a more valuable position such as quarterback are drafted in the early rounds more often than players at a less valuable position such as running back, even when they are not as talented.
Finally, I was able to see which schools are the best at signing top recruits and which are the best at producing top draft picks. Alabama was clearly at the top of both categories, and the schools that sign the most highly rated recruits also tend to have the most early draft picks. This can show NFL teams which schools to scout more often, since they have the most talent, and it can show recruits which programs appear to be the best at developing talent.
### Limitations
One limitation of this dataset is that it mainly focuses on offensive skill players. It places little emphasis on linemen or defensive players, so it is unclear whether these findings about how high school ratings translate to on-field success also apply to those positions.
Another limitation is that some players had far fewer countable plays than others, which lowers their total PPA and makes their average PPA less reliable. PPA is also not always an accurate read of a player's impact, because players can contribute without having the ball in their hands. That may be why some players with lower PPA were drafted much earlier than players with higher PPA, and it makes PPA and countable plays difficult to use effectively in this study.
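One possible way to soften the small-sample problem is to require a minimum number of countable plays before comparing PPA, and to pool PPA over plays for a group instead of averaging per-player averages. The sketch below is a hypothetical approach rather than part of this study: the helper names `filter_by_plays` and `pooled_ppa` and the 50-play cutoff are arbitrary assumptions. The three rows reuse values from the dataset, where Arthur Lynch's 1.144 average comes from only 28 plays.

```r
# Hypothetical helper: keep players with at least `min_plays` tracked plays.
# The default of 50 is an arbitrary illustration, not a value from the study.
filter_by_plays <- function(df, min_plays = 50) {
  plays <- df[["countable plays"]]
  df[!is.na(plays) & plays >= min_plays, , drop = FALSE]
}

# Hypothetical helper: pooled PPA per play for a group of players, so players
# with many plays count for more than players with only a handful.
# Rows missing either total PPA or countable plays are skipped.
pooled_ppa <- function(df) {
  ok <- !is.na(df[["total PPA"]]) & !is.na(df[["countable plays"]])
  sum(df[["total PPA"]][ok]) / sum(df[["countable plays"]][ok])
}

players <- data.frame(
  name = c("Andre Debose", "Aaron Murray", "Arthur Lynch"),
  `countable plays` = c(7, 352, 28),
  `total PPA` = c(-1.163, 118.289, 32.036),
  `average PPA` = c(-0.166, 0.336, 1.144),
  check.names = FALSE, stringsAsFactors = FALSE
)
filter_by_plays(players)$name
# [1] "Aaron Murray"
pooled_ppa(players)
```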
### References
https://www.kaggle.com/datasets/jwblackston/recruit-draft-eval
About the Author
===
Column {data-width=650}
---
### About Me
My name is John Hannan, and I am an undergraduate student at the University of Dayton, currently in my junior year and on track to graduate in May 2025.
I am pursuing a major in Sport Management as well as minors in both Business Administration and Data Analytics.
After graduation, I am interested in working in the football industry as a scout for an NFL franchise, finding players who excel at the college level and will help teams win at the professional level.
I am eager to keep finding new trends that can make player scouting more efficient and accurate.
Please connect with me on [LinkedIn](https://www.linkedin.com/in/john-hannan-a1897b222/).
Column {.tabset data-width=600}
---
### Picture
<img src="John Hannan.jpg" width="50%" height="auto" style="display: block; margin: auto;">