Most players' NHL careers begin with the annual NHL draft. Here, teams take
turns selecting the players they feel will contribute the most to their team going forward. Because
the team that drafts a player holds exclusive negotiating rights with that player for the next few
years, it’s crucial to a team’s success that they make their selections wisely.
Making optimal picks improves a team’s long-term chance of success by ensuring a consistent flow of
new talent into their lineup. This potential for adding value is what makes the NHL draft such an
interesting puzzle to try to solve.
Over the years there have been many approaches to making smarter picks at the NHL draft. There was PCS, from Lawrence and Weissbock, pGPS by Jeremy Davis, and a draft-by-numbers approach by Michael Schuckers, each looking to find a good projection of a prospect's future value. Last year, I presented my own model, DEV (draft expected value), for this problem; DEV looked at comparable players by junior league primary point production to determine a prospect’s expected value in terms of predicted NHL points per game. In this article, I offer a new model carrying the same name, DEV, but taking a different approach to determining a junior player’s expected value.
The goal of DEV is to arrive at a single-number rating for the expected value of a prospect at the time they are drafted. To do this, we need to know a given prospect's chance of becoming an NHL player and how they would likely perform if they were to make the NHL. The product of these two metrics then gives our final expected value at the time of the draft.
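Once we have these two metrics, we combine them to obtain an expected value, called DEV. Writing NHLc for a prospect's probability of becoming an NHL player and xPS for their expected point shares per 82 games if they do, the equation representing a player's DEV is $\mathrm{DEV} = \mathrm{NHLc} \times \mathrm{xPS}$. This is their value in terms of the point shares per season they are expected to add to the team drafting them.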
The data used in developing the NHLc and xPS models is described as follows.
The data includes all OHL, QMJHL, and WHL players from the 1997-98 season through the 2007-08 season, scraped from the respective league sites. Player height, weight, age, and position are scraped from the league site, or from eliteprospects.com where they are unavailable on the league site.
Each player’s NHL Central Scouting ranking is scraped from the archived North American Skaters rankings on thedraftanalyst.com.
Players' NHL games played counts are scraped from eliteprospects.com.
Players' point share data is scraped from hockey-reference.com. When calculating a player's career average point shares per 82 games played, seasons with negative point shares are set to 0 and only seasons in which the player is 26 or younger are considered. Negative seasons are set to 0 on the belief that drafting a player who has a negative impact in the NHL is not worse than, but equivalent to, drafting a player who does not make the NHL full time, so for our purposes their point share rate should be 0 rather than negative. Only seasons prior to age 27 are included for two reasons: a player's point share rate is not diluted by seasons past their athletic prime, and players under this age cutoff are, for the most part, restricted free agents under the current CBA. The team that drafts a player can therefore reasonably expect to receive these seasons from them, while keeping the player beyond this age is less certain.
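The point share adjustments described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the model: the `Season` record, the function name, and the choice to pool games across seasons (rather than averaging per-season rates) are all assumptions, and the sample seasons are made up.

```python
from typing import Iterable, NamedTuple


class Season(NamedTuple):
    age: int             # player's age for the season (how age is assigned is an assumption)
    games_played: int
    point_shares: float


def point_shares_per_82(seasons: Iterable[Season], max_age: int = 26) -> float:
    """Point shares per 82 GP, with negative seasons floored at 0 and
    only seasons at or below max_age counted."""
    kept = [s for s in seasons if s.age <= max_age]
    games = sum(s.games_played for s in kept)
    if games == 0:
        return 0.0  # no eligible NHL games, so no value added
    shares = sum(max(s.point_shares, 0.0) for s in kept)
    return 82 * shares / games


# Made-up seasons, for illustration only.
career = [
    Season(age=20, games_played=41, point_shares=-0.5),  # counted as 0
    Season(age=21, games_played=82, point_shares=4.0),
    Season(age=22, games_played=82, point_shares=6.0),
    Season(age=27, games_played=82, point_shares=9.0),   # excluded: older than 26
]
rate = point_shares_per_82(career)
```

Here the age-27 season is ignored and the negative season contributes games but no point shares, so the rate is 10 point shares over 205 games, scaled to 82.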
The model for NHLc is trained using the data described above. This model is a binary
classifier where the target variable is whether a player reaches 200 NHL GP, with the output
being the probability that the given player will meet this threshold. For forwards, the features
that the model considers are the player's age, height, weight, the league they
played in, whether they played center, left wing, or right wing, their NHL Central Scouting
ranking, and their production metrics as found on prospect-stats.com. For defense, the features
are the same as for forwards, except that the model does not consider whether the
player played left or right defense, as that data isn't available for all players in the dataset.
All of this information is available to teams at the time of the NHL draft, meaning that a
player's predicted NHLc can be known prior to them being drafted.
To test the validity of our NHLc model, we use log loss as an evaluation metric and compare it to the log loss of the picks that NHL teams made.
Aside: Log loss is used to measure the accuracy of predicted probabilities. In case you aren’t familiar with it as a metric, here is a brief explanation.
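The formula for the log loss of a set of predictions is $-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$, where $N$ is the number of instances for which predictions are being made, $y_i$ is the observed outcome for instance $i$ (1 if the event occurred, 0 otherwise), and $p_i$ is the predicted probability of the event for instance $i$.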
For log loss, the best possible score, meaning a perfect classification, is 0. There is no fixed worst score: the loss grows without bound as a confident prediction turns out to be wrong. To demonstrate the usefulness of log loss when evaluating sets of predictions, consider the following example.
If you predict there’s a 75% chance that it rains Saturday and a 50% chance that it rains Sunday, the log loss for your set of predictions if it rains on both days would be equal to 0.213 (using base-10 logarithms). If it rained neither day, the log loss of your predictions would be equal to 0.452.
As we can see from this example, the set of predictions with the lower log loss is the set whose predictions were closer to the observed outcome.
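For readers who want to check the rain example themselves, here is a small Python sketch of log loss. The function name and the `base` parameter are my own choices for illustration; the default of base 10 is used because it reproduces the 0.213 and 0.452 figures in the example, while many libraries use the natural logarithm instead, which gives larger numbers for the same forecasts.

```python
import math
from typing import Sequence


def log_loss(outcomes: Sequence[int], probs: Sequence[float], base: float = 10.0) -> float:
    """Mean log loss of predicted probabilities against observed 0/1 outcomes."""
    if len(outcomes) != len(probs) or not outcomes:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(outcomes, probs):
        # Probability the forecast assigned to what actually happened.
        p_observed = p if y == 1 else 1.0 - p
        total += math.log(p_observed, base)
    return -total / len(outcomes)


forecast = [0.75, 0.50]                     # chance of rain Saturday, Sunday
rain_both = log_loss([1, 1], forecast)      # about 0.213
rain_neither = log_loss([0, 0], forecast)   # about 0.452
```

Note that a forecast of exactly 0 or 1 that turns out wrong would require the logarithm of zero, which raises an error here; this is the unbounded worst case of the metric.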
When teams make a pick at the NHL draft, they don’t announce the percent likelihood they feel that player has of making the NHL. Because of this, we must come up with an assumed value for their picks to test the efficiency of the selections NHL teams make. We fill in this value using the historical success rate of NHL picks by pick position. Essentially, if a player is selected with pick 26, we assume that the NHL team picking them believes this player has about the same chance of making the NHL as the average player picked 26th overall.
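As a rough sketch of how these assumed probabilities could be built, the snippet below computes the share of past selections at each pick position that went on to make the NHL. The function name and the sample history are hypothetical, and the raw per-pick rate shown here is only one option; with few players per pick position, some smoothing across neighbouring picks might be preferable.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_rate_by_pick(history: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Share of past selections at each pick position that made the NHL."""
    made = defaultdict(int)
    total = defaultdict(int)
    for pick, made_nhl in history:
        total[pick] += 1
        made[pick] += int(made_nhl)
    return {pick: made[pick] / total[pick] for pick in total}


# Made-up history: (overall pick, made the NHL?)
history = [(26, True), (26, False), (26, True), (26, True), (27, False), (27, True)]
assumed = success_rate_by_pick(history)
p_pick_26 = assumed[26]  # probability credited to a team picking 26th
```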
With this set of NHL-determined probabilities of success, we can calculate the log loss of NHL draft selections. Our model is trained on CHL seasons from 1997-98 through 2007-08, and the reported log loss for our model is the mean test log loss of a 5-fold cross validation. The log loss of NHL teams is our calculated log loss based on pick position for this same group of players.
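To make "mean test log loss of a 5-fold cross validation" concrete, here is a minimal sketch of the procedure in plain Python. It is not the NHLc model: the stand-in "model" simply predicts the training folds' base rate for every held-out player, the labels are made up, the helper names are hypothetical, and natural logarithms are used.

```python
import math
import random
from statistics import mean
from typing import List, Sequence


def mean_log_loss(outcomes: Sequence[int], probs: Sequence[float]) -> float:
    """Mean log loss (natural log) of probabilities against 0/1 outcomes."""
    return -mean(math.log(p if y == 1 else 1.0 - p) for y, p in zip(outcomes, probs))


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices and deal them into k roughly equal test folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_log_loss(labels: Sequence[int], k: int = 5) -> float:
    """Mean test log loss over k folds for a stand-in model."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        # Stand-in "model": predict the training base rate for everyone,
        # clipped away from 0 and 1 so the logarithm is always defined.
        base_rate = min(max(mean(train), 0.01), 0.99)
        test = [labels[i] for i in test_idx]
        scores.append(mean_log_loss(test, [base_rate] * len(test)))
    return mean(scores)


labels = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 5  # made-up outcomes: 1 = reached 200 GP
cv_score = cross_validated_log_loss(labels)
```

Each player is scored exactly once, by a model that never saw them during fitting, which is what makes the averaged test score a fair estimate of out-of-sample performance.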
These results indicate that the probability estimates produced by NHLc were closer to the observed truth than the assumed probability of NHL draft positions. This doesn’t mean that NHLc is definitively better at determining a prospect's chance of success than NHL teams, as the values tested against are simply assumed probabilities. However, this does imply that the output of NHLc can be trusted at least as much as NHL draft order can be when projecting a player's chance of making the NHL.
The model for xPS is trained using the data set previously described, but limited
to players who did make the NHL full time. This is done because we don't want this model to
consider the possibility that the player doesn't make the NHL, as that is already accounted
for in the final DEV model through the inclusion of NHLc.
For forwards, the features that the model considers for projection are the player's age, the league they played in, their NHL Central Scouting ranking, and their junior league production metrics as found on prospect-stats.com. These features differ from those used for NHLc, since height, weight, and forward position were not found to be particularly relevant when projecting point share contributions despite being useful for NHLc. For defense, the features used by xPS are age, league, and junior production metrics. Unlike with forwards, the inclusion of NHL Central Scouting ranking was not found to improve point share predictions for defenders. As with NHLc, all of these features are available to teams at the time of the NHL draft.
The evaluation metric that the xPS model looks to minimize is the root mean squared error of the prediction set.
Aside: If you aren't familiar with RMSE feel free to read this page from Vernier discussing RMSE.
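In short, RMSE is the square root of the average squared difference between predicted and observed values, so it is expressed in the same units as the thing being predicted. A minimal Python sketch follows; the function name and the sample point share rates are made up for illustration.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between observed and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty inputs")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up point share rates per 82 games, for illustration only.
observed = [4.0, 1.0, 6.5]
projected = [3.0, 3.0, 6.5]
error = rmse(observed, projected)
```

Because the errors are squared before averaging, the single miss of 2 point shares dominates the result more than it would in a plain average of absolute errors.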
The results of a 5-fold cross validation produced the following test output for xPS.
|Test RMSE Mean||Test RMSE Standard Deviation|
This means that a typical xPS projection of a player's NHL point share rate is off by about 2 point shares per 82 games, for forwards and defenders alike.
With our two models, NHLc and xPS, complete, we now have the components necessary to
arrive at the final expected value of a prospect. By taking the product of NHLc and xPS for a
prospect, we obtain that prospect’s DEV. Thanks to this approach, our final DEV ranking accounts
for a prospect’s chance of making the NHL as well as the significance of the impact they are likely
to have if they do make the NHL.
As an example of the potential application of DEV, let’s consider Nick Ritchie and Robby Fabbri. In the 2014 NHL draft, Ritchie was selected 10th overall by the Anaheim Ducks and Fabbri was later selected 21st by the St. Louis Blues. Much of Ritchie's appeal at the time of the draft was that he was a “can’t miss” prospect: by selecting him, you were very likely to receive a player who would be in your NHL lineup in the future. Fabbri, on the other hand, was more of a “high risk, high reward” prospect, and this is reflected in their NHLc. Ritchie was given a 71.73% chance to become an NHL player per NHLc, while Fabbri had a less impressive 62.92% chance. Looking simply at how likely they were to make the NHL, Ritchie would appear to be the better option. However, looking at their xPS, Ritchie was expected to contribute 3.79 point shares per 82 games compared to Fabbri’s 4.61. This means that despite Ritchie being the “safer” pick, since he was more likely to play in the NHL, he was not expected to make as significant an impact as Fabbri if both were to make the NHL. When considering both their NHLc and xPS using DEV, Fabbri ended up with a higher expected value of 2.90 versus Ritchie’s 2.72. This is only one specific example, but it helps to show that properly accounting for both a prospect’s chance of success and their expected impact could lead to better decision making at the NHL draft.
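The arithmetic behind this comparison is just the product of the two model outputs. As a quick check, here it is in Python using the NHLc and xPS values quoted above; the function name is my own and the range check is an added safeguard, not part of the model.

```python
def dev(nhlc: float, xps: float) -> float:
    """Draft expected value: chance of making the NHL times expected
    point shares per 82 games if the player does make it."""
    if not 0.0 <= nhlc <= 1.0:
        raise ValueError("nhlc must be a probability between 0 and 1")
    return nhlc * xps


# NHLc and xPS values quoted in the article for the 2014 draft.
ritchie = dev(0.7173, 3.79)
fabbri = dev(0.6292, 4.61)
print(f"Ritchie {ritchie:.2f}, Fabbri {fabbri:.2f}")  # prints "Ritchie 2.72, Fabbri 2.90"
```

Ritchie's advantage of roughly nine percentage points in NHLc is more than offset by Fabbri's extra 0.82 point shares in xPS.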
DEV’s final output is in terms of how many point shares per 82 games a prospect is expected to contribute to the team that drafts them. This helps in that players, whether forwards or defenders, can be directly compared. However, there is a slight issue, since these units are not intuitively understood. To make the output more interpretable, I’ve matched each player's DEV with the pick position that is historically closest in terms of value; this is called the player’s “Fair Value Pick Position”.
One important thing to note here is that it would not necessarily be a wise decision to use a 15th overall pick on a player whose fair value pick position is the 15th overall pick. Doing this would lead to you receiving only the expected value of the 15th pick position. You haven’t lost any value by making this pick, but you also haven’t added any; you’ve simply maintained value. To gain the maximum value possible from your selections, you would look to pick a player whose fair value pick position is earlier than the draft position you are currently using. For example, if you were to use the 15th overall pick to select a player whose fair value pick position is 10th overall, you can expect to receive 0.45 more point shares than the historical average for a 15th overall pick.
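A hedged sketch of how fair value pick position and surplus value could be computed is shown below. The function names are hypothetical and the `pick_values` table is a placeholder, not the historical values used for DEV; only the 0.45 gap between the 10th and 15th picks is taken from the example above. Breaking ties toward the earlier pick is also an assumption.

```python
from typing import Dict


def fair_value_pick(dev: float, pick_values: Dict[int, float]) -> int:
    """Pick position whose historical value is closest to the given DEV."""
    # Ties are broken in favour of the earlier pick (an assumption).
    return min(pick_values, key=lambda pick: (abs(pick_values[pick] - dev), pick))


def surplus_value(dev: float, pick_used: int, pick_values: Dict[int, float]) -> float:
    """Expected point shares gained over the historical average at pick_used."""
    return dev - pick_values[pick_used]


# Placeholder historical values (point shares per 82 games) by pick position.
pick_values = {5: 3.40, 10: 2.60, 15: 2.15, 20: 1.90}
prospect_dev = 2.60
fair_pick = fair_value_pick(prospect_dev, pick_values)
gain = surplus_value(prospect_dev, 15, pick_values)
```

With these placeholder values, a prospect whose DEV matches the 10th pick and who is taken 15th returns a surplus of about 0.45 point shares, while taking the same prospect 10th overall returns a surplus of zero.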
DEV is by no means a perfect ranking of a prospect's value. There will be players
throughout the years that DEV values highly who turn out to be busts, and there will be players that
DEV ranks low who become stars. However, there is reason to believe that by using DEV or similar
draft projection models a team can on average, over a substantial period of time, receive more
value through the draft than a typical NHL team picking at those same positions would. DEV is not a
finished product, and there are many more improvements I’m looking to make in the coming years, yet
it still has value in its current iteration.
My hope is simply that I’ve shown there is potential in DEV and other draft projection models. To draft well, a team needs a solid ranking of the hundreds of prospects that are available to be drafted, and research has shown time and time again that humans alone struggle with ordering sets of this size optimally. Statistical models can aid traditional scouting in developing these rankings and arriving at a more optimal approach than has previously been used.
The full rankings for NHLc, xPS, and DEV for drafted players from the 2015-16 season are available here. Following the conclusion of the 2017 NHL draft, the full rankings of NHLc, xPS, and DEV for the 2016-17 season will also be made available. If you have any questions or concerns, please feel free to reach out to me on Twitter @3Hayden2 or through email at 3Hayden2@gmail.com.