This article assumes that you have read the previous article "Introducing DEV"; if you haven't read that article, please do so before continuing with this one.
DEV is simple in concept but can look complicated at first glance, so to explain it fully while keeping everything understandable, I'll walk you through the creation of DEV step by step. First up, gathering the data.
Gathering the Data:
In order to make a model that optimally finds and evaluates comparables, you need more data than can be found on any currently running site. Games played, goals, assists, points, +/-, and PIM leave a lot to be desired when you're trying to get the most accurate representation of a prospect's skill set. Because I wanted more in-depth stats than any site had posted, the only solution was to scrape the game sheets for past seasons in the OHL, QMJHL, and WHL. This is the root of DEV's main issues: junior league sites don't have a great backlog of game sheets, so the data set is unfortunately limited. For the OHL I was able to scrape all seasons from 1997-98 through 2014-15. The QMJHL site, however, only goes back to 2010-11, and beyond that site the only practically usable QMJHL game sheets go back to 2006-07, so the QMJHL data set is smaller than those of the other CHL leagues. Finally, for the WHL I was able to scrape from 1996-97 through 2014-15. In short, the data set includes all players from the following leagues and seasons:
OHL: 1997-98 to Present
QMJHL: 2006-07 to Present
WHL: 1996-97 to Present
Now that we have our data set, we face the question of which stat in it can best predict future NHL success.
Predicting NHL Success Using Junior Stats:
Before we can find comparable players, we need to find the stat that best predicts NHL success; once that's found, we will know which variable is best for deciding who a player's cohorts are. The process is nothing more than trying everything and selecting what works best. In the end, the best predictor turned out to be Era, League, and Age Adjusted Primary Points per Game. Now let's break down how that's calculated.
Primary points (goals plus primary assists) are chosen over total points because they are a better predictor of NHL success. Secondary assists were found not to be repeatable, so including them harms the predictive ability of the model. Next we looked into adjusting for age. Corey Pronman recently wrote about the importance of accounting for a prospect's age, and after modifying the CHL age adjustment developed by Rhys Jessop, we arrived at an age adjustment for CHL primary points that predicted NHL success better than primary point production that wasn't adjusted for the player's age. The era and league adjustments happen in the same step: we determine marginal goals per season, as described in this article, and then normalize players' production so that our model doesn't over- or undervalue players depending on whether they played in a high-scoring era or league. In the end we have a primary points per game rate that has been adjusted for the player's era, league, and age. This final stat is the best available single-variable predictor of a prospect's future NHL P/GP rate.
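To make the calculation concrete, here is a minimal Python sketch of the adjusted rate. It rests on two assumptions. First, the era/league normalization is a simple ratio between a reference league-season's marginal goals per game and the player's own league-season. Second, the age multipliers are placeholders, not the DEV values: the actual modified Jessop age curve and the marginal-goals figures are not reproduced here. `Season`, `adjusted_primary_ppg`, and `HYPOTHETICAL_AGE_FACTORS` are hypothetical names.

```python
from dataclasses import dataclass

# PLACEHOLDER age multipliers (age in the season -> factor). These are NOT the
# DEV or Jessop values; they only illustrate that younger players get a boost.
HYPOTHETICAL_AGE_FACTORS = {16: 1.30, 17: 1.15, 18: 1.00, 19: 0.88, 20: 0.80}


@dataclass
class Season:
    goals: int
    primary_assists: int
    games_played: int
    age: int
    marginal_goals_per_game: float  # scoring environment of this league-season


def adjusted_primary_ppg(season: Season, reference_marginal_goals_per_game: float) -> float:
    """Era-, league- and age-adjusted primary points per game (sketch)."""
    # Primary points: goals plus primary assists; secondary assists are dropped.
    primary_points = season.goals + season.primary_assists
    raw_rate = primary_points / season.games_played

    # Era/league step (assumed form): rescale to a common scoring environment so
    # high-scoring league-seasons are deflated and low-scoring ones inflated.
    scale = reference_marginal_goals_per_game / season.marginal_goals_per_game
    era_league_rate = raw_rate * scale

    # Age step: multiply by the (placeholder) factor for the player's age.
    return era_league_rate * HYPOTHETICAL_AGE_FACTORS[season.age]


example = Season(goals=30, primary_assists=25, games_played=60, age=17,
                 marginal_goals_per_game=2.4)
print(round(adjusted_primary_ppg(example, reference_marginal_goals_per_game=2.0), 3))
```

In this example a 17-year-old in a high-scoring league-season is deflated by the era/league step and then boosted by the age step. The ordering of the two multiplications does not matter because both are scalar factors.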
Finding Comparable Players:
To determine whether player Y is comparable to player X, player Y must pass 6 tests.
1. If player X is in their first season of draft eligibility, player Y must be as well.
2. If player X plays forward then player Y must also play forward, and vice versa.
3. Player X and player Y's height can differ by no more than one inch.
4. Player Y must have played at least 30 games in the season being compared.
5. Player X and player Y's era, league, and age adjusted primary points per game rates must be within 0.11975 (half of one standard deviation) of each other.
6. Player Y must have been drafted in or before the 2012 draft. However, if they were drafted after 2012 but have reached 200 NHL GP, they pass this test regardless.
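The six tests translate directly into a filter. The sketch below assumes each player-season is a record with hypothetical field names. `Player`, `is_comparable`, and `find_comparables` are illustrative and are not DEV's actual code. Test 1 is implemented literally as worded: only a first-year-eligible prospect restricts its comparables. Treating undrafted players like post-2012 picks under test 6 is an assumption, because the text doesn't cover them.

```python
from dataclasses import dataclass
from typing import List, Optional

HALF_SD = 0.11975  # half a standard deviation of the adjusted rate (test 5)


@dataclass
class Player:
    name: str
    first_year_eligible: bool   # in first season of draft eligibility
    is_forward: bool            # False means defenceman
    height_in: int              # height in inches
    games_played: int           # games in the season being compared
    adj_ppg: float              # era-, league- and age-adjusted primary points/game
    draft_year: Optional[int]   # None if undrafted
    nhl_gp: int                 # career NHL games played


def is_comparable(x: Player, y: Player) -> bool:
    """Return True if player y passes all six tests against prospect x."""
    # 1. If x is in his first draft-eligible season, y must be too.
    if x.first_year_eligible and not y.first_year_eligible:
        return False
    # 2. Same position group (forward vs. defence).
    if x.is_forward != y.is_forward:
        return False
    # 3. Heights within one inch.
    if abs(x.height_in - y.height_in) > 1:
        return False
    # 4. y played at least 30 games in the compared season.
    if y.games_played < 30:
        return False
    # 5. Adjusted rates within half a standard deviation.
    if abs(x.adj_ppg - y.adj_ppg) > HALF_SD:
        return False
    # 6. Drafted in or before 2012, or already at 200+ NHL GP.
    #    Assumption: undrafted players are treated like post-2012 picks.
    drafted_by_2012 = y.draft_year is not None and y.draft_year <= 2012
    return drafted_by_2012 or y.nhl_gp >= 200


def find_comparables(x: Player, pool: List[Player]) -> List[Player]:
    """All players in the pool (other than x itself) that pass the six tests."""
    return [y for y in pool if y is not x and is_comparable(x, y)]
```

Each test is an early `return False`, so the order of the checks only affects speed, not the result.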
Using Comparable Players to Determine Expected Value:
After creating a list of comparable players for our prospect, we next want to determine how likely it is that this prospect will make the NHL. To do this we look at how many of the prospect's comparable players made it to the NHL full time (200 or more NHL GP) relative to the total number of comparable players. We use this percentage of successful comparables as the percent chance that the prospect makes the NHL full time. The next thing we'd like to know is: assuming the prospect makes the NHL full time, what will their P/GP rate likely be?
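The probability step is a simple proportion. This sketch takes each comparable's career NHL games played as input and counts those at or above the 200 GP "full time" threshold. The function name and the sample numbers are hypothetical.

```python
from typing import Sequence

FULL_TIME_GP = 200  # "made the NHL full time" threshold from the text


def nhl_probability(comparable_nhl_gp: Sequence[int]) -> float:
    """Share of comparables who reached 200+ NHL games played."""
    if not comparable_nhl_gp:
        raise ValueError("prospect has no comparables")
    successful = sum(1 for gp in comparable_nhl_gp if gp >= FULL_TIME_GP)
    return successful / len(comparable_nhl_gp)


# Hypothetical comparables: two of four reached 200 GP.
print(nhl_probability([812, 45, 203, 0]))
```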
To find this we look at the production rates of the prospect's successful comparable players. One way of determining the prospect's expected production would be to simply average these values; however, this can be improved upon by weighting the more comparable players higher so that they impact the final result more than the others. To do this we develop similarity scores that measure just how similar the prospect is to each of their comparables. The formula for generating similarity scores is described here by Emmanuel Perry. For our similarity score calculation we input variables that help isolate certain playing styles, so that pure goal scorers don't impact a playmaker's expected production rate as much as another playmaker would; the same goes for power play specialists compared to even-strength producers. Once we have similarity scores for the comparable players, we use them as weights when calculating the weighted arithmetic mean of the successful comparables' NHL production. This weighted mean is the prospect's expected NHL P/GP, assuming they make the NHL full time.
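The weighting step is a standard weighted arithmetic mean. In this sketch the similarity scores are taken as given inputs; Perry's formula for producing them is not reproduced. The function name and sample values are hypothetical.

```python
from typing import Sequence


def expected_nhl_ppg(nhl_ppg: Sequence[float], similarity: Sequence[float]) -> float:
    """Similarity-weighted arithmetic mean of successful comparables' NHL P/GP."""
    if len(nhl_ppg) != len(similarity):
        raise ValueError("need one similarity score per comparable")
    total_weight = sum(similarity)
    if total_weight <= 0:
        raise ValueError("similarity scores must sum to a positive number")
    return sum(p * w for p, w in zip(nhl_ppg, similarity)) / total_weight


# Hypothetical: the more similar comparable (weight 3) produced 0.60 P/GP,
# the less similar one (weight 1) produced 0.40 P/GP.
print(round(expected_nhl_ppg([0.60, 0.40], [3.0, 1.0]), 2))
```

With a plain average the estimate would be 0.50. Weighting pulls it toward the more similar player, to about 0.55.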
Now that we know the prospect's percent chance of making the NHL full time and their expected NHL P/GP if they do, we can simply multiply these together to find the prospect's expected value. Finally, we match this expected value with the expected value of each draft pick position, as found by Stephen Burtch, and arrive at the draft position that would be fair value for selecting the given prospect. For example, if the prospect's expected value corresponds to a pick in the 8-10 range and they're selected with the #3 overall pick, we can say that this selection is expected to have a negative return on investment: a pick with an expected value of 0.6 was used to select a player with an expected value of 0.31. Conversely, if this player was selected with the #27 pick, the selection is expected to have a positive return on investment: a pick with an expected value of 0.2 was used to select a player with an expected value of 0.31.
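The expected-value step is one multiplication. Here, return on investment is sketched as the player's expected value minus the pick's expected value. The text only says "negative" or "positive", so using a simple difference is an assumption. Burtch's pick-value curve is not reproduced, so the pick values below are just the two quoted in the example. The function names are hypothetical.

```python
def expected_value(p_nhl: float, expected_ppg: float) -> float:
    """Prospect EV = P(make NHL full time) x expected NHL P/GP if he does."""
    return p_nhl * expected_ppg


def pick_return(player_ev: float, pick_ev: float) -> float:
    """Positive: the pick was spent on a player worth more than the slot."""
    return player_ev - pick_ev


# Figures from the example above: a player worth 0.31 taken with a pick
# worth 0.6 (#3 overall) or with a pick worth 0.2 (#27 overall).
print(round(pick_return(0.31, 0.6), 2))   # -0.29
print(round(pick_return(0.31, 0.2), 2))   # 0.11
```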
DEV in Action:
In the 2015 draft, Daniel Sprong was picked #46 overall by the Pittsburgh Penguins. Sprong is a perfect example of a player with positive expected value according to DEV: while Sprong was picked 46th, his expected value is that of a pick in the #4 - #7 range. DEV found 27 comparables for Sprong, and 19 of these 27 were successful, giving Sprong a 70.37% chance to become an NHL success. Based on the weighted mean of his comparables, DEV expects Sprong to produce 0.56 P/GP over his NHL career. Below is Sprong's complete list of successful comparables with their similarity scores, which are used as weights in the P/GP calculation. For example, Dustin Brown's NHL production is weighted the highest since he's the most similar to Sprong, and Devin Setoguchi's is weighted the least.
And these are Sprong's unsuccessful NHL comparables. Ryan Strome is included in this list because he has yet to hit 200 NHL GP, although he likely will; if you choose to count him as a successful comparable, Sprong's chance to make the NHL rises to 74.07%.
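The Sprong figures can be checked in a few lines of Python. Multiplying the probability by his expected P/GP gives an expected value of roughly 0.39. That product is not stated in the text; it simply follows from the method. Because 0.56 is itself rounded, the result is approximate. It is higher than the 0.31 used in the picks 8-10 example, which is consistent with a 4-7 valuation.

```python
successful, total = 19, 27
expected_ppg = 0.56  # weighted-mean NHL P/GP reported for Sprong

p_nhl = successful / total
print(f"{p_nhl:.2%}")                      # 70.37%
print(f"{(successful + 1) / total:.2%}")   # 74.07% if Strome counts as a success
print(round(p_nhl * expected_ppg, 3))      # 0.394
```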
This is one example where DEV found a player highly likely to make and then succeed in the NHL despite his still being available in the middle of the second round. Of course, we aren't arguing that DEV should be followed to a tee when making draft selections, but it's undeniable that the evidence was there to support selecting Sprong significantly higher than he was chosen. At this point you may remember from the previous article that DEV doesn't always get it right, as evidenced by its listing Greg McKegg as worth a top 15 pick, so how can we know whether DEV is right about Sprong? Fortunately, looking beyond draft-year stats alone can help show that McKegg's strong draft year was a fluke whereas Sprong's likely was not. In the season before McKegg's draft year, DEV determined that he would be worth a pick in the 168 - 204 range; for Sprong's pre-draft season, however, DEV found him to be worth a pick in the 4 - 7 range. While McKegg's expected value jumped significantly, Sprong's stayed consistent, suggesting that Sprong's evaluation as a pick from 4 - 7 is likely a more accurate estimate than McKegg's draft-year DEV result.
Cases like this are where DEV shows its potential: while two years of data supported the idea that Sprong had the potential of a high-end draft pick, he was passed over until the middle of the second round, and now, less than a year after the draft, he has already proven that his actual value is closer to his DEV ranking than to his draft position. Soon we will post another article going more in depth into some limitations and different applications of DEV when identifying draft targets.
Note: Listed NHL stats are as of 3/14/16