For the past three seasons I've been building models that aim to project the value of draft-eligible players at the time of the NHL entry draft. The first iteration of the model was very simple, and frankly at this point I'm convinced that it didn't offer much actionable value. Last year I replaced that deeply flawed first iteration with an entirely new model, which you can read about here. For this draft I've updated the model again, but the changes in this year's version aren't nearly as drastic as the changes from version one to version two. The main concept behind the model remains the same; only some of the finer details have changed.
The model still uses two gradient boosted decision trees, one for calculating NHLc and one for calculating xPS, and multiplies the two outputs to find DEV, or draft expected value. Since another season has passed, I've added one more season to the training dataset. I've also done some additional feature engineering so the model can use variables it didn't previously consider. The most important features added this year are OWS, DWS, and QoT. OWS and DWS stand for Offensive Win Shares and Defensive Win Shares respectively; these are modified versions of Hockey-Reference's Offensive and Defensive Point Shares that I've tweaked to work better within the limitations of CHL data. I'll have a separate post at a later date discussing my changes to the Point Shares formula.
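As a rough illustration of how the two outputs combine, here is a minimal sketch. It assumes NHLc is a probability between 0 and 1 and xPS is a numeric value estimate; the function name and the example numbers are hypothetical and are not taken from the actual model.

```python
def draft_expected_value(nhlc: float, xps: float) -> float:
    """Combine the two model outputs into DEV (draft expected value).

    Assumption: nhlc is a probability in [0, 1] and xps is a numeric
    value estimate. DEV is simply their product.
    """
    if not 0.0 <= nhlc <= 1.0:
        raise ValueError("nhlc must be a probability between 0 and 1")
    return nhlc * xps


# Made-up inputs, for illustration only.
example_dev = draft_expected_value(nhlc=0.5, xps=20.0)
print(example_dev)
```

Multiplying the two means a player with a high projected value but a low chance of making the NHL can end up with the same DEV as a safer player with a more modest projection.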
These changes to the model helped quite a bit in terms of the model's final performance. Below I've listed the evaluation metrics for NHLc and xPS comparing this year's model to last year's.
|Forwards||Root Mean Square Error|
|Defense||Root Mean Square Error|
As you can see, the new NHLc model has a lower log loss than last year's, and this year's xPS model has a lower root mean square error on our out-of-sample tests.
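For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed. It assumes NHLc is scored against a binary outcome (1 if the player made the NHL, 0 otherwise) and xPS against observed values. The function names and toy inputs are illustrative; this is not the evaluation code behind the model's results.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; lower is better.

    y_true holds 0/1 outcomes, p_pred the predicted probabilities.
    Probabilities are clipped to avoid taking log(0).
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data, for illustration only.
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
print(rmse([10.0, 4.0], [8.0, 5.0]))
```

Log loss penalizes confident wrong probabilities heavily, which suits a probability output like NHLc, while root mean square error measures the typical size of the miss for a continuous output like xPS.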
I've moved the DEV results from last season's model here, mainly to maintain a log of them, but I wouldn't recommend paying much attention to those results since the model that produced them is now out of date.
The full results for the up-to-date model can be found here. You can view the results for all seasons from 2014-15 through 2017-18.
Looking through the results, I'm sure you'll find some things that are confusing, so here are a few notes that will help with interpreting the model's outputs.
Looking at the pick values, you may notice that very few players have a pick value in the top three. In fact, only one player from 2014-15 through 2017-18 has a pick value of first overall: Connor McDavid. The reason is that players selected in the top three have historically been almost guaranteed to hit the 200 NHL games played threshold, while the NHLc model is more conservative in its estimates. This makes a top-three pick value tough for a player to achieve, even when they are clearly among the three best available players in their draft class.
To draft optimally, you would want to select players with a pick later in the draft than their pick value. The pick value is based on the historical value of that pick position, so if you use the 45th pick on a player whose pick value is 45, you haven't gained any value with that pick. That selection is consistent with the history of 45th overall picks, so you can't expect to see any improvement over traditional methods of player evaluation. Ideally, you would only select players with a pick value earlier than the pick you are using to select them. For example, rather than using pick 45 on a player with a pick value of 45, you would select a player with a pick value of 30, adding value equivalent to the difference between the historical values of picks 30 and 45. An obvious exception is the very start of the draft: as mentioned before, you will very rarely find a player whose pick value falls at those spots.
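The pick 45 versus pick 30 comparison can be sketched in a few lines. The historical values below are made-up numbers used only to show the arithmetic, and the function name is hypothetical; the real values would come from the historical value of each pick position.

```python
def pick_surplus(pick_used, player_pick_value, historical_value):
    """Value gained by spending `pick_used` on a player whose pick value
    is `player_pick_value`, given a mapping of pick position to the
    historical value of that position."""
    return historical_value[player_pick_value] - historical_value[pick_used]


# Hypothetical historical values by pick position (illustration only).
historical_value = {30: 5.0, 45: 3.5}

# Using pick 45 on a player with a pick value of 45 gains nothing.
print(pick_surplus(45, 45, historical_value))  # 0.0

# Using pick 45 on a player with a pick value of 30 gains the difference.
print(pick_surplus(45, 30, historical_value))  # 1.5
```

A positive result means the selection beats the historical value of the pick used, zero means it merely matches history, and a negative result means the player was taken earlier than their pick value supports.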
The outputs for this model are based on players drafted from 1997 through 2010, so many of those players were working to make it into and play in an NHL that's different from the NHL of the present and the NHL of the future. I feel this is especially relevant when looking at the outputs of NHLc. Since size is starting to have a lesser impact on whether players make the NHL, it is likely safe to assume that NHLc underestimates smaller players' true chance of making the NHL: a player of their caliber would likely have an easier time making the NHL five years from now than a similar player did in 2003. Unfortunately, that adjustment for a changing NHL has to be made mentally, since NHLc doesn't currently account for these factors. This is the largest flaw I see in the current DEV model, and one I'm looking to address in future versions.