xG: Expected Goals

June 9, 2018 | Hayden Speak

 Introduction

  If you've been following analytics in the NHL over the past few years, you've likely heard of xG, or expected goals. There are many xG models for the NHL (Alan Ryder, Ken Krzywicki, Brian Macdonald, DTMAboutHeart, MoneyPuck, Crowd Scout Sports, Corsica, XtraHockeyStats), but so far there have been none for minor league hockey. Here I present my expected goals model for the OHL and QMJHL and walk through the value of that model.

 Data

  The OHL and the QMJHL are currently the only two leagues tracked on this site that record the location where each shot was taken. The QMJHL began tracking this data in the 2012-13 season and the OHL followed suit in 2015-16. This gives us six QMJHL seasons and three OHL seasons from which to build the model. The model uses shot location and a few other contextual factors to predict the probability of a goal.

Factors

X Location
Y Location

  X and Y locations were adjusted according to the process described here.

Shot Distance

  Distance that the (X, Y) coordinate is from the center of the goal line.

Field Of View

  The angle ABC, where A = coordinate of the left post, B = coordinate of the shot, and C = coordinate of the right post. This tells us how narrow a window there was for the puck to make it on net.
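
  As a rough illustration of the two geometric factors above, here is a minimal sketch. The coordinate system (feet, origin at the center of the goal line, x pointing up ice), the 6-foot net width, and the function names are my assumptions for the example and are not taken from the model itself.

```python
import math

# Assumed coordinate system (hypothetical, for illustration only):
# feet, origin at the center of the goal line, x pointing up ice,
# y running across the ice.
GOAL_HALF_WIDTH_FT = 3.0  # assumes a 6 ft wide net
LEFT_POST = (0.0, -GOAL_HALF_WIDTH_FT)
RIGHT_POST = (0.0, GOAL_HALF_WIDTH_FT)


def shot_distance(x, y):
    """Straight-line distance from the shot to the center of the goal line."""
    return math.hypot(x, y)


def field_of_view(x, y):
    """Angle ABC in degrees: A = left post, B = shot location, C = right post."""
    # Vectors from the shot location to each post.
    ax, ay = LEFT_POST[0] - x, LEFT_POST[1] - y
    cx, cy = RIGHT_POST[0] - x, RIGHT_POST[1] - y
    cross = ax * cy - ay * cx
    dot = ax * cx + ay * cy
    # atan2 of |cross| and dot gives the unsigned angle between the vectors.
    return math.degrees(math.atan2(abs(cross), dot))
```

  Under these assumptions a shot from directly in front sees a wider window the closer it is, and a shot taken from along the goal line sees a window of zero degrees.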

Time Since Last Shot
If Last Shot Was By The Same Team

  This factor works well in conjunction with Time Since Last Shot. The combination of these two factors allows for finding shots that were likely rebounds (little time elapsed since last shot and last shot was by this team) as well as shots that could have come off of a rush (little time elapsed since last shot and last shot was not by this team).
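
  The interaction described above can be sketched as a simple labeling rule. The model itself receives the two raw factors and learns the interaction on its own; the 3-second threshold and the function name here are hypothetical and only illustrate the idea.

```python
def shot_context(seconds_since_last_shot, last_shot_same_team, quick_threshold=3.0):
    """Label a shot from the two timing factors.

    quick_threshold is a made-up cutoff for "little time elapsed";
    the article does not state one.
    """
    if seconds_since_last_shot is None or seconds_since_last_shot > quick_threshold:
        return "other"
    if last_shot_same_team:
        return "likely rebound"
    return "possible rush"
```

  For example, a shot taken 1.5 seconds after a teammate's shot would be labeled a likely rebound, while the same gap after an opponent's shot would be labeled a possible rush.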

Number of Opponents On The Ice
Number of Teammates On The Ice
League
Shooter's Previous Season Shooting Strength

  Used for gauging the scoring ability of the player who's taking the shot. This factor is like the Shot Multiplier factor in DTMAboutHeart's xG model.

Shooter's Age

  In junior hockey, age has a slight impact on shooting percentage, so it is included in the model.

Shooter's Position
Shooter's Handedness

  This factor is primarily useful for determining whether a player is shooting from their off wing or not. Shooting right handed from the left side of the ice has a higher probability of scoring than shooting right handed from the right side of the ice.
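
  A minimal sketch of the off-wing idea follows. The sign convention (positive y is the left side of the ice) and the "L"/"R" handedness codes are assumptions for the example; the model is given handedness and location directly rather than a precomputed flag.

```python
def is_off_wing(handedness, y):
    """True when a shooter's handedness is opposite to the side of the ice.

    handedness: "L" or "R" (assumed encoding).
    y: lateral shot coordinate, assumed positive on the left side of the ice.
    """
    if y == 0:
        return False  # dead center is neither wing
    side = "L" if y > 0 else "R"
    return handedness != side
```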


  These factors are fed into a gradient boosted decision tree classifier that estimates the probability of a shot becoming a goal, which we call xG. Ten-fold cross-validation was used to determine the xG values of the shots in our dataset.
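
  To show how cross-validation can assign a value to every shot, here is a minimal sketch of out-of-fold prediction. To keep it dependency-free, it swaps the gradient boosted tree for a stand-in model (the goal rate per 10-foot distance bin), so it illustrates the fold mechanics only and not the actual classifier. All function names and the bin width are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_bin_rates(distances, goals, bin_width=10.0):
    """Stand-in model (NOT a gradient boosted tree): goal rate per distance bin."""
    totals, scored = {}, {}
    for d, g in zip(distances, goals):
        b = int(d // bin_width)
        totals[b] = totals.get(b, 0) + 1
        scored[b] = scored.get(b, 0) + g
    overall = sum(goals) / len(goals)
    rates = {b: scored[b] / totals[b] for b in totals}
    # Bins unseen in training fall back to the overall goal rate.
    return lambda d: rates.get(int(d // bin_width), overall)


def out_of_fold_xg(distances, goals, k=10, seed=0):
    """Give each shot a prediction from a model that never saw that shot."""
    xg = [None] * len(goals)
    for fold in k_fold_indices(len(goals), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(goals)) if i not in held_out]
        model = fit_bin_rates([distances[i] for i in train],
                              [goals[i] for i in train])
        for i in fold:
            xg[i] = model(distances[i])
    return xg
```

  With a real classifier the structure is the same: train on nine folds, predict the tenth, and repeat until every shot has a prediction.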

 Results

Players

Descriptive

  For individual players, xG describes goal scoring better than shot generation does. At 5v5, shot generation explains about 60% of the variance in goal production, while xG generation explains about 76%.
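
  For readers unfamiliar with "variance explained", here is a minimal sketch of one common way to compute it: the squared Pearson correlation between two series, which equals the R² of a simple linear fit. I am assuming this definition for illustration; the function name is hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length series.

    Raises ZeroDivisionError if either series has no variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

  Here xs would be each player's xG (or shots) and ys the same players' actual goals.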

Predictive

  To determine the predictive value of xG for players, I looked at how well a player's exG/60 to date, at different points in the season, predicted the eG/60 they would post over the remaining games of that season. I also compared that to the predictive ability of eG/60 and eSh/60. As you can see, at all points in the season exG/60 is more predictive of future individual goal scoring than eG/60 or eSh/60, both in the OHL and the QMJHL.

R² For Each Metric (OHL and QMJHL combined)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
exG/60  0.361    0.472    0.516    0.534    0.526    0.527    0.507    0.492    0.440    0.414    0.362
eG/60   0.179    0.268    0.336    0.391    0.409    0.418    0.426    0.415    0.373    0.374    0.327
eSh/60  0.228    0.331    0.374    0.396    0.398    0.402    0.394    0.389    0.343    0.321    0.289

R² For Each Metric (OHL)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
exG/60  0.338    0.466    0.503    0.517    0.508    0.512    0.487    0.469    0.427    0.413    0.341
eG/60   0.185    0.263    0.329    0.366    0.399    0.415    0.416    0.428    0.394    0.385    0.306
eSh/60  0.198    0.308    0.353    0.357    0.360    0.362    0.353    0.347    0.319    0.293    0.226

R² For Each Metric (QMJHL)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
exG/60  0.373    0.476    0.522    0.544    0.537    0.535    0.518    0.505    0.449    0.416    0.381
eG/60   0.176    0.271    0.340    0.405    0.414    0.420    0.432    0.408    0.360    0.366    0.345
eSh/60  0.244    0.342    0.384    0.418    0.419    0.425    0.419    0.413    0.359    0.340    0.340
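
  The split-season setup behind these numbers can be sketched as follows: for each player, compute a per-60 rate from the games played so far and pair it with the goal rate over the remaining games. The tuple layout and function names are assumptions for the example, not the site's actual data format.

```python
def per_60(total, minutes):
    """Rate per 60 minutes of ice time, or None if no minutes were played."""
    return 60.0 * total / minutes if minutes > 0 else None


def split_rates(games, cutoff):
    """Pair a player's exG/60 through `cutoff` games with eG/60 afterwards.

    games: chronological list of (toi_minutes, xg, goals) tuples for one
    player (hypothetical layout).
    """
    before, after = games[:cutoff], games[cutoff:]
    exg60_to_date = per_60(sum(g[1] for g in before), sum(g[0] for g in before))
    eg60_rest = per_60(sum(g[2] for g in after), sum(g[0] for g in after))
    return exg60_to_date, eg60_rest
```

  Collecting these pairs across all players at a given cutoff and measuring how well the first value tracks the second yields one cell of a table like those above.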

Teams

Descriptive

  At the team level, I tested the ability of xGF% to describe GF% and compared it with SF%. Since these leagues don't have Corsi available, SF% is the next best thing. Our xG model explains around 80% of the variance of GF%, while SF% explains 56% in the OHL and about 74% in the QMJHL.
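
  For reference, each of these percentages is a "for" share of a total. A minimal sketch, with a hypothetical function name:

```python
def share_pct(for_total, against_total):
    """Percentage of the combined total that belongs to the team.

    Works for goals (GF%), shots (SF%), or expected goals (xGF%).
    Returns None when both totals are zero.
    """
    total = for_total + against_total
    return 100.0 * for_total / total if total else None
```

  For example, a team with 55 goals for and 45 against has a GF% of 55, and a team with equal xG for and against has an xGF% of 50.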

Predictive

  The predictive value of xG for teams was determined by using xGF%, GF%, and SF% to predict future GF% at 5v5. Here we see that xGF% outperforms GF% and SF% at every point during the season in the OHL; however, that's not true in the QMJHL. In the QMJHL, xGF% builds up predictive ability faster than GF% and SF%, and around the 15-game point in the season it is considerably stronger than the alternatives at predicting future GF%. But as the season continues, GF% becomes more predictive than xGF%, from about game 25 until game 50.

R² For Each Metric (OHL and QMJHL combined)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
xGF%    0.210    0.341    0.426    0.446    0.441    0.461    0.473    0.480    0.527    0.488    0.460
GF%     0.120    0.231    0.309    0.403    0.388    0.438    0.444    0.470    0.552    0.472    0.405
SF%     0.163    0.299    0.373    0.401    0.429    0.432    0.440    0.458    0.488    0.454    0.422

R² For Each Metric (OHL)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
xGF%    0.321    0.439    0.420    0.492    0.539    0.559    0.599    0.604    0.630    0.588    0.535
GF%     0.179    0.296    0.285    0.392    0.352    0.385    0.452    0.506    0.598    0.555    0.474
SF%     0.241    0.293    0.397    0.397    0.516    0.521    0.537    0.502    0.512    0.429    0.394

R² For Each Metric (QMJHL)
Metric  Game 5   Game 10  Game 15  Game 20  Game 25  Game 30  Game 35  Game 40  Game 45  Game 50  Game 55
xGF%    0.173    0.307    0.429    0.428    0.402    0.422    0.425    0.437    0.496    0.456    0.432
GF%     0.102    0.206    0.320    0.411    0.412    0.468    0.444    0.461    0.545    0.446    0.380
SF%     0.138    0.303    0.366    0.402    0.395    0.397    0.403    0.443    0.481    0.464    0.432

Summary

At the player level, our current xG model is both more descriptive and more predictive of goal production than the previously available alternatives. At the team level, it is more descriptive than the alternatives, while its predictive ability varies at different points throughout the season. Given how the model was developed, we should expect it to perform better at the player level than at the team level. Also, despite there being situations where other metrics outperform it in predicting future team success, I still feel that xG carries value in team-level analysis. Since xG requires fewer samples than the alternatives to gain predictive significance, analysis of a team's ability can occur earlier in the season, while there is still time to make changes.

All data related to xG has been made available on this site for you to explore further and use going forward. On the player tabs you will find xG, xG/GP, and exG/60. On the team tables there are now xGF, xGA, xGD, xGF%, xSh%, xSv%, xPDO, GFAX, GSAX, GFAX/30, and GSAX/30. And finally, on the goalie tables there are xGA, xSv%, dSv%, GSAX, and GSAX/30. If any of those acronyms mean nothing to you, the glossary contains definitions for all stats available on the site.

If you have any questions or comments about xG, please let me know by tweeting or messaging me @3Hayden2 or by sending an email to 3Hayden2@gmail.com.