An initial disclaimer:
This piece is for discussion. Statistical operations can be tricky, and there are often several ways to do things. I am not claiming to be right or wrong on anything, yet. If you have some advice, please leave a comment.
Round 2
After my article arguing that the Caps picking up Grabovski was not as big a deal as others were making it, the response was brutal. I take some of the blame for that, since I put out an unpolished piece. In the end, I stand by my argument that the idea that Grabovski would go from a career 45-50 point scorer to a 60-70 point guy was hyperbole.
Some people argued that his Corsi and Fenwick ratings, plus the fact that Washington had many more offensive zone faceoffs than Toronto (which should lead to more chances), would make him an improvement over Ribeiro. I argued that, despite the better advanced stats, it seemed crazy that any one player’s numbers would jump that much; thus, the Caps roster is at a net loss after swapping Ribeiro for Grabo.
To that end, I wanted to examine this further. Here is my idea: under the “Grabovski hypothesis” (the “he makes his teammates better” argument), a team with better Corsi, Fenwick and offensive zone faceoff numbers should score more goals. If this is true, we should be able to run a linear regression and see how a variety of statistics affect the number of goals a team scores (goals for). In other words, I wanted to see what happens when we regress a team’s “goals for” for a season (y-variable) on a set of variables, including those mentioned above (X-set).
Thus, I went to stats.hockeyanalysis.com and grabbed team stats for all teams from the 2007-2008 season through last season (2012-2013). I took all of HA’s data (see legend below) and added some dummy variables, which is common when analyzing panel data.
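To make the setup concrete, here is a minimal sketch of regressing goals for on a couple of team statistics. It is written in Python rather than the R I used for the actual analysis, and it solves the normal equations by hand instead of calling R’s lm. The five team-seasons are made-up numbers constructed to follow GF = -140 + 0.08*SF + 18*Sh% exactly, so this only illustrates the mechanics and says nothing about real teams.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("x-variables are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical team-seasons (NOT real NHL data).
shots_for = [2300, 2400, 2500, 2450, 2350]
shooting_pct = [8.0, 9.5, 9.0, 8.5, 10.0]
# Made-up goals for, built to satisfy GF = -140 + 0.08*SF + 18*Sh% exactly.
goals_for = [188, 223, 222, 209, 228]

# Design matrix: a leading 1 for the intercept, then the x-variables.
X = [[1.0, sf, sh] for sf, sh in zip(shots_for, shooting_pct)]
intercept, b_sf, b_sh = ols(X, goals_for)
print(f"intercept={intercept:.2f}  SF={b_sf:.4f}  Sh%={b_sh:.2f}")
```

Because the toy data were generated from a known formula, the fit simply recovers that formula. With real data the estimates also come with standard errors and p-values, which is what the tables under “Details” report.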
Legend
TOI = Time on ice
GF = Goals For
GA = Goals Against
GF60 = Goals For per 60 minutes of ice time
GA60 = Goals Against per 60 minutes of ice time
GF% = Goals For percentage = 100 * GF / (GF + GA)
SF = Shots For
SA = Shots Against
SF60 = Shots For per 60 minutes of ice time
SA60 = Shots Against per 60 minutes of ice time
SF% = Shots For percentage = 100 * SF / (SF + SA)
FF = Fenwick For
FA = Fenwick Against
CF = Corsi For
CA = Corsi Against
Sh% = Shooting Percentage
Sv% = Save Percentage
OZFO% = Percentage of face offs that took place in the offensive zone
DZFO% = Percentage of face offs that took place in the defensive zone
Items in red are in the data table but were not used in the regression, so that highly correlated x-variables would not cause problems.
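The rate and percentage columns in the legend follow directly from the raw counts. The Python sketch below spells out the arithmetic with made-up numbers for a hypothetical team. It assumes TOI is measured in minutes and that shooting percentage is 100 * GF / SF, neither of which the legend states explicitly, and the function names are mine.

```python
def per_60(count, toi_minutes):
    """Rate per 60 minutes of ice time, e.g. GF60 or SF60."""
    return 60.0 * count / toi_minutes


def pct_for(count_for, count_against):
    """Share of events in the team's favor, e.g. GF% or SF%."""
    return 100.0 * count_for / (count_for + count_against)


def shooting_pct(goals_for, shots_for):
    """Assumed definition: goals per 100 shots on goal."""
    return 100.0 * goals_for / shots_for


# Hypothetical season totals (NOT real data).
toi, gf, ga, sf, sa = 4000, 160, 140, 2000, 1900

gf60 = per_60(gf, toi)          # 2.4 goals per 60 minutes
gf_pct = pct_for(gf, ga)        # about 53.3
sf_pct = pct_for(sf, sa)        # about 51.3
sh_pct = shooting_pct(gf, sf)   # 8.0
print(gf60, round(gf_pct, 1), round(sf_pct, 1), sh_pct)
```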
Dummy Variables
east – Eastern Conference (0=No, 1=yes)
west – Western Conference (0=No, 1=yes)
yr** – year dummy for the year the data was taken (0 = not year **, 1 = year**) – one dummy variable for each of the six years
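As a rough illustration of the year dummies, the Python sketch below builds one 0/1 column per season. The function name is hypothetical, and mapping yr8 through yr13 to seasons ending in 2008 through 2013 is my reading of the variable names in the results. Because the six columns always sum to 1, together they duplicate the intercept, so a regression has to drop one of them; R reports the dropped coefficient as NA.

```python
SEASON_END_YEARS = [2008, 2009, 2010, 2011, 2012, 2013]


def year_dummies(season_end_year):
    """Return one 0/1 flag per season: {'yr8': ..., ..., 'yr13': ...}."""
    return {f"yr{year % 100}": int(year == season_end_year)
            for year in SEASON_END_YEARS}


row = year_dummies(2013)  # the lockout-shortened 2012-2013 season
print(row)
# {'yr8': 0, 'yr9': 0, 'yr10': 0, 'yr11': 0, 'yr12': 0, 'yr13': 1}

# Every team-season switches on exactly one dummy, so the six columns
# add up to the constant column used for the intercept.
row_sums = [sum(year_dummies(y).values()) for y in SEASON_END_YEARS]
print(row_sums)
```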
Results
Looking from the 2007-2008 season through the 2012-2013 season, the only variables that were statistically significant (at the 0.05 level) were shooting percentage and shots for (see “Regression results with lockout year” below).
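The significance check can be written out as a small Python helper that mirrors the “Signif. codes” legend printed under the results. The p-values are copied from the table with the lockout year. A reported “< 2e-16” is entered as 2e-16, which is only an upper bound and is an assumption made for illustration, and the helper name is mine.

```python
def signif_code(p):
    """Map a p-value to the star code used in the results legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return " "


# A few p-values from the table with the lockout year.
p_values = {
    "(Intercept)": 0.00123,
    "SF": 2e-16,      # reported as "< 2e-16"
    "Sh%": 2e-16,     # reported as "< 2e-16"
    "DZFO%": 0.10732,
    "SA": 0.23526,
    "CF": 0.49663,
}

significant = [name for name, p in p_values.items() if p < 0.05]
for name, p in p_values.items():
    print(f"{name:12s} {p:<10.5g} {signif_code(p)}")
print("significant at 0.05:", significant)
```

Apart from the intercept, only shots for and shooting percentage clear the 0.05 bar in this subset.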
I thought the lockout-shortened season last year might have skewed things, so I removed it and ran the regression again. Once more, the only variables significant at the 0.05 level are shooting percentage and shots for. Fenwick-for and Corsi-for are significant only at the 0.1 level, which is usually not accepted. Let’s say we do accept them at this level. Going by the estimates in the “Regression results without lockout season” table, a team would gain about 2.9 goals per season for every additional 1,000 Corsi-for (1,000 shots directed at the net), or roughly one goal for every 341 additional attempts, with the other variables held fixed. The Fenwick-for estimate is actually negative: about 6.4 fewer goals per season for every additional 1,000 Fenwick-for (shots directed at the net, excluding blocked shots), again with everything else held fixed.
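As a check on the arithmetic, a coefficient converts to goals per 1,000 attempts by multiplying by 1,000, and its reciprocal gives attempts per goal. The Python sketch below uses the Corsi-for and Fenwick-for estimates from the table without the lockout season; the variable names are mine.

```python
# Estimates copied from "Regression results without lockout season".
estimates = {"CF": 2.929e-03, "FF": -6.418e-03}

# Change in goals for per 1,000 additional attempts, other variables fixed.
goals_per_1000 = {name: est * 1000 for name, est in estimates.items()}

# Attempts needed for one extra goal; only meaningful for a positive estimate.
attempts_per_goal_cf = 1 / estimates["CF"]

for name, goals in goals_per_1000.items():
    print(f"{name}: {goals:+.1f} goals per 1,000 additional attempts")
print(f"CF: about {attempts_per_goal_cf:.0f} attempts per extra goal")
```

Either way, the implied effect is a handful of goals per season at most, and that is before accounting for the fact that neither estimate is significant at the 0.05 level.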
Grabovski
If I did this correctly, then only the old-fashioned statistics of shots on goal and shooting percentage matter for how many times a team scores. Offensive zone faceoff percentage does not matter. Corsi and Fenwick are not statistically significant at the 0.05 level. Even if we accept them, Grabovski and his effect on other players would have to add about 1,000 shots directed at the net to gain roughly 2.9 goals per season (about one goal per 341 attempts), going by the Corsi-for estimate in the table without the lockout season.
This does not say whether Grabovski will be better or worse than Ribeiro. But, as it stands, Grabovski’s addition to the team, judged by the advanced stats, does not have a statistically significant effect. What will matter? Whether he can get people the puck in spots where they score at a high percentage, or put a lot more pucks on net, unblocked. We know he is not an assist guy, so I think it can be deduced that he is unlikely to raise the shooting percentage of others (by giving them good chances). Ribeiro, on the other hand, is a distributor, judging by his higher assist numbers throughout his career.
With the regression as it is, I think my argument stands: the Washington Capitals roster is worse minus Ribeiro, plus Grabovski. The boys still have to play this out on the ice….
All files and R script are available upon request.
Details
Regression results with lockout year.
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -9.013e+01   2.739e+01   -3.290   0.00123 **
SF           7.776e-02   7.542e-03   10.310   < 2e-16 ***
SA           9.466e-03   7.945e-03    1.191   0.23526
FF          -4.715e-03   8.897e-03   -0.530   0.59690
FA          -3.381e-03   7.771e-03   -0.435   0.66408
CF           2.919e-03   4.285e-03    0.681   0.49663
CA          -7.803e-04   3.323e-03   -0.235   0.81466
Sh.          1.614e+01   3.087e-01   52.269   < 2e-16 ***
Sv.         -3.527e-01   2.997e-01   -1.177   0.24093
OZFO.        7.607e-02   2.656e-01    0.286   0.77492
DZFO.       -3.457e-01   2.135e-01   -1.619   0.10732
east         1.101e+00   1.352e+00    0.815   0.41634
west         1.385e+00   1.438e+00    0.964   0.33663
yr13         6.808e-01   2.779e+00    0.245   0.80681
yr12         1.468e-01   1.141e+00    0.129   0.89776
yr11         1.178e-02   1.138e+00    0.010   0.99176
yr10         4.663e-01   9.894e-01    0.471   0.63808
yr9          1.853e-02   8.647e-01    0.021   0.98293
yr8                 NA          NA       NA        NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Regression results without lockout season.
              Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) -1.444e+02   1.295e+01  -11.152    <2e-16 ***
SF           8.331e-02   3.073e-03   27.110    <2e-16 ***
SA          -2.951e-03   3.277e-03   -0.900    0.3696
FF          -6.418e-03   3.604e-03   -1.781    0.0773 .
FA           3.601e-03   3.134e-03    1.149    0.2525
CF           2.929e-03   1.721e-03    1.702    0.0911 .
CA          -1.713e-03   1.354e-03   -1.266    0.2078
Sh.          1.826e+01   1.418e-01   128.81    <2e-16 ***
Sv.          6.188e-02   1.359e-01    0.455    0.6497
OZFO.       -1.117e-01   1.214e-01   -0.920    0.3592
DZFO.       -6.648e-02   9.268e-02   -0.717    0.4744
east        -2.084e-01   5.782e-01   -0.360    0.7191
west        -2.127e-01   6.114e-01   -0.348    0.7285
yr13                NA          NA       NA        NA
yr12         5.828e-01   4.596e-01    1.268    0.2071
yr11         4.903e-01   4.583e-01    1.070    0.2866
yr10         5.869e-01   3.926e-01    1.495    0.1373
yr9          1.635e-01   3.373e-01    0.485    0.6286
yr8                 NA          NA       NA        NA