## An initial disclaimer:

This piece is for discussion. Statistical operations can be tricky, and there are often several ways to do things. I am not claiming to be right or wrong on anything, *yet*. If you have advice, please leave a comment.

## Round 2

So, after my article on the Caps picking up Grabovski, in which I argued it was not as big a deal as others were making it, the response was brutal. I take some of the blame for that, since I put out an unpolished piece. In the end, I stand by my argument that the idea Grabovski would go from a career 45-50 point scorer to a 60-70 point guy was hyperbole.

Some people argued that his Corsi and Fenwick ratings, together with the fact that Washington had a lot more offensive zone faceoffs than Toronto (which should lead to more chances), would make him an improvement over Ribeiro. I basically argued that, despite the better advanced stats, it seemed crazy that any one person’s numbers would jump that high; thus, the Caps roster is a net loss: minus Ribeiro, plus Grabo.

To that end, I wanted to examine this further. Here is my idea: under the “Grabovski hypothesis” (the “he makes his teammates better” argument), better Corsi, Fenwick and offensive zone faceoff numbers should lead to more team goals. If this is true, we should be able to run a linear regression and see how a variety of statistics affect the number of goals a team scores (goals for). In other words, I wanted to see what happens when we regress a team’s “goals for” for a season (the y-variable) on a set of variables that includes those mentioned above (the X-set).

Thus, I went to stats.hockeyanalysis.com and grabbed team stats for all teams from the 2007-2008 season through last season (2012-2013). I included all of HA’s data (see legend below) and added some dummy variables, which is common when analyzing panel data.
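Written out, the regression described above looks roughly like the following. The x-variables are the ones that appear in the regression tables under Details; the subscripts ($i$ for team, $t$ for season) and the coefficient symbols are notation introduced here for illustration only.

$$
\begin{aligned}
\mathrm{GF}_{i,t} ={}& \beta_0 + \beta_1\,\mathrm{SF}_{i,t} + \beta_2\,\mathrm{SA}_{i,t} + \beta_3\,\mathrm{FF}_{i,t} + \beta_4\,\mathrm{FA}_{i,t} + \beta_5\,\mathrm{CF}_{i,t} + \beta_6\,\mathrm{CA}_{i,t} \\
&+ \beta_7\,\mathrm{Sh\%}_{i,t} + \beta_8\,\mathrm{Sv\%}_{i,t} + \beta_9\,\mathrm{OZFO\%}_{i,t} + \beta_{10}\,\mathrm{DZFO\%}_{i,t} \\
&+ \gamma_1\,\mathrm{east}_{i,t} + \gamma_2\,\mathrm{west}_{i,t} + \sum_{k=8}^{13} \delta_k\,\mathrm{yr}_{k,t} + \varepsilon_{i,t}
\end{aligned}
$$

Here $\mathrm{yr}_{k,t}$ is 1 when season $t$ is year $k$ and 0 otherwise, and $\varepsilon_{i,t}$ is the error term. Each $\beta$ is read as the change in season goals for a one-unit change in that variable, with all the other variables held fixed.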

### Legend

TOI = Time on ice

GF = Goals For

GA = Goals Against

GF60 = Goals For per 60 minutes of ice time

GA60 = Goals Against per 60 minutes of ice time

GF% = Goals For percentage = 100* GF / (GF + GA)

SF = Shots For

SA = Shots Against

SF60 = Shots For per 60 minutes of ice time

SA60 = Shots Against per 60 minutes of ice time

SF% = Shots For percentage = 100* SF / (SF + SA)

FF = Fenwick For

FA = Fenwick Against

CF = Corsi For

CA = Corsi Against

Sh% = Shooting Percentage

Sv% = Save Percentage

OZFO% = Percentage of face offs that took place in the offensive zone

DZFO% = Percentage of face offs that took place in the defensive zone

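The legend items that do not show up in the regression tables under Details (TOI, GA, GF60, GA60, GF%, SF60, SA60 and SF%) are in the data table, but were left out of the regression to avoid correlation issues between the x-variables. GF is the y-variable.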
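To make the two formula-style legend entries concrete, here is a minimal Python sketch of the per-60 rates and the "for" percentages. The function names are mine, the sample numbers are hypothetical (not taken from the data set), and I am assuming TOI is measured in minutes.

```python
def per_60(count, toi_minutes):
    """Rate per 60 minutes of ice time, e.g. GF60 or SA60.

    Assumes time on ice is given in minutes.
    """
    return 60.0 * count / toi_minutes


def for_percentage(for_count, against_count):
    """Share of events in the team's favour, e.g. GF% = 100 * GF / (GF + GA)."""
    return 100.0 * for_count / (for_count + against_count)


# Hypothetical team-season, for illustration only.
gf, ga, toi = 210, 190, 4000.0
print("GF60:", per_60(gf, toi))
print("GF%:", for_percentage(gf, ga))
```

Because GF60 and GF% are built directly from GF, GA and TOI, putting them in the same regression as the raw counts would feed the model nearly the same information twice, which is the correlation issue mentioned above.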

### Dummy Variables

east – Eastern Conference (0=No, 1=yes)

west – Western Conference (0=No, 1=yes)

yr** – year dummy for the season the data was taken from (0 = not year **, 1 = year **) – one dummy variable for each of the six seasons (yr8 through yr13 in the tables below)
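The dummy coding above can be sketched in a few lines of Python. The helper names and the "East"/"West" labels are assumptions made for illustration; the original analysis was done in R and may have built the columns differently.

```python
YEARS = [8, 9, 10, 11, 12, 13]  # yr8 = 2007-2008 ... yr13 = 2012-2013


def year_dummies(year):
    """Return one 0/1 indicator per season, e.g. {'yr8': 0, 'yr9': 1, ...}."""
    return {f"yr{y}": int(y == year) for y in YEARS}


def conference_dummies(conference):
    """Return the east/west indicators for a team (labels are assumed)."""
    return {"east": int(conference == "East"), "west": int(conference == "West")}


# One hypothetical row: an Eastern Conference team in the 2008-2009 season.
row = {**conference_dummies("East"), **year_dummies(9)}
print(row)

# Every row has exactly one year dummy switched on, so the six year columns
# always add up to 1, the same value as the intercept column.
print(sum(year_dummies(9).values()))
```

Since the six year dummies always sum to 1, they duplicate the intercept, and a regression with an intercept cannot estimate all six. That is most likely why one year (yr8) shows up as NA in the regression tables.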

## Results

Looking at the 2007-2008 season through the 2012-2013 season, the regression showed statistically significant coefficients (at the 0.05 level) only for shooting percentage and shots for (see “Regression results with lockout year” below).

I thought the lockout-shortened season last year might have skewed things, so I removed it and ran the regression again. Once again, the only statistically significant variables are shooting percentage and shots for. Fenwick-for and Corsi-for are significant only at the 0.1 level, a threshold that is usually not accepted. Let’s say we do accept the stats at this level. Going by the estimates in the “Regression results without lockout season” table below, a team would gain about 2.9 goals per season for every additional 1,000 Corsi-for (1,000 shots directed at the net), or roughly one goal for every 341, holding everything else fixed. The Fenwick-for estimate is actually negative: about 6.4 fewer goals per season for every additional 1,000 Fenwick-for (shots directed at the net, excluding blocked shots), again holding shots on goal and Corsi fixed.
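As a check on the arithmetic, this small Python sketch converts the estimates from the “Regression results without lockout season” table into goals per 1,000 events and events per goal. The function names are mine, and the conversion simply takes the coefficients at face value, with every other variable held fixed.

```python
# Estimates copied from the "without lockout season" table under Details.
COEF = {"SF": 8.331e-02, "FF": -6.418e-03, "CF": 2.929e-03}


def goals_per_1000(term):
    """Predicted change in season goals for 1,000 more events of this type."""
    return 1000.0 * COEF[term]


def events_per_goal(term):
    """Additional events associated with a one-goal change (sign ignored)."""
    return 1.0 / abs(COEF[term])


for term in COEF:
    print(term, round(goals_per_1000(term), 1), round(events_per_goal(term)))
```

Keep in mind that every shot on goal is also a Fenwick and a Corsi event, so the FF and CF coefficients describe what happens when those counts rise while shots on goal stay the same. That is part of why they are so small next to SF.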

## Grabovski

If I did this correctly, then only the old-fashioned statistics of shots on goal and shooting percentage matter for how many goals a team scores. Offensive zone faceoff percentage does not matter. Corsi and Fenwick are not statistically significant at the 0.05 level. Even if we accept them at the 0.1 level, Grabovski and his effect on other players would have to add 1,000 shots directed at the net to gain roughly 2.9 additional goals per season by the Corsi-for estimate, and the Fenwick-for estimate (which excludes blocks) is negative.

This does not say whether Grabovski will be better or worse than Ribeiro. But, as it stands, Grabovski’s addition to the team based on the advanced stats does not have a statistically significant effect. What will matter? Whether he can get people the puck so they score at a high percentage, or put a lot more pucks on net, unblocked. We know he is not an assist guy, so I think it can be deduced that he is unlikely to raise the shooting percentage of others (by giving them good chances). Ribeiro, on the other hand, is a distributor, judging by his higher assist numbers throughout his career.

With the regression as it is, I think my argument stands: the Washington Capitals roster is worse minus Ribeiro, plus Grabovski. The boys still have to play this out on the ice…

All files and R script are available upon request.

## Details

Regression results with lockout year.

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | -9.013e+01 | 2.739e+01 | -3.290 | 0.00123 | `**` |
| SF | 7.776e-02 | 7.542e-03 | 10.310 | < 2e-16 | `***` |
| SA | 9.466e-03 | 7.945e-03 | 1.191 | 0.23526 | |
| FF | -4.715e-03 | 8.897e-03 | -0.530 | 0.59690 | |
| FA | -3.381e-03 | 7.771e-03 | -0.435 | 0.66408 | |
| CF | 2.919e-03 | 4.285e-03 | 0.681 | 0.49663 | |
| CA | -7.803e-04 | 3.323e-03 | -0.235 | 0.81466 | |
| Sh. | 1.614e+01 | 3.087e-01 | 52.269 | < 2e-16 | `***` |
| Sv. | -3.527e-01 | 2.997e-01 | -1.177 | 0.24093 | |
| OZFO. | 7.607e-02 | 2.656e-01 | 0.286 | 0.77492 | |
| DZFO. | -3.457e-01 | 2.135e-01 | -1.619 | 0.10732 | |
| east | 1.101e+00 | 1.352e+00 | 0.815 | 0.41634 | |
| west | 1.385e+00 | 1.438e+00 | 0.964 | 0.33663 | |
| yr13 | 6.808e-01 | 2.779e+00 | 0.245 | 0.80681 | |
| yr12 | 1.468e-01 | 1.141e+00 | 0.129 | 0.89776 | |
| yr11 | 1.178e-02 | 1.138e+00 | 0.010 | 0.99176 | |
| yr10 | 4.663e-01 | 9.894e-01 | 0.471 | 0.63808 | |
| yr9 | 1.853e-02 | 8.647e-01 | 0.021 | 0.98293 | |
| yr8 | NA | NA | NA | NA | |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Regression results without lockout season.

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | -1.444e+02 | 1.295e+01 | -11.152 | <2e-16 | `***` |
| SF | 8.331e-02 | 3.073e-03 | 27.110 | <2e-16 | `***` |
| SA | -2.951e-03 | 3.277e-03 | -0.900 | 0.3696 | |
| FF | -6.418e-03 | 3.604e-03 | -1.781 | 0.0773 | `.` |
| FA | 3.601e-03 | 3.134e-03 | 1.149 | 0.2525 | |
| CF | 2.929e-03 | 1.721e-03 | 1.702 | 0.0911 | `.` |
| CA | -1.713e-03 | 1.354e-03 | -1.266 | 0.2078 | |
| Sh. | 1.826e+01 | 1.418e-01 | 128.81 | <2e-16 | `***` |
| Sv. | 6.188e-02 | 1.359e-01 | 0.455 | 0.6497 | |
| OZFO. | -1.117e-01 | 1.214e-01 | -0.920 | 0.3592 | |
| DZFO. | -6.648e-02 | 9.268e-02 | -0.717 | 0.4744 | |
| east | -2.084e-01 | 5.782e-01 | -0.360 | 0.7191 | |
| west | -2.127e-01 | 6.114e-01 | -0.348 | 0.7285 | |
| yr13 | NA | NA | NA | NA | |
| yr12 | 5.828e-01 | 4.596e-01 | 1.268 | 0.2071 | |
| yr11 | 4.903e-01 | 4.583e-01 | 1.070 | 0.2866 | |
| yr10 | 5.869e-01 | 3.926e-01 | 1.495 | 0.1373 | |
| yr9 | 1.635e-01 | 3.373e-01 | 0.485 | 0.6286 | |
| yr8 | NA | NA | NA | NA | |
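For anyone reading the tables for the first time: the t value is just the estimate divided by its standard error, and the stars come from the p-value cut-offs in the “Signif. codes” legend. The Python sketch below reproduces both for a few rows copied from the tables; the function names are mine, and the exact handling of values that sit right on a cut-off is an assumption.

```python
def t_value(estimate, std_error):
    """t statistic for a coefficient: estimate divided by its standard error."""
    return estimate / std_error


def signif_code(p):
    """Map a p-value to the marker used in the 'Signif. codes' legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.1:
        return "."
    return " "


# (term, estimate, std. error, p-value) copied from the tables above.
ROWS = [
    ("(Intercept), with lockout", -9.013e+01, 2.739e+01, 0.00123),
    ("SA, with lockout", 9.466e-03, 7.945e-03, 0.23526),
    ("FF, without lockout", -6.418e-03, 3.604e-03, 0.0773),
    ("CF, without lockout", 2.929e-03, 1.721e-03, 0.0911),
]

for term, est, se, p in ROWS:
    print(f"{term}: t = {t_value(est, se):.3f}, code = '{signif_code(p)}'")
```

A coefficient can be a sizeable number and still be statistically indistinguishable from zero if its standard error is large relative to it, which is what the small t values on most of the rows are saying.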
