Tag Archives: Sports statistics

KHL Statistical Power Rankings Explanation

10 Oct

I developed a statistics-based power ranking that will be a weekly feature at EuroHockey.com.  The idea was to come up with a system similar to the BCS ranking for (American) College Football (but less complicated).  Here is the formula and then a part-by-part explanation.

Formula by Team

∑(goal differential per match x opponent points) = RAW

I guess I could write that more formally, but basically here is how it goes.  For each game, I determine the goal differential.  So, if a game ends 3-2, there is a goal differential of 1.  The winning team gets a 1 in the cell for that game, and the losing team gets a 0.

Next, the goal differential is multiplied by the number of points the opponent has in the standings.  Say in the scenario above that each team has 15 points in the standings.  Then the one-goal differential is multiplied by 15 and the winning team receives 15 points for that game.  The losing team has 15 multiplied by zero, so a team gets no points for a loss.  The totals for all games played are added together for the RAW score.

This means a couple of things.  First, the losing team is not penalized for losing.  Second, the winning team does receive an incentive for beating a team by a larger goal margin.  However, just running up the score and not playing defense will not help a team in these rankings, because what counts is not goals scored, but goal differential.

Overtime and shutout wins are considered indirectly by multiplying these totals by the point standings.  Beating the opponent with the most points in the standings by the biggest differential will give a team the most points for a game.  Beating a lesser opponent is less significant.

The RAW score is adjusted by dividing by the number of games played (GP), which gives the “Points Ranking”.
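
For readers who prefer code to prose, the steps above can be sketched in a few lines of Python. This is only an illustration of the formula as described: the function name power_rankings, the input layout and the team names "A" and "B" are assumptions made for the example, not part of the actual ranking spreadsheet.

```python
from collections import defaultdict

def power_rankings(games, standings_points):
    """Return RAW / GP for every team.

    games: list of (home, away, home_goals, away_goals) tuples (assumed layout).
    standings_points: dict mapping team -> points in the league standings.
    """
    raw = defaultdict(float)   # running RAW score per team
    played = defaultdict(int)  # games played (GP) per team
    for home, away, home_goals, away_goals in games:
        played[home] += 1
        played[away] += 1
        if home_goals > away_goals:
            # Winner gets goal differential x opponent's standings points
            raw[home] += (home_goals - away_goals) * standings_points[away]
        elif away_goals > home_goals:
            raw[away] += (away_goals - home_goals) * standings_points[home]
        # The loser's cell is 0, so nothing is added for a loss
    return {team: raw[team] / played[team] for team in played}

# The 3-2 example from the text: both teams have 15 points in the standings
ranking = power_rankings([("A", "B", 3, 2)], {"A": 15, "B": 15})
```

Here the winner ends up with 15.0 and the loser with 0.0, which matches the worked example in the text.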

I hope this makes sense and you enjoy the KHL Statistical Power Rankings.  The first edition is here.


More on Grabovski – Do advanced stats say anything about a team scoring goals?

28 Aug

An initial disclaimer:

This piece is for discussion.  Statistical operations can be tricky and there can be a number of ways to do things.  I am not claiming to be right or wrong on anything, yet.  If you have some advice, please provide comment.

Round 2

So, after my article on the Caps picking up Grabovski and me not thinking it was as big of a deal as others were making it, the response was brutal.  I take some of the blame for that, having put out an unpolished piece.  In the end, I stand by my argument that the idea Grabovski would go from a career 45-50 point scorer to a 60-70 point guy was hyperbole.

Some people argued that his Corsi and Fenwick ratings, plus the fact that Washington had a lot more offensive zone faceoffs than Toronto (which should lead to more chances), would make him an improvement over Ribeiro.  I basically argued that despite the better advanced stats, it seemed crazy that any one person’s numbers would jump that high; thus, the Caps roster is at a net loss after losing Ribeiro and adding Grabo.

To that end, I wanted to examine this further.  Here is my idea: under the “Grabovski hypothesis”, the better a team’s Corsi, Fenwick and offensive zone faceoff numbers, the more goals the team should score (the “he makes his teammates better” argument).  If this is true, we should be able to perform a linear regression and see how a variety of statistics affect the number of goals a team scores (goals for).  In other words, I wanted to see what happens when we regress a team’s “goals for” for a season (y-variable) on a set of variables, including those mentioned above (X-set).

Thus, I went to stats.hockeyanalysis.com and grabbed team stats for all teams from the 2007-2008 season through last season.  I took all of HA’s data (see legend below) and added some dummy variables, which is common when analyzing panel data.


TOI = Time on ice
GF = Goals For
GA = Goals Against
GF60 = Goals For per 60 minutes of ice time
GA60 = Goals Against per 60 minutes of ice time
GF% = Goals For percentage = 100* GF / (GF + GA)
SF = Shots For
SA = Shots Against
SF60 = Shots For per 60 minutes of ice time
SA60 = Shots Against per 60 minutes of ice time
SF% = Shots For percentage = 100* SF / (SF + SA)
FF = Fenwick For
FA = Fenwick Against
CF = Corsi For
CA = Corsi Against
Sh% = Shooting Percentage
Sv% = Save Percentage
OZFO% = Percentage of face offs that took place in the offensive zone
DZFO% = Percentage of face offs that took place in the defensive zone

Items in red are in the data table, but were not used in the regression so there weren’t correlation issues between the x-variables.

Dummy Variables

east – Eastern Conference (0=No, 1=yes)

west – Western Conference (0=No, 1=yes)

yr** – year dummy for the year the data was taken (0 = not year **, 1 = year**) – one dummy variable for each of the six years
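
For anyone curious about the mechanics, here is a minimal Python sketch of regressing goals for on a couple of predictors. It is not the R script behind the results below. The function name ols is hypothetical, and the team-seasons are made-up numbers built from a known linear rule so the fit can be checked; none of it is real hockeyanalysis.com data.

```python
from fractions import Fraction

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    to_frac = lambda v: Fraction(str(v))  # exact arithmetic, no rounding noise
    X = [[Fraction(1)] + [to_frac(v) for v in r] for r in rows]
    yv = [to_frac(v) for v in y]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, yv))]
         for i in range(k)]
    # Gauss-Jordan elimination with row swaps
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up team-seasons as (SF, Sh%) pairs -- NOT real data
seasons = [(2400, 9), (2500, 8), (2300, 10), (2600, 9), (2450, 8)]
# Goals generated from a known rule: GF = -150 + 0.08 * SF + 18 * Sh%
goals = [-150 + Fraction(2, 25) * sf + 18 * sh for sf, sh in seasons]
coefs = ols(seasons, goals)  # [intercept, SF coefficient, Sh% coefficient]
```

Because the toy goals follow the rule exactly, the fit recovers the three numbers that generated them. The sketch also raises an error when a complete set of 0/1 dummies is combined with an intercept, since the dummies then add up to the intercept column. That is the usual reason statistical software leaves one dummy out as the baseline.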


Looking from the 2007-2008 season through the 2012-2013 season, the regression showed statistically significant results (at the 0.05 level) only for shooting percentage and shots for (see “Regression results with lockout year” below).

I thought maybe the lockout-shortened season last year might have messed with things a bit, so I removed it and ran the regression again.  Again, the only statistically significant variables are shooting percentage and shots for.  Fenwick-for and Corsi-for are statistically significant at the 0.1 level, which is usually not accepted.  Let’s say we do accept the stats at this level.  A team would gain 1.7 goals per season for every additional 1,000 Corsi-for (1,000 shots directed at the net), or one goal per season for every 333 additional Fenwick-for (333 shots directed at the net, excluding blocked shots).


If I did this correctly, then only those old-fashioned statistics of shots on goal and shooting percentage matter for how many times a team scores.  Offensive zone faceoff percentage does not matter.  Corsi and Fenwick are not statistically significant.  Even so, Grabovski and his improvement on other players would have to add 1,000 shots directed at the net to gain an additional 1.7 goals per season (or 333 shots not including blocks).

This does not say whether or not Grabovski will be better or worse than Ribeiro.  But, as it stands, Grabovski’s addition to the team based on the advanced stats does not have a statistically significant effect.  What will matter?  Whether he can get people the puck to score at a high percentage or put a lot more pucks on net, unblocked.  We know he is not an assist guy, so I think it can be deduced that he will not likely raise the shooting percentage for others (give them good chances).  Ribeiro, on the other hand, is a distributor, based on his higher assist numbers throughout his career.

With the regression, as it is, I think my argument stands….the Washington Capitals roster is worse minus Ribeiro, plus Grabovski.  The boys still have to play this out on the ice….

All files and R script are available upon request.


Regression results with lockout year.

              Estimate      Std. Error   t value   Pr(>|t|)
(Intercept)   -9.013e+01    2.739e+01    -3.290    0.00123 **
SF             7.776e-02    7.542e-03    10.310    < 2e-16 ***
SA             9.466e-03    7.945e-03     1.191    0.23526
FF            -4.715e-03    8.897e-03    -0.530    0.59690
FA            -3.381e-03    7.771e-03    -0.435    0.66408
CF             2.919e-03    4.285e-03     0.681    0.49663
CA            -7.803e-04    3.323e-03    -0.235    0.81466
Sh.            1.614e+01    3.087e-01    52.269    < 2e-16 ***
Sv.           -3.527e-01    2.997e-01    -1.177    0.24093
OZFO.          7.607e-02    2.656e-01     0.286    0.77492
DZFO.         -3.457e-01    2.135e-01    -1.619    0.10732
east           1.101e+00    1.352e+00     0.815    0.41634
west           1.385e+00    1.438e+00     0.964    0.33663
yr13           6.808e-01    2.779e+00     0.245    0.80681
yr12           1.468e-01    1.141e+00     0.129    0.89776
yr11           1.178e-02    1.138e+00     0.010    0.99176
yr10           4.663e-01    9.894e-01     0.471    0.63808
yr9            1.853e-02    8.647e-01     0.021    0.98293
yr8            NA           NA            NA       NA

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Regression results without lockout season.

              Estimate      Std. Error   t value   Pr(>|t|)
(Intercept)   -1.444e+02    1.295e+01    -11.152   <2e-16 ***
SF             8.331e-02    3.073e-03     27.110   <2e-16 ***
SA            -2.951e-03    3.277e-03     -0.900   0.3696
FF            -6.418e-03    3.604e-03     -1.781   0.0773 .
FA             3.601e-03    3.134e-03      1.149   0.2525
CF             2.929e-03    1.721e-03      1.702   0.0911 .
CA            -1.713e-03    1.354e-03     -1.266   0.2078
Sh.            1.826e+01    1.418e-01    128.81    <2e-16 ***
Sv.            6.188e-02    1.359e-01      0.455   0.6497
OZFO.         -1.117e-01    1.214e-01     -0.920   0.3592
DZFO.         -6.648e-02    9.268e-02     -0.717   0.4744
east          -2.084e-01    5.782e-01     -0.360   0.7191
west          -2.127e-01    6.114e-01     -0.348   0.7285
yr13           NA           NA             NA      NA
yr12           5.828e-01    4.596e-01      1.268   0.2071
yr11           4.903e-01    4.583e-01      1.070   0.2866
yr10           5.869e-01    3.926e-01      1.495   0.1373
yr9            1.635e-01    3.373e-01      0.485   0.6286
yr8            NA           NA             NA      NA

Semifinal #IIHF Worlds predictions. How does Sweden’s win over Canada shake things up? #Bracketology #MoneyPuck

16 May

From my Bracketology blog post (here), I went three for four on the day.  I picked the first three matches, but missed the Sweden win over Canada.  The Sedin-Sedin-Danielsson line killed it (minus that bad shootout attempt by H. Sedin, yikes).  Patting myself on the back a little: the one prediction I missed was the last game of the day, and it went all the way to a sudden-death shootout.  Moreover, besides the 50/50 split on the US-Russia match, this was the closest statistical matchup; see the previous odds and probabilities here (without any historical adjustments, injuries, etc.).  Now, I couldn’t guess the Canada-Sweden match any better than I could predict the USA’s 8-3 blowout of Russia, but at least I wasn’t way off.

Here is my changed bracket, with Sweden in, but I will still take Finland in that game.  If there is a game this year I would pay to be at, it would be Sweden vs. Finland from Stockholm.  It should be a battle of goaltenders, but hopefully a low scoring affair doesn’t mean a lack of offensive action.

[Image: updated bracket]

My gold medal match prediction stays the same, but will Sweden beat Switzerland?  I am going to guess Switzerland grabs the bronze medal now.  Sweden fought hard to come back against Canada, but they have one main line and strong goalkeeping.  I am not sure if that will work against the Swiss; it didn’t work the first time they played.

So, here are the odds for the semifinal round and some comments.  Keep in mind, these are neutral odds based only on math formulations, not calculating in profits as a casino or bookie would.

Team      Likelihood   Moneyline (US)   Decimal Odds (EU)
Finland   61.02%       -156             1.64
Sweden    38.98%       +150             2.50
Swiss     88.45%       -733             1.14
US        11.55%       +14              8.33

The early odds from bet365.com have Finland as the underdog.  This must have to do with Sweden having home-ice advantage.  I would bet on Finland for sure in this match.  It is not a big return (listed at +135), but it is the much better bet.  Sweden is most likely to lose based on the Log5 method, and the moneyline reads -167 for them.  The bookies basically have swapped my neutral odds above.

As I assumed, after the US blowout of Russia, the oddsmakers at bet365.com are dismissing Switzerland’s undefeated run.  I too am picking the US over Switzerland, even though my odds favor the Swiss, so my guess goes against the grain of the analysis.  Maybe not as much as the +14 moneyline on the US (the game should be closer than US-Russia), but Switzerland should not be dismissed.  Switzerland’s and Finland’s chances of winning have decreased even though they moved on, but the Swiss should still be heavily favored to win.

The returns are bad on this game.  +105 on the Swiss and -133 for the US isn’t worth wasting your money on.  Gambling tip from a non-gambler: bet Finland…I am 75% right so far and the stats give Finland a 3 in 5 chance to win.  Good luck to all the teams!

#IIHFWorlds Probability of wins for quarterfinals #MoneyPuck

16 May

I thought I’d put a little math twist on tomorrow’s match-ups and calculate the probability of each team winning.  From here, hopefully I can create some odds.  I will compare them with what you could bet against online after the analysis.

First things first, using the “Log5” method for calculating the probability of a team winning or losing (credit to Bill James in Baseball Abstract), this is what you do:

Win probability = (A - A * B) / (A + B - 2 * A * B), where A represents Team A’s winning percentage and B represents Team B’s winning percentage.
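
The Log5 formula above is easy to turn into a small function. This is just a sketch: the function name log5 and the choice to pass winning percentages as fractions between 0 and 1 are assumptions made for the example.

```python
def log5(a, b):
    """Probability that Team A beats Team B under the Log5 method.

    a, b: winning percentages of Team A and Team B, as fractions in [0, 1].
    """
    denominator = a + b - 2 * a * b
    if denominator == 0:
        # Happens only when both teams are at 0% or both are at 100%
        raise ValueError("Log5 is undefined for these winning percentages")
    return (a - a * b) / denominator

# Two teams with identical records always come out at 50/50
even_match = log5(0.6, 0.6)
# A .750 team against a .500 team
favorite = log5(0.75, 0.5)
```

Two teams with the same record always split 50/50, and the two sides of a matchup always add up to 100%.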

For tomorrow, without running a regression and seeing if prior games, past year’s seeding, strength of schedule, luck, injuries, etc., make a difference, this is what we have:

Finland 77.88%
Russia 50.00%
US 50.00%
Slovakia 22.12%
Swiss 94.78%
Canada 70.59%
Sweden 29.41%
Czech 5.22%

Basically you have the percent chance each team will win their game tomorrow.  The next step is to convert these percentages into odds and then I will convert these into a moneyline.

Finland -355
Russia +/-100
US +/-100
Slovakia +355
Swiss -1900
Canada -245
Sweden +245
Czech +1900

For the explanation of the plus/minus on the moneyline, you can follow the link here:

Moneyline odds are usually considered “American” style odds, so here are the “European” style decimal odds:

Finland 1.28
Russia 2.00
US 2.00
Slovakia 4.55
Swiss 1.05
Canada 1.41
Sweden 3.45
Czech 20.00
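
The conversions from a win probability to fair moneyline and decimal odds can be sketched as below. The function names are made up for the example, and the formulas are the standard no-margin conversions. The rounding used in the lists above is not stated, so the output may differ slightly from those figures.

```python
def moneyline(p):
    """Fair American moneyline for a win probability p (0 < p < 1).

    Favorites get a negative line, underdogs a positive one.
    An even matchup (p = 0.5) comes out as -100, i.e. +/-100.
    """
    if p >= 0.5:
        return -100 * p / (1 - p)
    return 100 * (1 - p) / p

def decimal_odds(p):
    """Fair European decimal odds: total return per unit staked."""
    return 1 / p

def moneyline_to_decimal(line):
    """Convert an American moneyline to decimal odds."""
    if line > 0:
        return 1 + line / 100
    return 1 + 100 / abs(line)

favorite_line = moneyline(0.75)    # a 75% favorite
underdog_line = moneyline(0.25)    # the 25% underdog on the other side
underdog_decimal = decimal_odds(0.25)
```

The last helper is handy for checking a bookmaker’s quotes, since a line and its decimal price should always convert into each other.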

Now, these wouldn’t guarantee a profit, because I would need to estimate the betting spread for each team and add my profit margin on top, which is customarily 8%, if I were a bookie.  What makes this fun is that one can see how betting agencies set their odds differently from these “even odds” in order to make a profit.

I took a quick look at the lines over at bet365.com, where the internet says they have the lowest profit margins, meaning they should be the closest to my calculations.  It appears they take performance from past World Championships into play.  I’m not sure how much sense this makes when we see a Switzerland like this year’s.  Maybe with professional club teams, but not here.  That is why a regression analysis would be important to see which things play the biggest role in winning or losing in the playoff round of a World Championship or other country-based format.

Nevertheless, the US is the biggest underdog (based on past matches against Russia).  Never mind that they had the same record in the tournament and played a close match.  US is +300/4.00 and Russia is -400/1.25.  Seems a little ridiculous to me….but maybe this is where they clean up!

Switzerland is also an underdog against Czech Republic, when the Swiss have clearly been the better team.  They are currently listed at +180/2.80.  My -1900/+1900 clearly needed to be adjusted, but to make the Swiss the underdog seems a little crazy too.

I am dead on with my Canada and Finland odds, so it appears they have raised the probability of Sweden and Slovakia winning in order to meet their profit margin/lower potential payouts.  This was also likely adjusted because of what I believe is their faulty outlook on past games.

Ok–let’s see how this ends up!  Games start in nine hours!

#Moneypuck Wednesday. How bookmakers make money using probabilities.

21 Nov

Blog posts will be a little slow over the next few weeks because of final exams.  My plans to pursue some research on predicting winning and losing teams will have to wait until after 13 December.

However, I was thinking that I could still do some reading and sharing in the meantime.  Thinking of probabilities, I wondered how casinos determined their odds (probabilities of a team winning) and how they made money off of it.  So, I went to everyone’s favorite research site…Wikipedia.

My thought was that if they enticed people by giving the team most expected to lose really good odds, then people would feel like taking a risk would be more worthwhile because of the payoff to loss ratio (risk taking is a whole subfield on its own in economics).  But the bookie/casino would lose a ton of money if the underdog won and the payoff odds were too disproportionate.  I suppose there is an equilibrium between risk taking and odds making.

Anyway, here are the basics on odds….it is the probability a team will win based on certain situations:

In considering a soccer match (the event) that can be either a ‘home win’, ‘draw’ or ‘away win’ (the outcomes) then the following odds might be encountered to represent the true chance of each of the three outcomes:

Home: Evens
Draw: 2-1
Away: 5-1

These odds can be represented as relative probabilities (or percentages by multiplying by 100) as follows:

Evens (or 1-1) corresponds to a relative probability of 1/2 (50%)
2-1 corresponds to a relative probability of 1/3 (33 1/3%)
5-1 corresponds to a relative probability of 1/6 (16 2/3%)

By adding the percentages together a total ‘book’ of 100% is achieved (representing a fair book). The bookmaker, in his wish to avail himself of a profit, will invariably reduce these odds. Consider the simplest model of reducing, which uses a proportional decreasing of odds.

The not-so-odd fact is that most oddsmakers do not work with a fair book, but they work with the concept of an ‘overround’.  Check out this example:

Home: 4-5
Draw: 9-5
Away: 4-1

4-5 corresponds to a relative probability of 5/9 (55 5/9%)
9-5 corresponds to a relative probability of 5/14 (35 5/7%)
4-1 corresponds to a relative probability of 1/5 (20%)

By adding these percentages together a ‘book’ of 111 17/63%, or approximately 111.27%, is achieved.

The amount by which the actual ‘book’ exceeds 100% is known as the ‘overround’:  it represents the bookmaker’s potential profit if he is fortunate enough to accept bets in the exact proportions required. Thus, in an “ideal” situation, if the bookmaker accepts £111.27 in bets at his own quoted odds in the correct proportion, he will pay out only £100 (including returned stakes) no matter what the actual outcome of the football match. Examining how he potentially achieves this:

A stake of £55.56 @ 4-5 returns £100.00 (rounded down to nearest penny) for a home win.
A stake of £35.71 @ 9-5 returns £ 99.98 (rounded down to nearest penny) for a drawn match
A stake of £20.00 @ 4-1 returns £100.00 (exactly) for an away win

Total stakes received — £111.27 and a maximum payout of £100 irrespective of the result. This £11.27 profit represents a 10.1% profit on turnover (11.27 × 100/111.27).
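
The arithmetic in the example above can be checked with a short Python sketch. The function names are made up for illustration, and odds are written as "a-b" strings (with Evens as "1-1") purely as a convenience for the example.

```python
from fractions import Fraction

def implied_probability(odds_against):
    """Turn fractional odds 'a-b' (a to b against) into b / (a + b)."""
    a, b = (int(part) for part in odds_against.split("-"))
    return Fraction(b, a + b)

def book(odds_list):
    """Sum of implied probabilities; exactly 1 for a fair book."""
    return sum(implied_probability(odds) for odds in odds_list)

fair_book = book(["1-1", "2-1", "5-1"])      # the fair example: Evens, 2-1, 5-1
reduced_book = book(["4-5", "9-5", "4-1"])   # the bookmaker's reduced odds

overround = reduced_book - 1                 # amount by which the book exceeds 100%
profit_on_turnover = overround / reduced_book

print(f"fair book: {float(fair_book):.2%}")
print(f"reduced book: {float(reduced_book):.2%}")
print(f"profit on turnover: {float(profit_on_turnover):.1%}")
```

Using exact fractions avoids penny-rounding noise: the fair book sums to exactly 100%, the reduced book to 701/630 (about 111.27%), and the profit on turnover comes out at about 10.1%, in line with the figures quoted above.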

In reality, bookmakers use models of reducing that are more complicated than this “ideal” situation.

Sneaky!  The books are cooked!

Remember that when you are betting on games, you are not looking only at the best probabilities, risk-taking behavior and good payoffs.  Your bet is part of a larger overround scheme put together by smart math folks who are going to make money for the casino…and most likely make you lose yours.  Not to mention, the profits are made off the combination of games and odds over a certain period of time.  That means there is the crazy combination of cooking the books for a whole set of games in all sports, where profitability is maximized overall for the entire set of games.

On that note, anyone want to put a friendly wager on an over/under on the NHL agreeing to terms to end the lockout by the end of the year?

Wednesday #MoneyPuck post. Predicting winners and losers.

14 Nov

Last week in my MoneyPuck post, I discussed trying to figure out how many goals were a goalie’s fault.  I determined basically it was a function of offensive prowess, opposing defenses, goalie skill, special teams (power play/penalty kill) and a good day/bad day shock.

Looks like someone generally agrees.  If you’re a stats/math person, give this a read.  Basically, they come up with a way to predict scoring for each team based on offense, defense, whether a team is home or away and the power play.  Unfortunately, they state there are too many parameters if you try to match offense versus defense on a team by team basis.  It would be even more difficult if they went line by line for each team.

What you end up with out of this paper is a good way to predict the over/under if you are a gambler.  Not taking anything away from this article, but if you are a practitioner, you should be able to predict how many goals your team will score in various situations.  This will allow coaches to match up their lines better against the opposition, as well as give general managers a better idea of good personnel moves.

This takes me back to my post on predicting the good day/bad day scenario and splitting scoring credit/blame.

I did this briefly for the three games at the Olympic qualifiers without breaking down lines.  Based on the three games, using a weighted probability, and using that to predict a rematch between the finalists, I came up with:

Hungary scores an average of 8 goals a game and has a good game 33% of the time and a bad game 67% of the time.  This is odd, but it is because there are only three games…this would even out over a season.

The Netherlands average 7.67 goals a game and have a good game 67% of the time…a bad day 33%.

Based on this, the Netherlands would likely score 9 goals (between 8 and 9) and Hungary would score 8.

However, defensively….the Netherlands would allow 3 goals (between 3 and 4) and Hungary would also allow 3, but likely between 2 and 3.

So, both of their defenses will likely cause their offenses to have a bad day.  Hungary is more likely to have a bad day and scores between 5 and 6 goals…the Netherlands on their bad day….scored 6.

Without further analysis, this scenario works out to match the score for the teams.  But a full season needs to be looked at…many more games.  It then needs to be figured out which defenses cause an offensive unit to have a bad day and how much lower they would score.  I think this might be the way to go though.  More to come in future MoneyPuck posts.


Final note, and off topic (slightly)….I have been writing about hockey a lot…then I wrote a post about basketball yesterday and got just a little love.  There seems to be a low correlation between hockey love and basketball love…just saying….

Interview question from an NBA analyst

13 Nov

I have been trying hard to get an internship with an analyst in the NBA or NHL.  Really, basketball or hockey would be great.  I contacted someone through a contact on LinkedIn about an open internship.

I was pretty lucky….I got a response!  I haven’t had much success with cold calling.  However, analysts seem to be very open to discussing their jobs and getting others involved.  A very gracious group of people!

Anyway, the person I spoke with posed this question to see my train of thought: if the 2-point field goal percentage of a team goes from 47% to 48%, how many more games would they win?

My short answer without all of the wonky details….a team would generally get one more possession a game and potentially make an additional defensive stop.  Averaged out over a season, it would be the potential of another 2.5 points per game or so.  After looking through this team’s record from last season, the team could have won one or two more games.  Not for this team in particular, but generally, this could be the difference between the playoffs and golf season; an away five seed or a home four seed.

When looking at a team, I suppose you have to ask if they need a tweak or an overhaul.  One more basket a game (when there are 100 shots or fewer, which is always) would move a team’s field goal percentage a point or more.  If a tweak is needed, getting two more points a game could be everything over the course of a season.  If an overhaul is needed, you’ll still get your two points, but at what price?