Tag Archives: sports statistician

KHL Statistical Power Rankings Explanation

10 Oct

I developed a statistics-based power ranking that will be a weekly feature at EuroHockey.com.  The idea was to come up with a system similar to the BCS ranking for (American) College Football (but less complicated).  Here is the formula and then a part-by-part explanation.

Formula by Team

∑(goal differential per match x opponent points) = RAW

I guess I could write that more formally, but basically here is how it goes.  For each game, I determine the goal differential.  So, if a game is 3-2, then there is a goal differential of 1.  The winning team will get a 1 in the cell for that game.  The losing team will get a 0.

Next, the goal differential is multiplied by the number of points the opponent has in the standings.  Say in the scenario above that each team has 15 points in the standings.  Then the one-goal differential is multiplied by 15 and the winning team receives 15 points for that game.  The losing team has 15 multiplied by zero, so it gets no points for the loss.  The totals for all games played are added together for the RAW score.

This means a couple of things.  First, the losing team is not penalized for losing.  Second, the winning team does receive an incentive for beating a team by a larger goal margin.  However, just running up the score and not playing defense will not help a team in these rankings, because what counts is not goals scored, but goal differential.

Overtime and shutout wins are considered indirectly by multiplying these totals by the point standings.  Beating an opponent by the biggest differential who has the highest point standings will give a team the most points for a game.  Beating a lesser opponent is less significant.

The RAW score is adjusted by dividing by the number of games played (GP), which gives the “Points Ranking”.
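As a rough illustration of the steps above, here is a minimal Python sketch.  The team names, scores and standings points are made up, and the function name is hypothetical.  It assumes the multiplier is the opponent's points in the standings, as the formula line states.

```python
from collections import defaultdict

def power_rankings(games, standings_points):
    """Compute RAW / GP for each team.

    games: list of (winner, loser, winner_goals, loser_goals) tuples.
    standings_points: dict mapping team -> points in the standings.
    Assumption: the winner's goal differential is multiplied by the
    loser's (i.e. the opponent's) standings points; the loser gets 0.
    """
    raw = defaultdict(float)
    played = defaultdict(int)
    for winner, loser, w_goals, l_goals in games:
        goal_diff = w_goals - l_goals
        raw[winner] += goal_diff * standings_points[loser]
        raw[loser] += 0  # a loss adds nothing, but the game still counts in GP
        played[winner] += 1
        played[loser] += 1
    return {team: raw[team] / played[team] for team in played}

# Hypothetical example data, not real KHL results.
points = {"A": 15, "B": 15, "C": 9}
games = [("A", "B", 3, 2), ("B", "C", 4, 1)]
ranking = power_rankings(games, points)
```

Here team A earns 1 x 15 = 15 over one game, team B earns 3 x 9 = 27 over two games (13.5 per game), and team C earns nothing.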

I hope this makes sense and you enjoy the KHL Statistical Power Rankings.  The first edition is here.

More on Grabovski – Do advanced stats say anything about a team scoring goals?

28 Aug

An initial disclaimer:

This piece is for discussion.  Statistical operations can be tricky and there can be a number of ways to do things.  I am not claiming to be right or wrong on anything, yet.  If you have some advice, please provide comment.

Round 2

So, after my article on the Caps picking up Grabovski and me not thinking it was as big of a deal as others were making it, the response was brutal.  I take some credit for that by putting out an unpolished piece.  In the end, I stand by my argument that the idea Grabovski would go from a career 45-50 point scorer to a 60-70 point guy was hyperbole.

Some people argued that his Corsi and Fenwick ratings, and the fact that Washington had a lot more offensive zone faceoffs than Toronto (which should lead to more chances), would make him an improvement over Ribeiro.  I basically argued that despite the improved advanced stats, it seemed crazy that any one person’s numbers would jump that high; thus, the Caps roster is at a net loss without Ribeiro, add Grabo.

To that end, I wanted to examine this further.  Here is my idea:  under the “Grabovski hypothesis”, the better a team’s Corsi, Fenwick and offensive zone faceoff numbers, the more goals the team should score (the “he makes his teammates better” argument).  If this is true, we should be able to perform a linear regression and see how a variety of statistics affect the number of goals a team scores (goals for).  In other words, I wanted to see what happens when we regress a team’s “goals for” for a season (y-variable) on a set of variables, including those mentioned above (X-set).

Thus, I went to stats.hockeyanalysis.com and grabbed team stats for all teams from the 2007-2008 season through the last season.  I included all of HA’s data (see legend below) and added some dummy variables, which is common when analyzing panel data.

Legend

TOI = Time on ice
GF = Goals For
GA = Goals Against
GF60 = Goals For per 60 minutes of ice time
GA60 = Goals Against per 60 minutes of ice time
GF% = Goals For percentage = 100* GF / (GF + GA)
SF = Shots For
SA = Shots Against
SF60 = Shots For per 60 minutes of ice time
SA60 = Shots Against per 60 minutes of ice time
SF% = Shots For percentage = 100* SF / (SF + SA)
FF = Fenwick For
FA = Fenwick Against
CF = Corsi For
CA = Corsi Against
Sh% = Shooting Percentage
Sv% = Save Percentage
OZFO% = Percentage of face offs that took place in the offensive zone
DZFO% = Percentage of face offs that took place in the defensive zone

Items in red are in the data table, but were not used in the regression so there weren’t correlation issues between the x-variables.

Dummy Variables

east – Eastern Conference (0=No, 1=yes)

west – Western Conference (0=No, 1=yes)

yr** – year dummy for the year the data was taken (0 = not year **, 1 = year**) – one dummy variable for each of the six years
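To make the year dummies concrete, here is a small Python sketch of how such 0/1 indicators can be built.  The function name and the season labels (8 through 13) are illustrative assumptions, not taken from the R script.  Note that a full set of year indicators always sums to 1 for every row, which makes one of them redundant once the model has an intercept; regression software typically reports such a term as not estimable, which is presumably why yr8 shows NA in the Details tables.

```python
def year_dummies(season, seasons, reference=None):
    """Return one 0/1 indicator per season label.

    season: the season label of this observation (e.g. 10 for 2009-2010).
    seasons: all season labels in the data set.
    reference: optional season to leave out as the baseline category.
    """
    return {f"yr{s}": int(s == season) for s in seasons if s != reference}

seasons = list(range(8, 14))          # assumed labels yr8 ... yr13
full = year_dummies(10, seasons)      # six indicators, exactly one equals 1
reduced = year_dummies(10, seasons, reference=8)  # yr8 dropped as baseline
```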

Results

Looking from the 2007-2008 season through the 2012-2013 season, the regression results only showed statistically significant results (at the 0.05 level) for shooting percentage and shots for (see “regressions results with lockout year” below).

I thought maybe the lockout-shortened season last year might have messed with things a bit, so I removed it and ran it again.  The only statistically significant variables are again shooting percentage and shots for.  Fenwick-for and Corsi-for are statistically significant at the 0.1 level, which is usually not accepted.  Let’s say we do accept the stats at this level.  A team would gain 1.7 goals per season for every additional 1,000 Corsi-for, or 1,000 shots directed at the net, or one goal per season for every 333 additional Fenwick-for, or 333 shots directed at the net (excluding blocked shots).

Grabovski

If I did this correctly, then only those old-fashioned statistics of shots on goal and shooting percentage matter for how many times a team scores.  Offensive zone faceoff percentage does not matter.  Corsi and Fenwick are not statistically significant.  Even so, Grabovski and his improvement on other players would have to add 1,000 shots directed at the net to gain an additional 1.7 goals per season (or 333 shots not including blocks).

This does not say whether or not Grabovski will be better or worse than Ribeiro.  But, as it stands, Grabovski’s addition to the team based on the advanced stats does not have a statistically significant effect.  What will matter?  Whether he can get people the puck to score at a high percentage or put a lot more pucks on net, unblocked.  We know he is not an assist guy, so I think it can be deduced that he will not likely raise the shooting percentage for others (give them good chances).  Ribeiro, on the other hand, is a distributor based on his higher assist numbers throughout his career.

With the regression, as it is, I think my argument stands….the Washington Capitals roster is worse minus Ribeiro, plus Grabovski.  The boys still have to play this out on the ice….

All files and R script are available upon request.

Details

Regression results with lockout year.

             Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)  -9.013e+01   2.739e+01    -3.290    0.00123 **
SF            7.776e-02   7.542e-03    10.310    < 2e-16 ***
SA            9.466e-03   7.945e-03     1.191    0.23526
FF           -4.715e-03   8.897e-03    -0.530    0.59690
FA           -3.381e-03   7.771e-03    -0.435    0.66408
CF            2.919e-03   4.285e-03     0.681    0.49663
CA           -7.803e-04   3.323e-03    -0.235    0.81466
Sh.           1.614e+01   3.087e-01    52.269    < 2e-16 ***
Sv.          -3.527e-01   2.997e-01    -1.177    0.24093
OZFO.         7.607e-02   2.656e-01     0.286    0.77492
DZFO.        -3.457e-01   2.135e-01    -1.619    0.10732
east          1.101e+00   1.352e+00     0.815    0.41634
west          1.385e+00   1.438e+00     0.964    0.33663
yr13          6.808e-01   2.779e+00     0.245    0.80681
yr12          1.468e-01   1.141e+00     0.129    0.89776
yr11          1.178e-02   1.138e+00     0.010    0.99176
yr10          4.663e-01   9.894e-01     0.471    0.63808
yr9           1.853e-02   8.647e-01     0.021    0.98293
yr8           NA          NA            NA       NA

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Regression results without lockout season.

             Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)  -1.444e+02   1.295e+01    -11.152   <2e-16 ***
SF            8.331e-02   3.073e-03    27.110    <2e-16 ***
SA           -2.951e-03   3.277e-03    -0.900    0.3696
FF           -6.418e-03   3.604e-03    -1.781    0.0773 .
FA            3.601e-03   3.134e-03     1.149    0.2525
CF            2.929e-03   1.721e-03     1.702    0.0911 .
CA           -1.713e-03   1.354e-03    -1.266    0.2078
Sh.           1.826e+01   1.418e-01    128.81    <2e-16 ***
Sv.           6.188e-02   1.359e-01     0.455    0.6497
OZFO.        -1.117e-01   1.214e-01    -0.920    0.3592
DZFO.        -6.648e-02   9.268e-02    -0.717    0.4744
east         -2.084e-01   5.782e-01    -0.360    0.7191
west         -2.127e-01   6.114e-01    -0.348    0.7285
yr13          NA          NA            NA       NA
yr12          5.828e-01   4.596e-01     1.268    0.2071
yr11          4.903e-01   4.583e-01     1.070    0.2866
yr10          5.869e-01   3.926e-01     1.495    0.1373
yr9           1.635e-01   3.373e-01     0.485    0.6286
yr8           NA          NA            NA       NA

Semifinal #IIHF Worlds predictions. How does Sweden’s win over Canada shake things up? #Bracketology #MoneyPuck

16 May

From my Bracketology blog post (here), I went three for four on the day.  I correctly picked the first three matches, but missed the Sweden win over Canada.  The Sedin-Sedin-Danielsson line killed it (minus that bad shootout attempt by H. Sedin–yikes).  Patting myself on the back, the one prediction I missed was the last game of the day, which went all the way to a sudden-death shootout.  Moreover, besides the 50/50 split on the US-Russia match, this was the closest statistical matchup–see the previous odds and probabilities here (without any historical adjustments, injuries, etc.).  Now, I couldn’t guess the Canada-Sweden match any better than I could predict that 8-3 blowout of USA over Russia, but at least I wasn’t way off.

Here is my changed bracket, with Sweden in, but I will still take Finland in that game.  If there is a game this year I would pay to be at, it would be Sweden vs. Finland from Stockholm.  It should be a battle of goaltenders, but hopefully a low scoring affair doesn’t mean a lack of offensive action.

[Image: screenshot of the updated bracket]

My gold medal match prediction stays the same, but will Sweden beat Switzerland?  I am going to guess Switzerland grabs the bronze medal now.  Sweden fought hard to come back against Canada, but have one main line and strong goalkeeping.  I am not sure if that will work against the Swiss–it didn’t work the first time they played.

So, here are the odds for the semifinal round and some comments.  Keep in mind, these are neutral odds based only on math formulations, not calculating in profits as a casino or bookie would.

Team      Likelihood   Moneyline (US)   Decimal Odds (EU)
Finland   61.02%       -156             1.64
Sweden    38.98%       +150             2.50
Swiss     88.45%       -733             1.14
US        11.55%       +14              8.33

The early odds from bet365.com have Finland as the underdog.  This must have to do with Sweden having home-ice advantage.  I would bet on Finland for sure in this match.  It is not a big return (listed at +135), but it is the much better bet.  Sweden is most likely to lose based on the Log5 method, and the moneyline reads -167 for them.  The bookies basically have swapped my neutral odds above.

As I assumed, after the US blowout of Russia, the oddsmakers at bet365.com are dismissing Switzerland’s undefeated run.  I too dismiss Switzerland over the US even with the odds in their favor from my guess, but my guess is against the grain of the analysis.  Maybe not as much as the +14 moneyline on the US (the game should be closer than US-Russia), but Switzerland should not be dismissed.  Switzerland’s and Finland’s chances of winning have decreased even though they moved on, but the Swiss should be really favored to win.

The returns are bad on this game.  +105 on the Swiss and -133 for the US isn’t worth wasting your money on.  Gambling tip from a non-gambler: bet Finland…I am 75% right so far and the stats give Finland a 3 in 5 chance to win.  Good luck to all the teams!

#IIHFWorlds Probability of wins for quarterfinals #MoneyPuck

16 May

I thought I’d put a little math twist on tomorrow’s match-ups and calculate the probability of each team winning.  From here, hopefully I can create some odds.  I will compare them with what you could bet against online after the analysis.

First things first, using the “Log5” method for calculating a team winning or losing (credit to Bill James in Baseball Abstract), this is what you do:

Win probability = (A – A * B) / (A + B – 2 * A * B), where A represents Team A’s winning percentage and B represents Team B’s winning percentage.
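For anyone who wants to try the formula, here is a minimal Python sketch of it.  The function name and the example winning percentages are my own illustrative choices, not the tournament figures.

```python
def log5(a, b):
    """Probability that Team A beats Team B under the Log5 method.

    a, b: winning percentages of Team A and Team B, as fractions in [0, 1].
    """
    denominator = a + b - 2 * a * b
    if denominator == 0:
        # Happens only when both teams are at 0.0 or both are at 1.0.
        raise ValueError("Log5 is undefined for these winning percentages")
    return (a - a * b) / denominator

# Hypothetical example: a .750 team against a .250 team.
p_a = log5(0.75, 0.25)
p_b = log5(0.25, 0.75)
```

With these inputs the stronger team comes out at about a 90% chance, and the two probabilities add up to 1, as they should for a game with no ties.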

For tomorrow, without running a regression and seeing if prior games, past year’s seeding, strength of schedule, luck, injuries, etc., make a difference, this is what we have:

Finland 77.88%
Russia 50.00%
US 50.00%
Slovakia 22.12%
Swiss 94.78%
Canada 70.59%
Sweden 29.41%
Czech 5.22%

Basically you have the percent chance each team will win their game tomorrow.  The next step is to convert these percentages into odds and then I will convert these into a moneyline.

Finland -355
Russia +/-100
US +/-100
Slovakia +355
Swiss -1900
Canada -245
Sweden +245
Czech +1900

For the explanation of the plus/minus on the moneyline, you can follow the link here:

Moneyline odds are usually considered “American” style odds, so here are the “European” style decimal odds:

Finland 1.28
Russia 2.00
US 2.00
Slovakia 4.55
Swiss 1.05
Canada 1.41
Sweden 3.45
Czech 20.00
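The conversions from a win probability to the two odds styles can be sketched in Python as follows.  This is a minimal sketch with hypothetical function names, using the standard definitions: decimal odds are 1 divided by the probability, a favorite's moneyline is the stake needed to win 100, and an underdog's moneyline is the winnings on a stake of 100.  The results can differ slightly from the figures listed above depending on how the probabilities are rounded before converting.

```python
def to_moneyline(p):
    """Convert a win probability (0 < p < 1) to an American moneyline."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if p > 0.5:
        # Favorite: negative number, the stake required to win 100.
        return round(-100 * p / (1 - p))
    # Underdog (or even money): positive number, the winnings on a 100 stake.
    return round(100 * (1 - p) / p)

def to_decimal_odds(p):
    """Convert a win probability (0 < p <= 1) to European decimal odds."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return round(1 / p, 2)

# Hypothetical probabilities, chosen for round results.
favorite = (to_moneyline(0.75), to_decimal_odds(0.75))
underdog = (to_moneyline(0.25), to_decimal_odds(0.25))
even = (to_moneyline(0.5), to_decimal_odds(0.5))
```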

Now, these wouldn’t guarantee a profit, because I would need to estimate the betting spread for each team and pass that over my profit margin, which is 8% customarily if I was a bookie.  What makes this fun is one can see how betting agencies set their odds differently from these “even odds” in order to make a profit.

I took a quick look at the lines over at bet365.com, where the internet says they have the lowest profit margins, meaning they should be the closest to my calculations.  It appears they take performance from past World Championships into account.  I’m not sure how much sense this makes when we see a Switzerland like this year’s.  Maybe with professional club teams, but not here.  That is why a regression analysis would be important to see which things play the biggest role in winning or losing in the playoff round of a World Championship or other country-based format.

Nevertheless, the US is the biggest underdog (based on past matches against Russia).  Never mind that they had the same record in the tournament and played a close match.  The US is +300/4.00 and Russia is -400/1.25.  Seems a little ridiculous to me….but maybe this is where they clean up!

Switzerland is also an underdog against Czech Republic, when the Swiss have clearly been the better team.  They are currently listed at +180/2.80.  My -1900/+1900 clearly needed to be adjusted, but to make the Swiss the underdog seems a little crazy too.

I am dead on with my Canada and Finland odds, so it appears they have raised the probability of Sweden and Slovakia winning in order to meet their profit margin/lower potential payouts.  This was also likely adjusted because of what I believe is their faulty outlook on past games.

Ok–let’s see how this ends up!  Games start in nine hours!

#2013WJC (delayed) #Moneypuck update. Results of pulling your goalie…does it matter?

16 Feb

A quick look at the results of pulling your keeper in the World Junior Championships.  There are some potential problems with this analysis.  First, if you switch goalies because you are either leading by a lot or losing by a lot, there may not be a reason for the teams to play as hard.  Bench players may also get more time, meaning less skill on the ice and possibly less scoring and defense.  Nevertheless, it’ll be interesting to look at the results.

The first keeper pulled was in game 2: Switzerland vs. Latvia.  The Swiss were up 5 to 2 after the second period and Latvia switched in Punnenovs for Merzlikins.  Switzerland’s offensive performance declined in the 3rd period, putting only 9 shots on goal in the 3rd (17 in the 1st and 13 in the 2nd).  However, the Swiss outscored the Latvian side 2-0 in the final period.  Latvia actually played worse in the 3rd period with the new keeper.

Punnenovs got the start in the final two games and finished with a 5.02 GAA.  Merzlikins finished with a 6.23 GAA.

The U.S. switched goalies after going up big against Germany in their 8-0 win.  Though it is hard to say definitively it had an effect, Gibson lost to a much better Russian side in their next match.

Germany moved away from Subban after their 9-3 loss to Canada.  Cupper started the final three games and lost 8-0, 7-0, and 2-1.

In both the U.S. vs. Russia and U.S. vs. Canada losses, Gibson was pulled in the final minutes to give the Americans an extra skater.  Neither instance led to the equalizer.  It would be interesting to see if more offense was generated when Gibson was out of the net, even though there were no goals.

Finland scored in five seconds after pulling Korpisalo in their 5-4 shootout win over Switzerland.  This goal was made by the extra skater, Markus Granlund, but during a faceoff.  Scoring on a possession in the offensive zone within five seconds makes it difficult to credit the goal to having the extra skater.  Nevertheless, that was the case.

So, it appears that in a tournament setting, pulling your goalie when you are up, in order to rest them for later games, could affect them negatively in those later games.  Also, generally speaking, pulling your goalie for the extra skater more often than not does not lead to that equalizer goal.  The conventional wisdom is that the man advantage gives a team a better opportunity to score, but the extra goal rarely comes to fruition.

 

***This article was originally drafted in January.  Since then, there was an interesting goalie-pulling situation in the under-20 tournament involving the Hungarian team.  Mark Plekszan started in goal the first game and was chased out.  Hungary lost that first game.  He was replaced in the following game, but got the start again later.  He was again chased from the net; however, he was pulled early enough in the first period that Hungary was able to come back and win that game.  The mixed result here is that pulling him in the tournament probably didn’t help his confidence.  Yet, making an early decision in a tournament to pull your keeper could be beneficial.  It seems, though, that if a team decides to make that switch, then they should stick with their decision for the rest of the tournament.  This was played out in the 2013 WJC and some of the Olympic prequalifying tournaments, as the teams that switched goalies the least had the most success.

Making the decision to pull your goalkeeper. #2013WJC #Moneypuck.

27 Dec

As I was tweeting about the USA vs. Germany World Junior Championship game earlier, I incorrectly tweeted that John Gibson was the netminder for the complete game shutout today.  After going over the stat sheets, I saw that Phil Housley made the decision to put in Jon Gillies for the final period of play.  Gibson played well, saving 19 of 19 shots in 40 minutes of action.  However, how will missing the final 20 minutes of play affect Gibson in future games?

As an NFL fan, I have seen that when teams make the playoffs with regular season games remaining, they often don’t play some of their starters in order to let the players rest or to prevent injury.  I can think of two instances, Manning with the Colts and Brady with the Patriots, where both were rested after strong regular seasons and they came up short in the playoffs.  Football is played once a week though, maybe twice, so a layoff could lead to three weeks or more without being on the field.  Here, we are talking about missing 1/3 of a game and then playing again the next day.  Maybe the turnaround will keep players fresh…maybe not….

I thought about this issue after watching the Olympic prequalification in Budapest this year.  Team Hungary and Team Holland massacred Team Lithuania and Team Croatia in their first two games.  Hungary pulled their starting keeper because of big leads in both of the games; the Netherlands pulled their keeper part way through the Croatia matchup.  The result: a 7-6 finale between the two teams where each netminder was torched by the other team.

So, I realize that there isn’t a lot of statistical evidence that the higher scoring output was because of goalies being rested.  Scores of international games tend to be a little higher than we are used to in professional leagues.  This could be because of a lack of defense or more open play, as we see during all-star games – there are many reasons why the keepers let in more goals than usual.

First though, let’s take a step back and look at why starting keepers are pulled.  In order of occurrence (guessing), I would say the following are the reasons:

  1. One team has a slight lead over the other and there is less than 2 minutes left.  Keeper pulled for the extra skater;
  2. Injury replacement;
  3. Bad play…hoping the change creates a spark or saves the keeper from further embarrassment/psychological issues (gun shy);
  4. Lead is so high, that starter is pulled to provide rest and/or prevent injury.

This list could be wrong, but I would say reason 4 is definitely the least common reason a keeper is pulled.  I think this is an odd reason that could lead to some problems.  Keepers may not be as sharp the next game because they didn’t play an entire 60 minutes and probably weren’t tested much during the limited minutes they played.  Goalies tend to be pretty superstitious…being pulled for an uncommon reason could mess up their mojo.

Now let’s predict the future instead of playing “I told you so” down the road.  Only two goalies have been pulled so far in the tournament.  Latvia pulled their keeper against Switzerland today after falling behind 5 to 2 after two periods.  This was after a loss the previous day with the same keeper playing a full game.  Saturday’s starter and subsequent play against Sweden could be telling.  The guess should be that if the same keeper starts, he would play no worse, or else pulling him didn’t do any good.  If a different keeper starts, then the guess would be that he is outplaying his counterpart.

The U.S. pulled their keeper after going up 6-0 over Germany.  Gibson’s first major test will be tomorrow.  Russia was well contested by Slovakia, pulling off an overtime victory with 10 seconds remaining.  Vasilevski saved 32 of 34 shots in nearly 65 minutes of play.  Gibson for the U.S. saved 19 shots in 40 minutes of play.  Shawn Reznik from TheHockeyWriters.com calls Vasilevski the tourney’s “Best Goalie”.  The U.S. has the group’s only shutout in two days.

Tomorrow will be interesting…with a strong offensive and defensive outing by the U.S. and an understated showing by the Russians, if Gibson lets in a few soft goals, it will make us wonder a bit about Housley’s decision to not let Gibson see the final 7 shots.

 


#Moneypuck Wednesday. How bookmakers make money using probabilities.

21 Nov

Blog posts will be a little slow over the next few weeks because of final exams.  My plans to pursue some research to determine winning or losing teams will have to hold off until after 13 December.

However, I was thinking that I could still do some reading and sharing in the meantime.  Thinking of probabilities, I wondered how casinos determined their odds (probabilities of a team winning) and how they made money off of it.  So, I went to everyone’s favorite research site…Wikipedia.

My thought was that if they enticed people more with giving the team most expected to lose really good odds, then people would feel like taking a risk would be more worthwhile because of the payoff to loss ratio (risk taking is a whole subfield on its own in economics).  But, the bookie/casino would lose a ton of money if the underdog won and the payoff odds were too disproportionate.  I suppose there is an equilibrium between risk taking and odds making.

Anyway, here are the basics on odds….it is the probability a team will win based on certain situations:

In considering a soccer match (the event) that can be either a ‘home win’, ‘draw’ or ‘away win’ (the outcomes) then the following odds might be encountered to represent the true chance of each of the three outcomes:

Home: Evens
Draw: 2-1
Away: 5-1

These odds can be represented as relative probabilities (or percentages by multiplying by 100) as follows:

Evens (or 1-1) corresponds to a relative probability of 1/2 (50%)
2-1 corresponds to a relative probability of 1/3 (33 1/3%)
5-1 corresponds to a relative probability of 1/6 (16 2/3%)

By adding the percentages together a total ‘book’ of 100% is achieved (representing a fair book). The bookmaker, in his wish to avail himself of a profit, will invariably reduce these odds. Consider the simplest model of reducing, which uses a proportional decreasing of odds.

The not-so-odd fact is that most oddsmakers do not work with a fair book, but they work with the concept of an ‘overround’.  Check out this example:

Home: 4-5
Draw: 9-5
Away: 4-1

4-5 corresponds to a relative probability of 5/9 (55 5/9%)
9-5 corresponds to a relative probability of 5/14 (35 5/7%)
4-1 corresponds to a relative probability of 1/5 (20%)

By adding these percentages together a ‘book’ of 111 17/63%, or approximately 111.27%, is achieved.

The amount by which the actual ‘book’ exceeds 100% is known as the ‘overround’:  it represents the bookmaker’s potential profit if he is fortunate enough to accept bets in the exact proportions required. Thus, in an “ideal” situation, if the bookmaker accepts £111.27 in bets at his own quoted odds in the correct proportion, he will pay out only £100 (including returned stakes) no matter what the actual outcome of the football match. Examining how he potentially achieves this:

A stake of £55.56 @ 4-5 returns £100.00 (rounded down to nearest penny) for a home win.
A stake of £35.71 @ 9-5 returns £ 99.98 (rounded down to nearest penny) for a drawn match
A stake of £20.00 @ 4-1 returns £100.00 (exactly) for an away win

Total stakes received — £111.27 and a maximum payout of £100 irrespective of the result. This £11.27 profit represents a 10.1% profit on turnover (11.27 × 100/111.27).

In reality, people use models of reducing more complicated than the model of “ideal” situation.
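The arithmetic in the quoted example can be checked with a short Python sketch.  The function names are hypothetical; the odds are the ones from the example, written as (first number, second number) pairs so that 4-5 becomes (4, 5).  Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def implied_probability(numerator, denominator):
    """Fractional odds 'numerator-denominator' (e.g. 4-5) as a probability."""
    return Fraction(denominator, numerator + denominator)

def book(odds):
    """Sum of the implied probabilities over all outcomes of one event."""
    return sum(implied_probability(n, d) for n, d in odds)

fair_book = book([(1, 1), (2, 1), (5, 1)])    # Evens, 2-1, 5-1
quoted_book = book([(4, 5), (9, 5), (4, 1)])  # 4-5, 9-5, 4-1

overround = quoted_book - 1                   # amount above 100%
profit_on_turnover = overround / quoted_book  # share of stakes kept
```

The fair book sums to exactly 1, the quoted book to 701/630 (about 111.27%), and the profit on turnover works out to 71/701, or roughly 10.1%, matching the figures in the example.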

Sneaky!  The books are cooked!

Remember that when you are betting on games, you are not just weighing the best probabilities, risk-taking behavior and good payoffs.  Your bet is part of a larger overround scheme put together by smart math folks who are going to make money for the casino…and most likely make you lose yours.  Not to mention, the profits are made from the combination of games and odds over a certain period of time.  That means the books are cooked across a whole set of games in all sports, so that profitability is maximized overall for the entire set.

On that note, anyone want to put a friendly wager on an over/under on the NHL agreeing to terms to end the lock out by the end of the year?