By: Patrick W. Zimmerman
Soccer is a maddening sport to predict. Modeling soccer is easy because of the 88 billion games per year and hard because of the poor resolution of each individual game. Goals are scarce, so significant differences in team quality sometimes still don’t show up on any one game’s scoreboard. This is in stark contrast to, say, basketball, where the 75-125 scoring events per game tend to minimize luck as a determining element.
That said, it’s amazing and irresistible. The spread and evolution of the world’s game has created a diversity of tactics, players, and competitions at the highest level that no other sport can match. Unlike basketball, hockey, baseball, or (American) football, it has resisted the consolidation of all the best players into one single league (soccer fans usually speak of the Big Five leagues: England, Spain, Germany, Italy, and France). That resistance breeds the most precious of sports situations: unfamiliarity.
In such an environment, there are fundamentally different tactical formations and player strategies clashing, without teams having well-drilled and rote counters for them. Even some of the more popular formations (4-4-2, 4-3-3, 4-2-3-1) can be played very differently. Spain’s midfield-focused 4-3-3 during their run of greatness at times seemed to have no forwards at all, filled with patient, methodical, tiny ball-hogs passing the ball into the net. The Chilean 3-3-1-3 / 3-4-1-2 under Bielsa & Sampaoli was a blitz of motion and energy and pressing. Part of the success of many great teams is the novelty of their approach (at least initially).
So, tactical variance and the relatively high influence of luck both make the World Cup hard to forecast and incredibly compelling entertainment.
Game on!
The question
How is our model, incorporating historical World Cup results, relative player pool size, and recent performance, doing against other prediction systems?
The short-short version
It’s doing ok relative to other models! It’s doing eeeeehhhhhh relative to reality.
The models
We’ll compare our model’s performance to:
- The experts at The FiveThirtyEight.
- The collective wisdom of the betting public using OddsPortal’s meta-odds combining 14 casino books and online oddsmakers (we’ll use the final line before kickoff).
- And, for humor and a baseline, FIFA’s rankings (we’ll treat any game between teams separated by ≤150 ranking points as a predicted draw; a quick sketch of that rule follows this list).
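To make that baseline concrete, here’s a minimal sketch of the FIFA-rankings rule in Python. The function name and the team-A/team-B framing are ours; only the 150-point draw threshold comes from the rule above.

```python
def fifa_pick(points_a, points_b, draw_threshold=150):
    """Turn FIFA ranking points into a predicted result: a draw when the
    teams are within draw_threshold of each other, otherwise a win for
    the higher-ranked side."""
    if abs(points_a - points_b) <= draw_threshold:
        return "TIE"
    return "A" if points_a > points_b else "B"
```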
We’ll measure along two scales. First, a points system that awards 1 point for every correct result, scaling up with each round (1 for group stage games, 2 for the round of 16, 4 for quarterfinals, 8 for semifinals and the 3rd place match, 16 for the final). To mirror soccer’s 3-points-for-a-win system, if a prediction is off by only a little bit (e.g., the model called a win for team A but the game ended in a tie), 1/3 of the point total will be awarded.
The second measure is a simple one, counting only correctly called results: did the model get it right, with nothing else taken into account?
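For clarity, here’s a rough Python sketch of both measures. The round weights and the 1/3 near-miss rule are exactly the ones described above; the dictionary and function names are our own shorthand, not part of any published scoring code.

```python
# Round weights from the points system described above.
ROUND_WEIGHTS = {
    "group": 1, "round_of_16": 2, "quarterfinal": 4,
    "semifinal": 8, "third_place": 8, "final": 16,
}

def game_points(predicted, actual, stage="group"):
    """Weighted points for one game: full credit for the exact result,
    1/3 credit for a near miss (a win called when the game ended in a
    draw, or the reverse), nothing for picking the wrong winner."""
    weight = ROUND_WEIGHTS[stage]
    if predicted == actual:
        return weight
    if "TIE" in (predicted, actual):  # off by only one step
        return weight / 3
    return 0

def correct_count(predictions, actuals):
    """The simpler measure: how many results were called exactly right."""
    return sum(p == a for p, a in zip(predictions, actuals))
```

So, for example, calling Argentina over Iceland is worth 1/3 of a point once that game ends 1-1, while calling Germany over Mexico is worth nothing.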
Model comparisons after one group game for every team
Model scoreboard

| Model | Points | Points % | Correct results | Correct % |
|---|---|---|---|---|
| Principally Uncertain | 10 | 62.5% | 9 | 56.3% |
| The FiveThirtyEight | 10 | 62.5% | 9 | 56.3% |
| Betting Markets | 10 | 62.5% | 9 | 56.3% |
| FIFA rankings | 8 ⅓ | 52.1% | 6 | 37.5% |
So, good news, bad news for us. Good news: our model is holding up well in comparison with others and kicks the pants off of FIFA’s risible rankings. Bad news: getting just over half of the results right is pretty underwhelming.
Note for future projects: soccer predictions are hard.
Game predictions and results

| Stage | Game | PU | 538 | Odds | FIFA | Actual result |
|---|---|---|---|---|---|---|
| Group A | Russia v. Saudi Arabia | RUS | RUS | RUS | TIE | RUS, 5-0 |
| Group A | Egypt v. Uruguay | URU | URU | URU | URU | URU, 1-0 |
| Group B | Portugal v. Spain | TIE | ESP | ESP | TIE | TIE, 3-3 |
| Group B | Morocco v. Iran | MAR | MAR | MAR | TIE | IRN, 1-0 |
| Group C | France v. Australia | FRA | FRA | FRA | FRA | FRA, 2-1 |
| Group C | Peru v. Denmark | DEN | DEN | DEN | TIE | DEN, 1-0 |
| Group D | Argentina v. Iceland | ARG | ARG | ARG | ARG | TIE, 1-1 |
| Group D | Croatia v. Nigeria | CRO | CRO | CRO | CRO | CRO, 1-0 |
| Group E | Brazil v. Switzerland | BRA | BRA | BRA | BRA | TIE, 1-1 |
| Group E | Costa Rica v. Serbia | TIE | SRB | SRB | TIE | SRB, 1-0 |
| Group F | Germany v. Mexico | GER | GER | GER | GER | MEX, 1-0 |
| Group F | Sweden v. South Korea | SWE | SWE | SWE | SWE | SWE, 1-0 |
| Group G | Belgium v. Panama | BEL | BEL | BEL | BEL | BEL, 3-0 |
| Group G | Tunisia v. England | ENG | ENG | ENG | TIE | ENG, 2-1 |
| Group H | Colombia v. Japan | COL | COL | COL | COL | JPN, 2-1 |
| Group H | Poland v. Senegal | POL | POL | POL | POL | SEN, 2-1 |
What next?
Ok, so modeling soccer on a game-by-game basis is hard. That’s also what makes it fun!
In addition to updates on model performance after every set of games, once enough games have been played we’ll also look at how each model performs at a coarser resolution. Models are, of course, going to be imperfect; the real question is how close each one comes, overall, to nailing each team’s performance as a whole.
Huh, looks like math has a way to measure that. Meet our good friend, variance (σ²).
At the end of the group stage, and then again at the end of the tournament, we’ll look at the variance between the points each model predicted for each team and the points those teams actually earned. Rather intuitively, lowest σ² = best model. Looking across the whole field will also account for the crazy runs (South Korea 2002) as well as the disappointing performances (Spain 2014). This will let us see how accurate each model was in an absolute sense as well as relative to the others.
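As a rough sketch of that comparison (the function and argument names here are invented for illustration, not pulled from our actual pipeline), assuming each model hands us a predicted point total per team:

```python
from statistics import pvariance

def model_error_variance(predicted_points, actual_points):
    """Variance of the per-team error (predicted minus actual points)
    across the whole field. Lower means the model's team-level forecasts
    sat closer to what actually happened."""
    errors = [predicted_points[team] - actual_points[team]
              for team in actual_points]
    return pvariance(errors)
```

Run that once per model, with one dictionary of predicted group-stage points per team and one of actual points, and rank the models by the number that comes out.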
This cup is bonkers. We’re going to need more beer.