
The Stat Man: How might the League One table look on May 3rd?

Don’t shoot the messenger....

Sunderland v Gillingham - Sky Bet League One Photo by Ian Horrocks/Sunderland AFC via Getty Images

Anyone attempting to look into the future of what has, so far, been a remarkably tight and turbulent League One season must be mad. It really is too tight for anyone to predict, with any certainty, how it will end.

Many people have a subjective opinion on how things will play out. ‘That last-minute equaliser ends our automatic promotion hopes’, ‘Coventry are going to run away with the league now’, ‘we only have three games at home so we’ve got no chance’, ‘I can’t see Peterborough losing again’: these are all examples of subjective opinions.

I don’t have a magic formula to predict how the remainder of the season will play out, but what I do have is a ton of data that allows me to run some code to predict, entirely objectively and without any opinion, how things are likely to go.

The key word there is ‘likely’.

Let me explain why that is important. I occasionally put out a predicted end-of-season league table on Twitter. These tables are usually greeted with interest and intrigue by many, scepticism by some, and fury by a few. A couple of the comments I have received are “I’d love us to win 9 and draw 3 of our last 12 games but that’s very unlikely, also Coventry stay unbeaten but finish 3rd......nah” and “far too many draws, your prediction is a load of shite”. There is a heavy dose of subjectivity in both comments.

The best way, in fact the only way, to do such predictions is by using past data.

The best predictor of future behaviour is past behaviour.

In the absence of a time machine, past data is all I have to work with. If the data says “Coventry stay unbeaten but finish 3rd” then that is what it says. I didn’t decide that would be the case; the data decided.

The model does indeed predict “far too many draws”. But it is correct to do that. Let me explain.

Many League One sides are fairly well-balanced. If, for example, Sunderland and Portsmouth were to play each other ten times (it feels like that happened in reality last season), it is likely that about three games would end in a Sunderland victory, about three games would end in a Portsmouth win, and about three games would end in a draw. So, with this knowledge, if you were asked to give a prediction of the most likely result in a 10th game between the two sides tomorrow, what would you say? Unless you’ve not been paying attention, the answer is a draw.

There are many of these balances around League One. There isn’t that much separating the sides who come up against each other. The biggest margin of victory the model will predict tends to be two goals. One example came when Sunderland hosted Rochdale a few weeks ago: the model predicted a 2-0 home victory (it ended 3-0). It also predicted the Fleetwood home game would end 1-1, which it duly did. The same prediction was made of the Coventry game, but that ended in a 1-0 defeat for Sunderland.

The excessive prediction of draws is where ‘likely’ is most apparent. Many games in League One are balanced and draws are the most likely result.

So why are there not more draws in reality?

Football is unpredictable. There are many thousands of random events and interactions occurring on the field during every game. These random events are effectively impossible to predict. Here are some examples of events that will tip the balance between two teams that are otherwise well-matched:

  • The flair player curls one into the top corner from the edge of the box.
  • A misplaced back pass is intercepted and a goal results.
  • The opposition striker is tripped in the box and wins a penalty.
  • A League One referee makes one of their trademark mad decisions that changes the course of the game.

Games turn on random and unpredictable stuff like the scenarios listed above. And this is where ‘likely’ and ‘actual’ separate.
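To put some rough numbers on that gap, here is a minimal Python sketch. It is not the model behind these predictions. It assumes, purely for illustration, that each side’s goals follow an independent Poisson distribution, and the average of 1.2 goals per team is a made-up figure. The function names are hypothetical too.

```python
from math import exp, factorial


def poisson_pmf(goals: int, mean: float) -> float:
    """Probability of scoring exactly `goals` under a Poisson assumption."""
    return exp(-mean) * mean**goals / factorial(goals)


def score_probabilities(home_mean: float, away_mean: float, max_goals: int = 10) -> dict:
    """Probability of every scoreline up to max_goals each, assuming the
    two teams' goal counts are independent (an assumption, not a fact)."""
    return {
        (h, a): poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
        for h in range(max_goals + 1)
        for a in range(max_goals + 1)
    }


def summarise(home_mean: float, away_mean: float):
    """Return the single most likely scoreline plus win/draw/loss chances."""
    probs = score_probabilities(home_mean, away_mean)
    most_likely_score = max(probs, key=probs.get)
    home = sum(p for (h, a), p in probs.items() if h > a)
    draw = sum(p for (h, a), p in probs.items() if h == a)
    away = sum(p for (h, a), p in probs.items() if h < a)
    return most_likely_score, home, draw, away


# Two evenly matched sides; 1.2 goals per game each is an invented figure.
score, home, draw, away = summarise(1.2, 1.2)
print(f"Most likely score: {score[0]}-{score[1]}")
print(f"Home win {home:.1%}, draw {draw:.1%}, away win {away:.1%}")
```

Under those assumed numbers the single most likely scoreline is 1-1, yet only a little over a quarter of such games end level. A model that names the most likely score will keep saying ‘draw’ for balanced fixtures, even though most of them will be settled one way or the other.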

Here are the predictions for Tuesday night’s games:

They won’t be exactly right, but the rightmost column gives an indication of the most likely result in each game. I’m sure you’ll agree there is nothing particularly outrageous there.

‘Likely’ means this: based on the currently available data, if the remainder of the season were played out a thousand times, this is how it would end on average.
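The ‘played out a thousand times’ idea can be sketched in a few lines of Python. This is an illustration rather than the code behind the table: the three teams, their points, the fixtures and the win/draw/loss probabilities are all invented, and `simulate_season` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def simulate_season(current_points, fixtures, outcome_probs, runs=1000, seed=0):
    """Play the remaining fixtures `runs` times and return each team's
    average final points total. All inputs here are illustrative."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = defaultdict(float)
    for _ in range(runs):
        points = dict(current_points)  # fresh copy of the table for each run
        for home, away in fixtures:
            p_home, p_draw, _p_away = outcome_probs[(home, away)]
            roll = rng.random()
            if roll < p_home:
                points[home] += 3
            elif roll < p_home + p_draw:
                points[home] += 1
                points[away] += 1
            else:
                points[away] += 3
        for team, pts in points.items():
            totals[team] += pts
    return {team: totals[team] / runs for team in current_points}


# Invented three-team league; none of these numbers are real data.
current_points = {"Team A": 60, "Team B": 58, "Team C": 57}
fixtures = [("Team A", "Team B"), ("Team B", "Team C"), ("Team C", "Team A")]
outcome_probs = {  # (home win, draw, away win) for each fixture
    ("Team A", "Team B"): (0.40, 0.30, 0.30),
    ("Team B", "Team C"): (0.35, 0.30, 0.35),
    ("Team C", "Team A"): (0.30, 0.30, 0.40),
}

predicted = simulate_season(current_points, fixtures, outcome_probs)
for team, pts in sorted(predicted.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {pts:.1f} points on average")
```

The averages will rarely be whole numbers, and no single run of the season has to match them. That is the difference between the likely outcome and the actual one.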

Now that we all understand each other, here is the current prediction.

Don’t shoot the messenger.