
On Advanced Stats in Soccer, part 2: Luckluster and an end to all regressive means

Photo: Paul Rudderow

Editors’ note: Yesterday, Adam Schorr detailed his quest to find substance in an empty sea of soccer stats. (Yes, that means read it first.) In part two of the series, he tries to find a statistical explanation of the Union’s precipitous fall at the end of the 2016 season.

The Union in 2016

The Union started out hot, going into the Copa América Centenario break in first place in the Eastern Conference with 23 points from their first 14 games. At the break, though, Vincent Nogueira left the team. The Union then took 17 points from their next 13 games and looked well on their way to a playoff spot. However, they managed only 2 points in their final 7 games and snuck into the playoffs. So what the hell happened? The common refrain is that Nogueira leaving irreparably damaged the team. But the stats tell a different story.

To begin with, I used the flat expected goals metric because that’s what I had the data for. As explained in Part One, flat expected goals is basically just a measure of goals per shot. Teams typically average around 12.9 shots per game, with about 11% resulting in goals. With more data, we could break it down much further, but we only have top-level data, so that’s what we’re going with.

So, here’s the Union last season:

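| Stretch           | Union Shots | Union Goals | Union Shots per Game | Union G/Shot | Opp. Shots | Opp. Goals | Opp. Shots per Game | Opp. G/Shot |
|-------------------|-------------|-------------|----------------------|--------------|------------|------------|---------------------|-------------|
| w/ Nogs pre-Copa  | 110         | 12          | 13.75                | .109         | 108        | 11         | 13.5                | .102        |
| w/o Nogs pre-Copa | 66          | 9           | 11                   | .136         | 75         | 5          | 12.5                | .067        |
| Next 13           | 174         | 26          | 13.38                | .149         | 170        | 25         | 13.08               | .147        |
| Last 7            | 75          | 5           | 10.71                | .067         | 90         | 14         | 12.86               | .156        |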

Now, I would love to look at each player individually, and hopefully I will be able to soon (h/t to Chris Sherman for the data, but now I need to find the time), but I started with Nogueira because he was the guy everybody pointed to. And looking at the numbers, it’s hard to argue that losing Nogueira was the impetus for the collapse. What happened? Well, with Nogueira, the Union played at a faster tempo (judging by shots and shots allowed) and were slightly above average. Without him, before the break, their offensive and defensive efficiency both skyrocketed: they scored on .136 of their shots and allowed goals on just .067 of opponents’ shots. In the 13 games after the Copa, the Union were an offensive dynamo, getting off more shots than average and scoring on far more of them than average (.149), but their defense collapsed, allowing opposing teams to score significantly more efficiently (.147). Down the stretch, it all fell apart as their offense completely disappeared (.067) and the defense didn’t come back (.156).
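For anyone who wants to check the table, here is a minimal Python sketch that recomputes shots per game and goals per shot for each stretch from the raw counts, then compares the Union's goals per shot with the league-wide rate of about 11% quoted earlier. The game counts (8, 6, 13 and 7) are an assumption: the table does not list them, so they are inferred here from shots divided by shots per game. The names `STRETCHES`, `summarize` and `SUMMARY` are made up for illustration.

```python
# League-wide conversion rate quoted in the article (about 11% of shots score).
LEAGUE_GOALS_PER_SHOT = 0.11

# Raw counts from the table. The "games" values are an assumption: they are
# inferred from shots / shots-per-game and are not listed in the table itself.
STRETCHES = {
    "w/ Nogs pre-Copa": {"games": 8, "shots": 110, "goals": 12, "opp_shots": 108, "opp_goals": 11},
    "w/o Nogs pre-Copa": {"games": 6, "shots": 66, "goals": 9, "opp_shots": 75, "opp_goals": 5},
    "Next 13": {"games": 13, "shots": 174, "goals": 26, "opp_shots": 170, "opp_goals": 25},
    "Last 7": {"games": 7, "shots": 75, "goals": 5, "opp_shots": 90, "opp_goals": 14},
}


def summarize(stretches):
    """Return per-stretch shots per game and goals per shot for both sides."""
    summary = {}
    for name, s in stretches.items():
        summary[name] = {
            "shots_per_game": s["shots"] / s["games"],
            "goals_per_shot": s["goals"] / s["shots"],
            "opp_shots_per_game": s["opp_shots"] / s["games"],
            "opp_goals_per_shot": s["opp_goals"] / s["opp_shots"],
        }
    return summary


SUMMARY = summarize(STRETCHES)

for name, row in SUMMARY.items():
    # A positive difference means the Union converted better than the league rate.
    diff = row["goals_per_shot"] - LEAGUE_GOALS_PER_SHOT
    print(f"{name}: {row['shots_per_game']:.2f} shots/game, "
          f"{row['goals_per_shot']:.3f} goals/shot ({diff:+.3f} vs league), "
          f"opponents {row['opp_goals_per_shot']:.3f} goals/shot")
```

The inferred game counts add up to 34, which matches the 14 + 13 + 7 games described at the top of this section.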

So, what happened? I don’t know. Sure, that’s an unsatisfying answer and I’m hoping to have a more satisfying one in the future. Maybe it was Tribbett/Yaro, Carroll/Creavalle, Bedoya, Alberg, or who knows who else. It’s frustrating that the stats aren’t out there to easily analyze this.

There is accepted evidence that closer shots are more likely to result in goals, but shot distance stats aren’t available. Were the Union allowing better shots or were opposing teams getting lucky from distance? Were the Union failing to generate good opportunities or getting unlucky from distance? Without the data, there’s simply no way to tell. Hey, what is luck, anyway?

Luck

Peter Andrews recently asked if the Union were unlucky. But well, what is luck? Take a second, actually try to define it in your head.

Good? Okay, so here’s how I define luck: Luck is an expression of occurrences that are unlikely, or at least less likely than the most likely occurrence. Good luck is when something unlikely but positive happens. Bad luck is when something unlikely but negative happens. There is no luck when the expected happens. For those who need an example:

You have to flip three coins. You only lose if all three coins land on tails. Flipping three tails would happen only 12.5% of the time (1 in 8). If you flipped three tails, that would be “unlucky”. Now let’s add to the game. You only win if all three coins land on heads, otherwise, you have to reflip until you get 3 heads or 3 tails. Now you have a 1 in 8 chance to win and a 1 in 8 chance to lose on any set of three, but you keep playing until you get one so it’s an even 50/50. Whether you consider even odds to be lucky or not is up to you.
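The odds in the coin example can be checked by brute force. The sketch below, with illustrative function names of my own choosing, lists every equally likely result of three fair coins, counts the all-heads and all-tails cases, and then works out the reflip version by only counting the sets of flips that actually end the game.

```python
from fractions import Fraction
from itertools import product


def outcome_probabilities(n_coins=3):
    """Return (P(all heads), P(all tails)) for one set of fair coin flips."""
    outcomes = list(product("HT", repeat=n_coins))  # every equally likely result
    total = len(outcomes)
    all_heads = sum(1 for o in outcomes if set(o) == {"H"})
    all_tails = sum(1 for o in outcomes if set(o) == {"T"})
    return Fraction(all_heads, total), Fraction(all_tails, total)


def reflip_probabilities(n_coins=3):
    """Return (P(win), P(lose)) when you reflip until all heads or all tails."""
    p_win, p_lose = outcome_probabilities(n_coins)
    # Sets that are neither all heads nor all tails just trigger a reflip,
    # so only the share of the deciding outcomes matters.
    decided = p_win + p_lose
    return p_win / decided, p_lose / decided


print(outcome_probabilities())  # single set of three flips
print(reflip_probabilities())   # keep flipping until the game is decided
```

A single set of three flips gives 1/8 for each of the two extreme outcomes, and the reflip version gives 1/2 each, matching the numbers in the example.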

To determine luck, you need a baseline. With coin flips, it’s fairly easy. You have two possible outcomes and each is equally likely. When it comes to sports, there are very few situations with pure 50/50s or even perfectly known information. So how do we talk about luck?

Generally, we start with what is known, then make an educated guess as to the likelihood of outcomes, then determine what would be lucky and unlucky. In soccer, a tap-in goal into an empty net is virtually assured, but sometimes there is a bad bounce and the ball hops at just the wrong time. Basically every event has some element of luck: every time the ball heads towards a goalie, every time a referee has to make an offside or penalty call, and so on. Just go look at “things that went wrong” in Peter’s article.

Luck, then, is a function of knowledge – knowing that most of the time, players and referees at the highest level get it right. Luck, then, is when they get it wrong or pull off the spectacular. There is a strong element of skill in these as well, but there are many times where the ball takes a “lucky” or “unlucky” bounce. How, then, do we separate out luck from skill? Well, that takes us right back to needing better statistics. It’s not impossible to make judgments about which players or teams are better, but it’s not always easy.

And that leads us to variance.

Variance

Without variance, sports would be boring. If it weren’t for variance, the better team would always win and sports could literally be played on paper. What is variance? Let’s go back to the original three-coin example. There was only a 1 in 8 chance of losing the game. However, that means that, on average, you will lose 1 out of every 8 times you play. Without variance, you would always get the expected outcome and never lose. It’s a big difference.

Soccer is especially vulnerable to variance as the relatively tiny number of significant events in a game means that each significant event has much larger value. One bad call or bad bounce can completely swing a game in a way you simply don’t see in other sports. It also means that determining the quality of a team requires factoring in whether variance has been kind or unkind. If a team has consistently been lucky or unlucky, that will likely change. This is regression.
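One rough way to put a number on how few events there are: treat every shot as an independent coin flip that scores about 11% of the time, the league-wide rate quoted earlier. That is a simplifying assumption, not something the data confirms, since real shots differ in quality. Under that model, the sketch below computes how much the observed goals-per-shot figure would typically wobble over 75 shots (the Union's total in their last seven games) and over 425 shots (their total across the four stretches in the table). The function name is hypothetical.

```python
import math

# League-wide conversion rate quoted in the article (about 11% of shots score).
LEAGUE_GOALS_PER_SHOT = 0.11


def goals_per_shot_spread(p, shots):
    """Standard deviation of observed goals per shot over `shots` shots.

    Assumes each shot is an independent trial that scores with probability p
    (a binomial model), which is a simplification of real soccer.
    """
    return math.sqrt(p * (1 - p) / shots)


for shots in (75, 425):
    sd = goals_per_shot_spread(LEAGUE_GOALS_PER_SHOT, shots)
    print(f"{shots} shots: typical swing of about +/- {sd:.3f} goals per shot")
```

Under these assumptions the spread is roughly 0.036 over 75 shots and roughly 0.015 over 425, so a seven-game sample is a bit more than twice as noisy as a full season.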

Regression

When stats skew too far from the average, it is reasonable to expect regression. What is regression? Regression is just a fancy way of saying that something far outside what could reasonably be expected will eventually return to what could reasonably be expected. You may hear the term “deviation from the mean” associated with regression, which is just a way to tell how far away from average something is.

Numbers such as .067 goals per shot and .15 goals per shot simply are not sustainable. That is, you simply will not see them over the course of a full season. They are too far outside expectation. While things average out over a large enough sample, strange things happen in a small sample. The above table is a perfect example. Split up like that, there are a bunch of crazy numbers. Put it all together? The Union averaged .122 goals per shot on 12.5 shots per game and they allowed .124 goals per shot on 13.03 shots per game. It all comes back to the average, eventually.
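The season-long figures can be reproduced from the raw counts in the table. Here is a small Python sketch of that arithmetic; the variable and function names are made up for illustration, and the 34-game total comes from the 14, 13 and 7 games described at the start of the article.

```python
# (label, Union shots, Union goals, opponent shots, opponent goals), from the table.
STRETCHES = [
    ("w/ Nogs pre-Copa", 110, 12, 108, 11),
    ("w/o Nogs pre-Copa", 66, 9, 75, 5),
    ("Next 13", 174, 26, 170, 25),
    ("Last 7", 75, 5, 90, 14),
]

# 14 games before the break, then 13, then the final 7.
TOTAL_GAMES = 14 + 13 + 7


def season_totals(stretches, games):
    """Add up the raw counts, then compute season-long rates from the totals."""
    shots = sum(row[1] for row in stretches)
    goals = sum(row[2] for row in stretches)
    opp_shots = sum(row[3] for row in stretches)
    opp_goals = sum(row[4] for row in stretches)
    return {
        "goals_per_shot": goals / shots,
        "shots_per_game": shots / games,
        "opp_goals_per_shot": opp_goals / opp_shots,
        "opp_shots_per_game": opp_shots / games,
    }


TOTALS = season_totals(STRETCHES, TOTAL_GAMES)
for key, value in TOTALS.items():
    print(f"{key}: {value:.3f}")
```

Note that the season rate is total goals divided by total shots, not the average of the four stretch rates, because the stretches contain different numbers of shots.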

Over the course of the season, there will be games or stretches where a team is way above what is expected and others where they are way below what is expected. That’s the nature of regression — it is simply a statement that teams will return to the average over a long enough sample. Ultimately, the best teams will have numbers that end up significantly above average and the worst teams will have numbers that end up significantly below average. But most teams will end up pretty close to the average. So if everything eventually returns close to the average, what separates teams?

Very, very slight differences. In soccer, teams that consistently generate more or better shots, or consistently allow fewer or worse shots, will be better. For teams in the middle, it will come right back to variance. Last season, the Union were lucky for a chunk of the season and then became very unlucky. They snuck into the playoffs. And ultimately, very small differences over the course of the season can build into large differences.

Conclusion

It is very difficult to do proper statistical analysis of the Union or MLS because the stats are simply not available. Many people have questioned why I disappeared last season and it’s because, quite simply, nobody likes the person screaming “REGRESSION AHEAD” when a team is playing well or saying “well, this was completely expected” when a team is playing poorly. And it’s even worse when, like in soccer, there isn’t concrete evidence for it.

Anyway, I hope you found this all interesting and informative, and may you dream with me of a day when this information is no longer hidden from public view.

2 Comments

  1. Old Soccer Coach says:

    On part two I might pass the quiz if it were easy and I took it right this minute.
    .
    The concept regression to the mean begins to make sense. Whether I will remember it is an entirely different issue.
    .
    The post-Nogueira 2016 season was always about the collapse of the defense. His absence meant they did not have eleven players in the side who could play the defensive system effectively; they only had ten, because they lacked a defensively effective, or even semi-effective, attacking center midfielder with Barnetta moved out from #10.
    .
    He was not an effective center mid and moved back to #10, and Bedoya was signed to provide a better midfield option than available from the bench. He may not create as well as we’d like, but he does work hard to defend.
    .
    The initial hole in the dike that crumbled the 2016 season happened in 2015 when Edu played in the U.S. Open Cup final and deliberately made himself into a worse surgical case than he already was, more than likely, by doing whatever he did to allow himself to cope short-term with the injury restriction.
    .
    Twice variance has meant he has not in fact regressed to the normal expectations of healing, if the criterion is playing for the Union itself. He did play for the Steel late last year, three times.
    .
    If Edu had been healthy in 2016, … . Etc., etc.

  2. Atomic Spartan says:

    Statistically, I am not worthy. Why couldn’t I have had you in high school instead?
    .
    The art and science of coaching involves minimizing bad luck and negative variance through planning, persuasion, proficiency and performance. The first two of these four elements are the exclusive responsibility of management, and management selection has a lot to do with the selection of proficient players.
    .
    Hence the calls to make JC and ES accountable.
