For quite a few years now, a dedicated community has been working to build a toolbox of analytical and statistical techniques for analyzing the game of soccer. A lot of this work has focused on major European leagues where a lot of data is publicly available. AmericanSoccerAnalysis is trying to bring that toolbox to MLS. They have developed an impressive Expected Goals model at the team level, the individual level, and the game-by-game level. Additionally, they use publicly available numbers to try and get a sense of what drives success in Major League Soccer.
Harrison Crow is one of ASA’s founders. PSP talked with him about the Union’s start to the season and the development of MLS analytics.
NOTE: Expected Goals, and analytical analysis in general, can be hard to grasp. If you think you’ve already got a good handle on it: Awesome! Skip ahead! This is just going to be a short attempt to explain the value of something like Expected Goals.
Expected Goals models are just supposed to give a general idea of whether a shot will be a goal. It’s like an explicit version of what you do mentally when you play HORSE on a basketball court. If you want to be sure you make your shot, you take it from a place on the court where you are pretty sure you can make it. If you want to make sure your opponent has trouble matching your shot, you might take a difficult shot even though you have a low chance of making it yourself. In essence, you are making a mental calculation: Given how good of a shooter I am, and how far away I am from the basket, and the angle I am at compared to the backboard, and whether I’m on the run or stationary, and whether I shoot a hook shot or a set shot… how likely am I to make the shot? And then you have to decide whether it’s worthwhile to take that specific shot.
Expected Goals, and analytics more generally, try to do the same things you are doing in your head during HORSE. Based on where a shot takes place on the field, what body part was used, what type of play led to the shot (e.g., through ball? Cross?) and a bunch of other variables, Expected Goals estimates the chances that the ball will end up in the net. Individual players don’t really take enough shots to allow separate models for each guy, but if you add up all the shots in a league over a few seasons, you can see that certain types of shots from certain places on the field have a much better chance of ending up in the net. So if a team scores a bunch of goals off shots from situations that have a low chance of resulting in a goal, they “outperform” the model. And, unfortunately, they aren’t likely to keep scoring unless they start creating better chances.
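To make the bookkeeping concrete, here is a minimal Python sketch of how per-shot probabilities add up to a team’s Expected Goals. The shot categories and probabilities are invented for illustration; they are not ASA’s numbers, and a real model would estimate them from several seasons of league-wide shot data and many more variables.

```python
# Hypothetical per-shot scoring probabilities (illustrative values only,
# not taken from any real Expected Goals model).
SHOT_PROBABILITY = {
    ("penalty_area", "foot"): 0.15,
    ("penalty_area", "head"): 0.08,
    ("outside_area", "foot"): 0.03,
}


def expected_goals(shots):
    """Sum the scoring probability of every shot a team took."""
    return sum(SHOT_PROBABILITY[shot] for shot in shots)


def performance_vs_model(actual_goals, shots):
    """Positive means the team scored more than the model expected."""
    return actual_goals - expected_goals(shots)


# A made-up game: four headers and two footed shots in the area, three long shots.
game_shots = (
    [("penalty_area", "head")] * 4
    + [("penalty_area", "foot")] * 2
    + [("outside_area", "foot")] * 3
)

xg = expected_goals(game_shots)
diff = performance_vs_model(2, game_shots)
print(f"Expected goals: {xg:.2f}, actual minus expected: {diff:+.2f}")
```

A team that scores twice from this set of chances would show a positive difference, which is what “outperforming” the model means here.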
It’s probably important to remember that a lot of analytics and statistics is descriptive, meaning it’s just meant to tell you about what is going on in soccer. This is different from a normative perspective which is about telling you what you should be doing (so it’s what are you doing, instead of what should you be doing). Looking at numbers can do a couple of things: First, it can verify what you’re seeing with your eyes. Second, it could contrast what you’re seeing, and make you ask whether you need to change your perspective or whether you need to change the way you’re analyzing the data because the results it’s giving you don’t make sense once you put them in context.
That’s the short, “What Can Numbers Do For Me?” intro. On to the interview.
PhillySoccerPage: Looking at the underlying numbers for Philadelphia Union so far this season, how much of their success seems like it’s driven by good play? Or does it seem like luck might be playing a significant role here?
Harrison Crow: They have been lucky. The two goals from Pontius, not to take away credit from him, but it’s a matter of being at the right place at the right time. And that’s part of Pontius’ game, but at the same time, you’re not going to continue to facilitate those goals in that manner.
A couple things that stand out to me in terms of shots: They are managing a lot of shots in the penalty area, and we certainly subscribe to getting high leverage shots. The Union are sixth in MLS with almost 8 shots inside the penalty area per game. And that’s grand. However, the majority of those are headed shots, which really limit the likelihood of goals being scored. So it’s not just about where you’re taking shots, but how you’re taking shots.
Aerial shots, outside of Kei Kamara and five or six other guys, are generally not of consistent quality. You’re not going to see someone continue to convert a high number of those chances as you might if they were shots with the feet.
That said, they are about average as far as long passes and crosses, which is a good thing. That isn’t usually an indicator of a consistent, manageable attack. It’s direct, and often seen as a last-ditch effort where you throw guys into the box and see what happens. And they’re not doing that.
But then they also are not creating opportunities for CJ Sapong and the wingers to get down and be creative either. They aren’t generating throughballs, which is a little disconcerting. On top of that, they have very few ‘dribbles past,’ so they aren’t getting past the back line. Whereas, if you look at guys like Obafemi Martins or Cyle Larin, guys that are consistently on teams that have a high conversion rate, it’s because they play on teams that often score from counterattacks, and that conversion rate comes from a lot of high leverage shots.
So in summation of the shot data, it’s not bad so far, but the Union’s ability to continue to turn the shots they are getting into goals is a bit suspect at this time.
PSP: Let’s talk about a different stat. Last year Fabinho had a very high ‘usage rate,’ and this year both Fabinho and Keegan Rosenberry have high usage rates among fullbacks getting regular minutes. What, if anything, does that indicate? Can we draw any conclusions here, and if so, can we apply positive or negative labels to those conclusions?
HC: Not really. Usage rate is about style. And fullbacks often have high usage rates because they’re the ones sprinting forward into attack, they’re the ones working with central midfield to provide width.
So I think it’s more stylistic. Looking at just touch percentage [league-wide], you can see it’s mostly central midfielders with fullbacks sprinkled in. For these guys, they end up being sort of point guards or facilitators for the rest of the team. And being that the majority of Philadelphia Union’s passes are short, those numbers aren’t a surprise because the fullbacks are trying to provide width to get the ball behind the opposition back line.
PSP: So it’s potentially an indicator of style? Last year the Union did go long a lot, they had a ton of trouble getting the ball out of their final third. Lowest possession rate in the league, near the bottom in pass completion rate as well. So the fact that both fullbacks are seeing a lot of the ball this year, what can it say about style of play?
HC: I think it says that they are a counterattacking team. But also that it’s disconcerting that they aren’t getting the ball forward to start that counterattack. And this analysis is based purely on what I see in the numbers, but it seems like they sit deep in their defensive third. They are sixth highest in passes in their own defensive third. So they seem to be a counterattacking team stylistically, but without some of the markers of the top counterattacking teams like Dallas: Longer passes and dribbles past.
So I am skeptical about the Union continuing to generate chances.
Also, the defense has been really leaky. If you’re going to spend the majority of your time defending — and Seattle does this, they have a tendency to give up a lot of shots, but they seem to give up bad shots. Philadelphia gives up a lot of shots, but they are mostly good shots. And they’re surviving based on Andre Blake. Blake has 11 diving saves this season. That’s five more than his next closest goalie. I don’t know, necessarily, how that relates to his performance, but the most diving saves in the league the past few years has been in the 50s, and he already has eleven through three games. That doesn’t say a lot about your defense.
PSP: Let me ask you more about Blake. Aside from luck, many would argue that it’s Blake driving the team’s success so far. Right now [small sample size noted] he’s on course to face 40 more shots than anybody faced last season, and he’s outperforming your Goalie Expected Goals model by two goals, and Nick Rimando put up the highest number last year with a little over nine throughout the whole season. Do the numbers suggest that Blake is doing anything that makes him an outlier? Is he just having an active start to the season? [Note: Goalie performance is notoriously difficult to quantify. AmericanSoccerAnalysis has built one of the few Goalie Expected Goals models, and they have made it available here. The basic idea behind expected goals is that, with enough data, you can estimate how influential a bunch of variables will be on the likelihood that a shot goes in or is saved. The ASA goalie model takes into account how the shot was generated (from a corner kick? A direct free kick? Fast break? Through ball? etc.), where the shot came from, and a bunch of other stuff. After a game, you can put all the data into the model and the model goes, ‘Ok, based on what you told me, I expect the Union to have given up 1.436 goals in that game.’ If the Union only gave up one goal, they outperformed the model. It’s important to remember, though, that there are a lot of things the model doesn’t know about because we don’t have the data to give it. So the model is really just a pretty good guess based on a) The parts of the game we think are important, and b) Our ability to turn those parts of the game into numbers.]
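The arithmetic in the note above can be written out directly. This is a small Python sketch using the 1.436 expected goals and the one goal allowed from that example; the function name is made up for illustration and is not part of ASA’s model.

```python
def goals_prevented(expected_goals_against, actual_goals_against):
    """Positive means fewer goals were allowed than the model expected."""
    return expected_goals_against - actual_goals_against


# Numbers from the example in the note: the model expected 1.436 goals,
# and the Union allowed 1.
margin = goals_prevented(1.436, 1)
print(f"Outperformed the model by {margin:.3f} goals")
```

Summing that margin over every game is one way to arrive at a season-long figure like the ones quoted for Blake and Rimando, with the caveat that the model only knows about the variables it was given.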
HC: It’s still really hard to say how good a keeper is now [with numbers alone]. We don’t really have adequate metrics that can define a keeper. Blake doesn’t have a lot of punches, his long passes are last in the league in terms of maintaining possession…
PSP: They look bad too.
HC: …so he’s not helping himself. But sustainability aside, he’s been huge. Whether or not he can continue that performance, I don’t think anyone can really know that. It’s too hard to say. And they’re giving up a huge amount of shots on target. Well, forget about shots on target: They’re giving up a huge amount of shots. And that, added to the fact that those shots are from dangerous locations, Blake just flat out won’t be able to stop all of them.
Two years ago, with Bill Hamid, DC was pretty much the worst team in MLS history in terms of giving up shots. Without Hamid, I’m not sure they would have won a game. And DC United, the past couple of years, has still been that terrible team. But it’s made Bill Hamid worth that much more.
And so far, Blake has been somewhat in that vein. So maybe Philadelphia has an outlier goalkeeper. If you look at teams that consistently beat our model, it’s teams with really elite goalkeepers. The Rimandos and the Hamids.
PSP: One of the things that has always bothered me about defensive metrics is that I have never been able to figure out what they’re telling me. Fabinho is my prime example, where he always has a ton of interceptions, but also seems to spend a good amount of time out of position. And once I accept that interceptions might come from good positioning or bad positioning, and it can’t distinguish between the two… what am I left with?
HC: Sure. I think you’re nailing it. One thing you have to do is adjust for possession when talking about defensive actions. That’s one thing that’s really big: Dom Dwyer is usually pretty dominant in terms of defensive actions for forwards, but Kansas City uses a really high press system. Same with Sacha Kljestan, Dax McCarty, and all those midfielders. They’re constantly attacking the ball. It’s a little different than looking at Diego Chara, Ozzie Alonso, or Matias Laba. They get less touches, so what they do is a bit more impressive. So it’s about context.
If you have less time on the ball and you’re stealing more passes, that could be because you don’t have the ball so there are more passes out there to steal.
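One simple way to picture a possession adjustment, sketched in Python. The scaling used here (normalizing to an opponent who has the ball 50 percent of the time) is an assumption chosen for illustration, not necessarily how ASA adjusts its numbers, and the player figures are invented.

```python
def possession_adjusted(actions_per_game, opponent_possession_pct):
    """Scale defensive actions to what they would be if the opponent
    had the ball exactly half the time (an assumed, simplified adjustment)."""
    if not 0 < opponent_possession_pct <= 100:
        raise ValueError("opponent possession must be in (0, 100]")
    return actions_per_game * 50.0 / opponent_possession_pct


# Invented figures: both players make 6 defensive actions per game.
# Player A's opponents hold the ball 60% of the time (more passes to steal).
# Player B's opponents hold the ball 40% of the time.
player_a = possession_adjusted(6, 60)
player_b = possession_adjusted(6, 40)
print(f"A: {player_a:.1f}, B: {player_b:.1f}")
```

With identical raw counts, the player whose opponents see less of the ball rates higher after the adjustment, which is the kind of context being described.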
But yeah, it’s hard to account for how frequently someone is getting out of position. But that being said, I think you can look at two things. You can look at fouls, and you can look at offsides. How often does a player manage to coax a player into being offsides and how that’s attributed. I think that if you’re fouling a lot and you aren’t pulling your opponent offsides a lot, you’re probably getting beat and probably out of position a lot. But that’s sort of intuitive.
I looked at center backs, and one of the things I looked at was: How often are they fouling? And you can also look at players and aging in that context as well. How often has Kyle Beckerman over the years upped how often he has fouled? Everybody kind of looks at him as being almost a dirty player now, but the reality is that he’s probably lost a step. And a lot of the tackles that he was winning three years ago he physically isn’t capable of winning now, but the mind still says he can.
PSP: It’s like the Larentowicz Curve, where he gets older but is still intent on going into every tackle, so he commits more fouls?
HC: Well, he’s an interesting guy in and of himself.
PSP: Oh yeah? Tell me about Larentowicz.
HC: I actually thought the Union moving for Larentowicz might be a good thing just because he’s a great passer from the back, he wins a lot of aerial duels, and he doesn’t foul a lot. Those are all three big pluses. And they’re also underrated things when you talk about a defensive player. Because most people look at defensive actions and he’s not that busy in that area.
Matt Besler is another example of that. He consistently leads the league in key passes for defensive players. So that probably says something about style, but also something about his great, accurate long passes.
This offseason, we did a model internally, and he chose to try and “fix” the Philadelphia Union. It’s something we worked on a lot this season.
And somebody we specifically targeted was David Horst. We love [Houston Dynamo performance analyst] Ollie Gage, but we feel like they underrated David Horst. Maybe they’ve changed that coming into this season, though. Jermaine Taylor was someone else we considered for Philadelphia.
Overall, we felt like Philadelphia needed to add somebody with intelligence and, not to rely on cliches, but somebody with experience. The Union have some very good young players, and it’s nice to see them playing, but you need somebody with experience who can teach. They didn’t get someone like that this offseason, and I think that was a bit of a swing and miss.
PSP: One of the things Ted Knutson of Statsbomb used to do was sort of a “breakout potential” chart of young players that would look at progression across seasons and point to players with a higher likelihood to take a big step in the upcoming year. Is that something you guys have considered?
HC: I have my own theories with age progressions, particularly with Americans, but nothing that’s been substantiated or tested. I generally believe, and I think Michael Caley [of Cartilage Free Captain] has done some work with age. And if not him, maybe it was Paul Riley, done something with minutes and ages. Minutes can, at least, imply some level of skill. Teams aren’t going to play guys that are performing so badly that they aren’t even average.
But players in the US don’t usually see a full season of minutes until the early- to mid-20s, but you see them continue to get those minutes past the age of most European players. And that may be because they don’t put as many minutes on their legs at 18, 19 years old. Europeans, obviously, start a lot younger. And that throws off aging curves. So when you look at Clint Dempsey, he could play awhile.
So basically, you’ll probably see Americans bloom later and last longer. They won’t have longer primes, but their window is shifted.
PSP: That puts pressure on the academies then. Philly brought Zach Pfeffer through when he was 16, played him all over the pitch, then got rid of him to Colorado…
HC: I’m still a little pissed about that trade. I’m sure there are Union fans pissed about it. I think Pfeffer is probably a good example. I thought this year would be a breakout year for him. He’s a young guy that’s been around so long. This could be a year where you give him 2000+ minutes and see how he performs.
I really don’t know how Colorado is going to integrate him, but he’s a prime example. He was brought on younger yet hasn’t had a full season of minutes. Tommy Thompson in San Jose, as well.
PSP: For me, the interesting comparison has always been the Perry Kitchen/Amobi Okugo comparison. DC brought Kitchen in, stuck him in the lineup and let him work the kinks out as he grew. Philly continually brought in guys who played Okugo’s position, and you can see where they both ended up.
HC: I thought that could also have to do with attitude and work ethic. I think work ethic often gets misunderstood. I think there are players who are very good who don’t have to work as hard as other players. Oniel Fisher in Seattle: He’s way more active than Tyrone Mears is, but it’s because he makes more mistakes! Mears just does things right the first time. Sometimes we overrate hustle and grit and silly cliches because they are cliches, and they’ve been ingrained in us. And we work hard, so when we see a guy running up and down the pitch: Up and down, up and down, that guy bleeds for the club.
Well, just because a guy doesn’t necessarily do that doesn’t mean he isn’t a worker. There are other things that take effort. Even the effort not to run around. You look at Jermaine Jones. You just wish he had the patience to settle down and be in one spot and disciplined. It’s hard.
PSP: For Philly, this plays out with Maurice Edu. You have a camp that wonders if Maurice Edu should be doing more, if he’s lazy, because he’s such a great athlete. And then you have others who can argue that he’s trying to learn to stay in position and not go chasing the ball around the field. And that’s a very divisive argument: Is Maurice Edu lazy, or is he intelligent?
HC: I would argue he might just be a bad soccer player. And that probably won’t go over well with a lot of people. But he doesn’t make a lot of intelligent tackles, he commits fouls in areas that are consistently bad for his team, and he doesn’t contribute much to the attack. So it’s very frustrating…
I think he could be a pretty good center back. He’s pretty good aerially, which is something the Union need. But he’s consistently out of position. I think over the long term… You’ll see games where he wins a lot of aerial duels and those where he just doesn’t. And it’s because he’s out of position when he’s trying to make plays.
I think he battles against himself at times. I definitely don’t think he’s worth as much as Philly is paying him.
PSP: There are so many examples of these “effort” guys in Philly, where you ask questions about whether they’re all work, or if they are learning positioning and how to approach the game. Ray Gaddis is an example…
HC: We have had similar arguments about Ray Gaddis. Kevin Minkus did a couple of really cool graphics showing where Philadelphia has shot against versus how many defensive actions take place. And the big hole is right where you’d expect the right back to be. So that tells us he’s either tucking in too far or getting caught forward.
We’ve had some speculation about that. I’m sort of the Ray Gaddis supporter of the group, though I think this is a bias that I have.
Le Toux is sort of interesting. He collects a lot of events, so it’s hard to understand with just numbers: Is he a good player or not?
PSP: With Ray Gaddis, the question for me was always: How do you control for the center back’s positioning? Early in 2015, there was a period when Ethan White was next to Gaddis and teams were just able to lump the ball into the corner and the striker beat White to the ball so often, just far more often than you normally see in a game. And from then on, I’ve wondered about how to decipher the relationship between the fullback and center back.
HC: And I think that’s where it’s hard to rate players. Fouls committed can sometimes be a misnomer for your defensive partner getting out of position. It doesn’t necessarily imply that you are bad.
PSP: I remember a year or two ago someone at your site, I think it was Kevin [Minkus, who has also written for PSP], put up a post where the question was really: How in the world are DC United surviving when all the numbers suggest they’re terrible?
HC: Coleman Larned might’ve done that. He’s done a lot with tactics and Expected Goals. Expected Goals is a great metric, and it has stability now, which is nice because baseball has that and soccer, largely, does not.
But that said, you have to apply it pretty judiciously. You have to understand that it comes with context and there are outstanding circumstances, like Obafemi Martins constantly beating his Expected Goals: there’s a reason for it. Cyle Larin is probably not going to sustain his scoring because he’s probably not going to keep getting the number of counterattacks that Martins did. So understanding why one player beats the model while another can’t, requires going a bit in depth.
And, honestly, a lot of what we’re doing probably doesn’t get to the level of analytics. We’re starting to get to that point because there’s finally enough data, but it’s really just about being smart about how you dig through your data.
PSP: Are you in the camp that thinks when the player movement data becomes public it will be revolutionary? [Player movement data is exactly what it sounds like: Public data that describes where a player is on the field at any given moment, how fast he got there, etc]
HC: It’s going to be revolutionary… probably. Just because it’s going to give us more information. It’s going to facilitate a lot of projects and a lot more analysts, and claims, and correlations… that probably aren’t true.
As soon as the floodgates drop, we’re going to try and figure out what it all means. But realistically, it’s taken four years for us to get to where we are with Expected Goals, to where we can kind of start to understand some of what it means, but not all of it.
Player tracking data will revolutionize stuff, but it’ll take us a good eight years to really start to use it. This isn’t the NBA, we don’t necessarily know how to associate certain values with other values, and what has relationships and what doesn’t.
Maybe if we get 20 years of old movement data, that speeds things up a bit. But getting one year of movement data isn’t necessarily going to tell us very much. It’s going to bring out some hypotheses, and it’s going to bring out new questions. And that’s how you learn: Have a question, and find out you’re wrong. That’s how we’ve learned about Expected Goals, that’s how we’ve learned about defensive metrics.
PSP: So you fail, then you learn from that failure, which gets you more information. And eventually, a consensus slowly develops. But for something like sports analytics, which is developing in public and has its fair share of public skeptics, the method itself can seem like validation to detractors.
HC: You can’t convince people of things they don’t want to be convinced of.
Trying to use more information to make a better decision, which is what analytics is at its very core… there’s nothing wrong with that. And it’s trying to answer: What are best practices?
PSP: As analytics and statistical analysis in general become more accepted in soccer, it could be seen to be less of a competitive advantage to teams. Like: ‘Everyone else is already doing it, so what’s it going to tell us that’s special and different? We should invest resources elsewhere.’ But you’re saying that even when we have a bunch of data, it’s still going to be more about gaining an advantage by having smart people looking at the numbers.
HC: I don’t necessarily think you’re ever going to reach an end. We’re never going to stop evolving Expected Goals. Baseball has been doing this the most, and analytics is still a competitive advantage. You look at the Oakland Athletics, who have been doing this for years, and forget about Moneyball: Look at what they’re still doing. They came in last place last year, but before that they had won the division three years in a row. They’re still trying new things.
The soccer marketplace for acquiring talent is always changing, so you always have to change how you’re assembling your team.
Going into this offseason, I really thought that trading Cristian Maidana and acquiring Chris Pontius were terrible ideas. Especially for a general manager that really thought through opportunities and looked at undervalued players. Cristian Maidana is probably the most underrated shot creator in all of MLS. So for them to trade him away — and they got a good value for him — and not even use him to get a player that I thought would upgrade them in Brandon Vincent.
Brandon Vincent was a guy that I thought was, without a doubt, the number one fullback. And instead they went with their heart. And Keegan Rosenberry has looked really, really good. He’s better than what I gave him credit for. But Brandon Vincent is obviously a game-changing left back. And in a league that has very few of those, that’s precious.
And analytics speaks to all these things. It’s not just about what happens on the field, it drives your decision-making on a daily basis. Offseason, in-season, player selection… it just helps you make better decisions. That’s the whole goal.
And to go back to your question, I don’t think we’re ever going to run out of avenues through which smart people can use analytics to help teams improve their decision-making.
PSP: I’m glad you brought up baseball, because one of the interesting things that has been happening in baseball is the quest to find that ‘one statistic to rule them all.’ Right now, WAR is that number… [WAR = Wins Above Replacement: A number that is supposed to represent how many more wins a team would have if they had Player X on their team compared to having a replacement-level player on their team. So if the Phillies had a good player, he might be worth 5 more wins over the course of the season than a replacement-level player.]
HC: Right, you want to know what the difference is between a goalkeeper and a striker? How do they both influence your team winning games.
PSP: Is that something that the analytics community is pursuing now? Is that so far in the future that it isn’t even on the radar?
HC: I’m not entirely sure. I don’t subscribe to the Audi, Castrol Index thing. A lot of that is arbitrary, and they’re only focused on specific details. What they do take in, they aren’t super-transparent about it. So it makes it difficult to buy into it, and we’re already seeing things on it that don’t match the eye-test.
And the thing about WAR: You can see Mike Trout is the best player in baseball. There’s no doubt. He’s the next Mickey Mantle. Numbers shouldn’t necessarily, every time, completely go against our instincts. Sometimes they do, and that’s the battle: Are my eyes wrong? Are the data wrong? How do I adjust?
But most of the time, the data should supplement what you’re seeing, it should give a better perspective. So with that being said, I think an overall statistic to ‘rule them all’…
I think until we really understand what we’re looking at in defensive metrics… That’s what I’m hesitant about. We can’t really say what makes, on a statistical basis, what makes Bill Hamid special, what statistics have a relationship to him preventing goals. Then we have to be able to say, ‘OK, how does that compare to Josh Saunders? How does that compare to Dan Kennedy or Jesse Gonzalez?’ Those are questions we fail to answer right now among peers, so trying to take them apart and cover various positions is overreaching right now.
But we’re always playing with stuff.
Overall, I think that statistics and analytics with MLS has kind of stalled a bit. And it’ll be interesting to see what happens over the next year and a half, two years.
Great article, maybe one of the best I’ve read here in a while.
And this on Edu:
” Is Maurice Edu lazy, or is he intelligent?
HC: I would argue he might just be a bad soccer player.”
I’ve been banging this drum for a while. As for being lucky, I agree we have been, but it sure beats the previous years where we’re both bad and unlucky.
Edu bad. Yes. Overpaid. Yes. Not available till September. Probably. Good idea to even think about relying on him to produce this year? Nope.
“Bad” no. Overvalued yes.
Andre Blake, eleven diving saves, five more than his nearest competitor. Sounds like player of the Month to me. 😉
… cause we’re putting it on wax….. its The Neeeeewwww Style.
shout out to The Beastie Boys.
The Union had a victory against New England that was quite comprehensive. They had a loss against Dallas that was equally comprehensive (in the opposite direction). In between was the Columbus game, where I said that I thought the team did not look good, and barely deserved a point, and instead they got 3. Pontius’ goals, marvelous and well-taken though they were, were something of a fluke. And the Union have, indeed, been riding Andre Blake’s back this season. I agree with your analyst that they are going to need more than that to remain competitive this season.
There’s a lot of interesting information here – both fact and opinion. Definitely worth reading. Thanks.
The interview does not mention a point of which all of us are acutely aware, that Nogueira and Barnetta were not on the pitch for either win. Would that questions had been asked about each player.
I had the same thought. It would be great to see a comparison of the first three games and the next three (assuming all but Edu return to the lineup).
This has been my contention all along… good call MD.
Confidence in soccer is something that is critical. What I take away is that the Union are playing a new brand of soccer – more possession and short passing. They still have a ways to go – final third passing, finishing, defending, etc. We are only three games in and they look much more confident than last year’s team. The results so far may have been fortunate but a confident team is one that hopefully can continue to evolve and get better!
I enjoyed this article and the ideas it presented, but soccer analytics to me only have some uses. I don’t think you can rate players based on them, at least not without watching a lot of tape because so much of the game is situational and based on what other players are doing. Now using it to see locations and types of goals/legit scoring chances should be used to formulate game plans.
Also, the idea that the Union have been “lucky” so far this year is absurd. 3 games is not enough of a sample size to tell anything really.
“Le Toux is sort of interesting. He collects a lot of events, so it’s hard to understand with just numbers: Is he a good player or not?”
For Chrissake, he’s in the 50/50 club…can we stop with this? He’s a weird player, he has a limited set of things he can do, but if you let him do those things he scores and creates goals.