Friday, January 29, 2010

World Cup Age Fallacy?

Sorry for the delay, folks. I spent quite a while trying to compile some data for this post, which was very tedious, to say the least (USSD needs to get some interns to do that dirty work).  For those of you who aren't big fans of math, I apologize in advance for what is as mathematical a post as I've ever put up, but I think it's still worth a read and isn't overly scholarly.  I want what I write to be accessible to everyone, so I've tried to be clear enough that anyone can understand my main points.

Anyway, without further ado, some numbers for you to digest along with your dinner.

----

I feel like for as long as I've been watching sports, there's always been this common belief that a young team most likely doesn't have enough experience to truly flourish, and that an older team might not be able to hold up under the demands of a season or tournament.  These two beliefs ultimately lead to the conclusion that there needs to be some sort of mix of youth and experience to create the most effective team.  The idea that a team of teens and 20-somethings needs to be buoyed by a veteran presence for maximum success comes up fairly frequently amongst analysts and writers of all sports, with the World Cup being no exception.  And for a long time, I (along with many, many others, I'm sure) just accepted that as fact, not bothering to delve any deeper into a topic that seemed like just good common sense.

Which brought me to today, when I was mulling over the latest batch of mock World Cup rosters and searching for something to write about.  I decided that I'd see if the numbers actually back up that fairly commonly held belief that the most success comes with a good blend of young and old.  So, I spent a couple of hours calculating the average age of the teams from the 1990 World Cup to the 2006 World Cup (choosing that launch point only because of the USMNT's renewed involvement in the tournament), while also calculating each team's points per game for their respective campaigns.  I counted extra time losses as 1 point, with wins as 3 points in all tournaments, despite the fact that they only counted as 2 points in the early 1990s.  I considered just using total points, in an effort to reflect the length of a team's run, but the results ended up actually being very similar, so I just stuck with PPG.
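A minimal sketch of that scoring rule in Python, assuming draws also count as 1 point and regulation losses as 0 (neither is spelled out above). The function and parameter names are made up for illustration:

```python
def points_per_game(wins, draws, extra_time_losses, games_played):
    """Points per game: 3 per win, 1 per extra time loss.

    Assumption (not stated in the post): draws also earn 1 point
    and losses in regulation earn 0.
    """
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    points = 3 * wins + 1 * draws + 1 * extra_time_losses
    return points / games_played


# Hypothetical record, not taken from the dataset:
# 2 wins, 1 draw and 1 extra time loss over 5 games.
example_ppg = points_per_game(wins=2, draws=1, extra_time_losses=1, games_played=5)
print(example_ppg)
```

Dividing by games played is what keeps a team that went out after three matches comparable with one that played seven.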

Now, I'm still not very happy with this analysis, and I plan on expanding upon it in an effort to get a formula that can take into account a team's quality (i.e. FIFA or Elo rankings) or other factors, but for now I'll just pass along what I've come up with.  If anything, I'm just looking at this as a springboard for improvements, both through my own changes and through any suggestions readers might want to offer in the area of statistical analysis.

With that, here is a scatter plot of Average Team Age against Points Per Game for the 144 teams that took part in the World Cup from 1990-2006:

Just from a first glance, there doesn't appear to be any strong trend within the data.  But since the naked eye is often not enough, I fit a linear regression to the plot:
 
 
With a line that's nearly flat and an "r" value of .0735, this regression can be summed up simply by saying that, basically, it doesn't explain much.  But this seems logical; after all, we wouldn't expect any kind of upward trend where older teams generally do better, or a negative trend that indicates that younger teams seem to have more success on the whole.  Based on the previously discussed belief in a need for a good mix of youth and experience, my intuition told me that, if any trend existed, it would be a quadratic one, one that shows that the middle of the graph (from 26-28) is where the high point is, with the upper and lower ends tailing off in comparison.
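For anyone who wants to run the same kind of check, here is a rough sketch of a straight-line fit and its "r" value in plain Python. The function names are my own, and the numbers at the bottom are invented placeholders, not the World Cup data:

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = intercept + slope * x, plus Pearson's r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Placeholder values for illustration only (not real teams).
ages = [25.1, 26.4, 27.0, 27.8, 29.2]
ppg = [1.0, 1.7, 0.7, 2.0, 1.3]
intercept, slope, r = linear_fit(ages, ppg)
print(f"PPG = {intercept:.3f} + {slope:.3f} * age, r = {r:.3f}")
```

An r close to 0, like the .0735 above, means the straight line tells you almost nothing about PPG.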

So, I put in a quadratic fit, and was somewhat surprised at what I found:

The regression shows what I suspected, with a hump peaking around 27 and dipping off with the younger and older teams.  The equation for the fit is:

PPG = -37.17831 + 2.8522234*Avg Age - 0.0527811*Avg Age^2

Using some calculus, we find that the Average Age that gives the maximum predicted PPG value is 27.02 years old.  Just to give you something to think about, the average age of this predicted USMNT roster is 26.35, not too far off from the point of maximization.  But don't get too excited, because the numbers behind the plot tell the real story.
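The calculus step is short enough to sketch in Python: set the derivative of the fitted quadratic to zero and solve for age. The coefficients are the ones from the equation above; the variable and function names are just for illustration:

```python
# Coefficients of the quadratic fit quoted above
INTERCEPT = -37.17831
LINEAR = 2.8522234
QUADRATIC = -0.0527811


def predicted_ppg(avg_age):
    """PPG predicted by the fitted quadratic for a given average age."""
    return INTERCEPT + LINEAR * avg_age + QUADRATIC * avg_age ** 2


# d(PPG)/d(age) = LINEAR + 2 * QUADRATIC * age = 0
# Because QUADRATIC is negative, this stationary point is a maximum.
peak_age = -LINEAR / (2 * QUADRATIC)

print(f"peak age: {peak_age:.2f}")
print(f"predicted PPG at peak: {predicted_ppg(peak_age):.2f}")
print(f"predicted PPG at 26.35: {predicted_ppg(26.35):.2f}")
```

The curve is very flat near the top, so an average age of 26.35 gives up only a small fraction of a point per game relative to the peak.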

We can interpret the regression's R^2 value of .029844 to mean that the quadratic curve explains only about 3% of the variation in PPG.  So, in short, this plot says that there really isn't any strong correlation between Average Age and PPG.
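To make the "percent explained" idea concrete, here is a small sketch of how R^2 is computed from observed and predicted values, followed by the two figures quoted in this post. The function name is hypothetical:

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` accounted for by `predicted`."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    mean_obs = sum(observed) / len(observed)
    # Variation left over after the fit
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation around the mean
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must vary")
    return 1 - ss_res / ss_tot


# Values reported in the post
linear_r = 0.0735        # "r" of the straight-line fit
quadratic_r2 = 0.029844  # R^2 of the quadratic fit

# For a straight-line fit, R^2 is simply r squared.
print(f"linear fit explains {linear_r ** 2:.1%} of the variation in PPG")
print(f"quadratic fit explains {quadratic_r2:.1%} of the variation in PPG")
```

So the quadratic does better than the line (roughly 3% against about half a percent), but both leave nearly all of the variation in PPG unexplained.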

So could it be that a fairly long-held belief that you need a good mix of young and old on a roster has no factual support?  Well, no, not necessarily.  There are a lot of things that go into this that are hard to quantify, and I might have been better served using caps instead of age to account for certain young players who have an unusually large amount of experience (like Landon Donovan).  But on the surface, there really doesn't seem to be any strong support for any type of roster composition.

I guess the point of this post is that maybe coaches, fans, and pundits shouldn't feel apprehensive about fielding a particularly youthful or veteran team when the World Cup rolls around, since there is little correlation between age and success.

Are there flaws in this analysis?  Sure.  I really just tried to throw this all together this afternoon/evening, when I would have preferred to have a couple of days to really think this through.  I'm not overly happy with it, but I think the findings are still somewhat relevant, if for no other reason than to provide some food for thought with June approaching and an alternative look at the roster selection process.

Hopefully, I can make some improvements in the coming days to make a more concise and well-supported case, but for now, I'll leave you to mull everything over.

And if you're one of those anti-math folks I warned at the beginning of the post, I commend you for taking the time to go through what may have been a less-enjoyable read.

And to all of you, thanks for letting me take this chance to let out the inner nerd within.

Comments:

The Mathematician January 31, 2010 9:34 PM  

I am unsure as to how your statistics test the hypothesis you mean to test. When you look at average age, you are looking at only the mean age of a team, not its composition. As an example, take two (3 member) teams with ages:

Team A: 24, 25, 26
Team B: 20, 25, 30

By my estimation, team B should do better per the speculation, but the two teams are indistinguishable by the metric you are using.

I think a quantitative definition of 'mix' along with a regression based on that definition (variance would be an easy first guess) has a chance of getting you non-zero results.
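A quick sketch of that suggestion, taking population variance as the (assumed) measure of 'mix' and using the two example teams above:

```python
from statistics import mean, pvariance

# The two 3-member example teams from this comment
team_a = [24, 25, 26]
team_b = [20, 25, 30]

for name, ages in (("Team A", team_a), ("Team B", team_b)):
    # Same mean for both teams, but very different spread
    print(f"{name}: mean age = {mean(ages)}, variance = {pvariance(ages):.2f}")
```

Both teams average 25, so mean age cannot separate them, while the variance can. A regression of PPG on variance (or on mean and variance together) would then test the 'mix' idea directly.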

USSD January 31, 2010 11:22 PM  

Thank you for commenting.

Your point is a valid one, and as I said in the post, I was only hoping to create a starting point for improvement through my own realizations and the contributions of others.

As far as distinguishing between your A and B teams, I'm not so certain it would be necessary for what I intended. The hypothesis I was seeking to test (which I probably did not express clearly) was whether generally young or generally old teams perform worse than more "middle-aged" squads; that is to say, it was supposed to be about general greenness (or grayness) rather than roster composition, but I can see now that I did not make that clear. In that sense, the 24-25-26/20-25-30 distinction is not as important as the avg:23 years/avg:29 years distinction. While we do not get a detailed look at the composition of these respective squads, we can infer that they are generally composed of younger and older players, respectively, since one or two outliers cannot seriously skew the average.

Even as I read what I just wrote, I still feel like it's not good enough, and I'd really like to hear your thoughts on how to quantify some other factors or create a more informative model. Your point of composition would be something I would like to incorporate in order to build a better model, and I think I might need to really think about what it is I'm trying to test to outline a clearer hypothesis.

I think I did a poor job of framing the hypothesis I planned on testing, for starters, but I still would like to improve upon what I started at some point soon.

Again, thank you for your input. It really is appreciated.
