10. Beyond the Top 10: data presentation and analysis of 2013 hard court performance

How data presentation and analysis lead us to insights about players outside the top 10, including their ability and potential to reach the latter stages of big tournaments, and their consistency. How do you become a big-time player – grinding it out on tour or nailing it at a big tournament? Questions if not conclusions.

A) Introduction

Two weeks ago, this blog produced rankings for ATP players based solely on their performance on one court surface: an outdoor hard court. Each player was ranked according to the number of ATP ranking points per match earned (ATP PPM): in other words, all their 2013 ranking points on outdoor hard courts divided by matches played.

Empirically, we could see the emergence of a new Big Four – Nadal, Djokovic, Murray and Del Potro – as well as possible subsequent tiers of players.

The purpose of this post, though, is to shed some light on those outside the top 10, where analysis of players is pretty thin. Anecdotally, my sense is that there is more, and more rigorous, analysis of players in the top 10 than of players outside it.

B) The group of players to analyse

To decide which players to analyse, I mapped all players’ 2013 ATP outdoor hard court points (henceforth just “hard court”) against their 2013 ATP hard court points per match. This produced the following chart.

Chart: 2013 hard court ATP points per match (PPM) vs total 2013 hard court ATP points, all players

As you may know, both Nadal and Djokovic won a hard court grand slam tournament in 2013. In the process each earned 2,000 ATP ranking points, which is why both their overall 2013 hard court points totals are high (how far right they are on the chart) and their points per match averages are high (how far “up the chart” they are).

For the purposes of this post, I then selected for analysis the players similarly grouped within the red box above: 21 players who have scored between 500 and 1,100 ATP points on a hard court in 2013 at an average of between 20 and 42 ATP points per match. According to the ATP rankings published on 14 October 2013, these players were ranked between 12 (Haas) and 52 (Tomic). These players are shown in greater detail here:

Chart: 2013 hard court ATP points per match (PPM) vs total 2013 hard court ATP points, 21 selected players, with line of best fit

C) Context

First of all: what does the graph mean?

Players “above” the line of best fit earned more ATP points per match in 2013 than the line would predict for their points total; those below the line earned fewer.

So, in the chart above, although Almagro and Anderson earned a broadly similar number of hard court ATP points in 2013 (885 vs 1,040), Almagro’s average ATP points per match is significantly better than Anderson’s, which is why Almagro sits above the line and Anderson below.

D) Hypotheses

How could a player win more ATP points per match played compared to his peers? And then what insight might this reveal about the player ie what’s the point in being above the line or below the line?

Players could logically score a higher number of points per match in three ways:

1)      By progressing further in tournaments. ATP points for successive rounds rise roughly geometrically rather than linearly. For example, Player 1 and Player 2 might both play in consecutive ATP 250 tournaments, let’s say Brisbane and Sydney. Player 1 loses in the first round of Brisbane (earning 0 points) but wins 4 matches in Sydney, losing in the final (150 points). Player 2 wins 2 matches at each tournament before losing both times in the quarter finals (45 points each time). Both have played 6 matches, winning 4 and losing 2. Player 1 has scored 150 points, for an average of 25 points per match; Player 2 has scored 90 points, for an average of 15 points per match.

2)      By winning matches at events where more points are awarded eg grand slams and Masters 1000s compared to events where fewer points are awarded eg ATP 250 events.

3)      By having volatile results that combine progression to the latter stages of bigger tournaments with early round exits at other tournaments.
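The Brisbane and Sydney example in point 1) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the three points values are the ones quoted in the example, and the function and variable names are my own.

```python
# Points quoted in the example for ATP 250 results.
POINTS_FIRST_ROUND_LOSS = 0
POINTS_QUARTER_FINAL_LOSS = 45
POINTS_RUNNER_UP = 150


def points_per_match(total_points, matches_played):
    """ATP PPM: ranking points earned divided by matches played."""
    return total_points / matches_played


# Player 1: one first round loss (1 match), then a run to the final (4 wins + 1 loss).
player1_points = POINTS_FIRST_ROUND_LOSS + POINTS_RUNNER_UP
player1_matches = 1 + 5

# Player 2: two quarter final exits (2 wins + 1 loss each time).
player2_points = 2 * POINTS_QUARTER_FINAL_LOSS
player2_matches = 3 + 3

player1_ppm = points_per_match(player1_points, player1_matches)
player2_ppm = points_per_match(player2_points, player2_matches)

print(f"Player 1: {player1_points} points, {player1_ppm:.0f} PPM")
print(f"Player 2: {player2_points} points, {player2_ppm:.0f} PPM")
```

Same number of matches, same win-loss record, but the single deep run lifts Player 1’s average well above Player 2’s.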

What’s the point in being above the line or below the line? Being above the line may signal volatile results and inconsistency; it may also be the mark of a player able to threaten at big tournaments even if that player can’t find the winning formula week in week out. Are these players “ones to watch” in big tournaments?

Conversely, some players located below the line may earn a large number of points in total but not be a significant threat at large tournaments.

Analysis in the second part of this post will test these hypotheses.

E) Methodology

So, if we consider a player’s position in relation to the line of best fit as potentially important, it is necessary to measure precisely how far above or below the line a player sits. According to our hypotheses, the further above the line of best fit a player sits, the more he might have progressed at tournaments or been inconsistent. Measuring this distance will allow us to run correlations to establish the extent to which these hypotheses are correct. Accordingly, I ran a linear regression to establish how far from the line of best fit different players were located.
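For anyone wanting to reproduce this step, here is a minimal sketch of one way to measure distance from a line of best fit. It assumes the distance is the vertical gap (the regression residual) and uses NumPy; the four players and their numbers are invented for illustration and are not the data behind the chart.

```python
import numpy as np

# Made-up (total points, points per match) pairs for four imaginary players.
# These are NOT the figures behind the post's chart.
players = ["Player A", "Player B", "Player C", "Player D"]
total_points = np.array([600.0, 750.0, 900.0, 1050.0])
ppm = np.array([30.0, 24.0, 36.0, 28.0])

# Fit the line of best fit: ppm ~ slope * total_points + intercept.
slope, intercept = np.polyfit(total_points, ppm, 1)

# Residual = actual PPM minus the PPM the line predicts.
# Positive means above the line, negative means below it.
residuals = ppm - (slope * total_points + intercept)

# List players from furthest above the line to furthest below.
order = np.argsort(-residuals)
for i in order:
    print(f"{players[i]}: {residuals[i]:+.1f} points per match vs the line")
```

Sorting by residual gives the kind of ordering used in the rest of the analysis: the players furthest above the line first, the players furthest below it last.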

As you can see from both the table and the chart, of the players under analysis, Gulbis and Pospisil are the furthest above the line, Istomin and Anderson the furthest below the line.

Table and chart: distance of each of the 21 selected players from the line of best fit

F) Data preparation

For each of the 21 players, I then collected additional data to test the hypotheses. This data included the number of tournaments entered for each player, their best results, the number of first round exits and the standard deviation of the number of rounds progressed (a calculation that helps to measure consistency, of which more later). Accordingly, analysis will be based on the following table:

Table: additional data for the 21 selected players

G) Analysis vs hypothesis

1) & 2) Earning more points; earning points in bigger tournaments.

There is a strong correlation (coefficient 0.56) between sitting above the line and progressing to the latter stages of tournaments, especially big tournaments. Four players made a grand slam quarter final (worth 360 points), and all four are among the first seven in the table. Also in the first seven, Gulbis won the title in Delray Beach in Florida (250 points); Pospisil made a run to the Masters 1000 semi final in his home country, Canada.

By contrast, the best result for players lower in the table typically came in an earlier round, for example a round of 16 match at a grand slam. Or, as in the cases of Dodig and Querrey, the returns from their best tournaments were modest: respectively, a semi final in Tokyo (an ATP 500 event) and a third round appearance at the Miami Masters 1000 event.

For an explanation of correlation coefficients, see the Glossary below.

3) Consistency

Just as the players above the line correlate with progress in large tournaments (grand slams, Masters 1000 events), so they also strongly correlate (coefficient 0.64) with inconsistency, especially a propensity to be knocked out of tournaments in the first round.

The first 6 players in the table above – ie all significantly “above the line” – played a total of 47 hard court tournaments and were knocked out in the first round 21 times (45 per cent of the time). The last 6 players in the table played 72 tournaments and had first round departures just 13 times (18 per cent).
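The two percentages above are simple ratios. As a quick check, using only the counts quoted in this post (the function name is my own):

```python
def first_round_exit_rate(first_round_exits, tournaments):
    """Share of tournaments that ended in a first round loss, as a percentage."""
    return 100 * first_round_exits / tournaments


top_six_rate = first_round_exit_rate(21, 47)     # first 6 players in the table
bottom_six_rate = first_round_exit_rate(13, 72)  # last 6 players in the table

print(f"First 6: {top_six_rate:.0f} per cent")
print(f"Last 6: {bottom_six_rate:.0f} per cent")
```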

Jeremy Chardy of France is a very good example of an “above the line” player: he reached the Australian Open quarter finals in January (beating Del Potro en route) but also racked up 5 first round losses.

Jurgen Melzer’s inconsistency was somehow balanced out: he is just 1 unit below the line of best fit. Among 11 tournaments, Melzer had 6 first round losses, the most of any player in the table, but also a tournament win in Winston Salem (just prior to the US Open) and a quarter final in the Miami Masters 1000 event. (Melzer also won a lower level Challenger event in Dallas the week prior to the Miami Masters 1000.)

Standard deviation

I also calculated the standard deviation of the number of rounds progressed for the first 3 in the table and the last 3. As explained more fully in the Glossary below, standard deviation in this post essentially demonstrates how consistent or inconsistent players are. A standard deviation of below 1 suggests that a player is consistent (whether consistently good or bad), while a standard deviation of above 1.5 suggests that a player is more erratic.

Gulbis, Pospisil and Almagro had standard deviations of respectively 1.9, 1.5 and 1.5 rounds; while Haas, Istomin and Anderson’s standard deviations for rounds progressed were 1.2, 0.9 and 1.3.
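To give a feel for what those numbers mean, here is a small sketch using Python’s statistics module. The two players and their round-by-round results are invented, and I have assumed the population standard deviation (pstdev); a sample standard deviation (stdev) would give slightly higher figures.

```python
import statistics

# Made-up "rounds progressed" per tournament for two imaginary players
# (1 = lost in the first round, 2 = lost in the second round, and so on).
steady_player = [2, 2, 3, 2, 3, 2, 2, 3]
erratic_player = [1, 1, 5, 1, 6, 1, 4, 1]

# pstdev treats the list as the whole population (every tournament played).
steady_sd = statistics.pstdev(steady_player)
erratic_sd = statistics.pstdev(erratic_player)

print(f"Steady player: {steady_sd:.1f} rounds")
print(f"Erratic player: {erratic_sd:.1f} rounds")
```

The steady player lands below 1 and the erratic player above 1.5, which are the two thresholds used above.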

4) Other analysis – Number of tournaments

This one is not like the others.

The starkest correlation in the table (coefficient of 0.87) is between position relative to the line and number of tournaments played: those above the line played fewer tournaments, while those below the line played more. As noted above, the first 6 players in the table played a total of 47 hard court tournaments, the last 6 played 72.

The last 6 players – Nishikori, Dodig, Querrey, Haas, Istomin, Anderson – played in more tournaments and had fewer first round exits between them than any other group in the table. As we have seen, in general these players did not progress as far in tournaments as the first 6, but in addition to having far fewer first round exits they also appeared to be more consistent, with lower standard deviation scores (alongside Haas, Istomin and Anderson, Dodig’s standard deviation score was just 0.8).

Relative to the other players in the table – and conceding that these players are still mostly top 50 and even top 15 (Haas) – have we stumbled upon the graphical representation of the tennis journeyman? I brace myself for Haas and Nishikori fan discontent….

What came first, the music or the misery?

However, if consistency coupled with middling success (compared to the other players in the table) correlates with a high number of tournaments, then this raises more fundamental questions about cause and effect:

  • If the geometric structure of ATP points rewards those who reach the latter stages of tournaments, does the middling success of the above six players force them to play more tournaments to earn ATP points?
  • Or do these players by nature play tennis week in week out, the effect of that amount of tennis being to dull any “edge” they might have, thus allowing them to reach only middling stages of tournaments?

I’m not sure I have an answer right now to this.

To quote John Cusack’s character in the film High Fidelity: Did I listen to pop music because I was miserable? Or was I miserable because I listened to pop music?

Are tennis players journeymen because of the number of tournaments they play? Or do they play a high number of tournaments because they are journeymen?

Image: John Cusack

Would Jack Black be above the line or below the line?


Data presentation and analysis have uncovered interesting traits about a group of similarly performing players. I hope they have also shown how data can and should be central to how we go about understanding different players.

Among the 21 players analysed in this post, it seems we can and should expect to see more of the likes of Gulbis, Pospisil and Almagro in the latter stages of tournaments – even if we might also miss them if we don’t turn up on the first Monday of a tournament…


Glossary

Correlation coefficients

A good explanation of correlation coefficients can be found here.

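Essentially, a correlation coefficient is a number between -1 and 1; a negative value means that one measure falls as the other rises. The figures quoted in this post are best read as the size of the coefficient, ignoring its sign, so they run from 0 to 1. The nearer the number is to 1 the stronger the correlation. 0 means an absence of correlation; 1 a perfect correlation. As a rule of thumb for this post, anything above 0.5 is meaningful; anything around 0.75 is El Dorado.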
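As a rough illustration, the sketch below computes a Pearson correlation coefficient with NumPy. I am assuming Pearson’s r is the coefficient in question, and the five data points are invented. Note that r comes out negative when one measure falls as the other rises, so the sketch also prints its size.

```python
import numpy as np

# Made-up values for five imaginary players; NOT the post's data.
# distance_from_line: how far above (+) or below (-) the line of best fit.
distance_from_line = np.array([8.0, 4.0, 0.0, -3.0, -6.0])
tournaments_played = np.array([7, 9, 10, 12, 13])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(distance_from_line, tournaments_played)[0, 1]

print(f"r = {r:.2f}, size = {abs(r):.2f}")
```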

Standard deviation

Standard deviation essentially measures how far different points are from their mean average. A decent explanation is here. For our purposes standard deviation is being used to show the extent to which different tennis players are inconsistent in different tournaments. I used it to measure players’ progress through different rounds.

A low standard deviation (eg Dodig 0.8) means that a player generally reaches a similar stage of the tournament in each of the tournaments he enters; a high standard deviation (eg Gulbis 1.9) means that a player has in all likelihood mixed deep runs with first round exits.
