A recurring theme in performance-analysis discussions is that “performance doesn’t prove doping.” Of course this is true, because proof, as an absolute, doesn’t really exist. Most real-world, evidence-driven judgments rest on probability rather than “proof.” From this perspective, this post illustrates how a mathematical model can be used to estimate the probability of doping as a function of performance.

The first step is to create a probability distribution for clean riders at the Tour de France:

To make the distribution, some assumptions need to be made based on available information. In this case, the first assumption is that on a prototypical Tour de France climb a couple of clean riders should be able to sustain about 6 W/kg. A second assumption is that riders need to be able to sustain at least 5 W/kg to make the team and survive the race. With a Gaussian model (a bell-shaped curve), a mean of 5.5 W/kg and a standard deviation of 0.25 W/kg generate a distribution that meets these assumptions/observations: both 5 and 6 W/kg sit two standard deviations from the mean. (Don’t worry if you don’t like my assumptions; at the end is a link to a Google spreadsheet where you can change them yourself.)
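As a quick check on these choices, here is a minimal sketch using Python's standard library. The variable names are mine, and the spreadsheet may do this differently. It computes the share of the clean distribution that falls above 6 W/kg and below 5 W/kg, which comes out at about 2.3% in each tail.

```python
from statistics import NormalDist

# Clean-rider model from the post: Gaussian, mean 5.5 W/kg, SD 0.25 W/kg.
clean = NormalDist(mu=5.5, sigma=0.25)

# Tail shares at the two thresholds named in the assumptions.
frac_above_6 = 1 - clean.cdf(6.0)
frac_below_5 = clean.cdf(5.0)

print(f"Share of clean riders above 6 W/kg: {frac_above_6:.1%}")
print(f"Share of clean riders below 5 W/kg: {frac_below_5:.1%}")
```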

Next, the overall prevalence of doping can be estimated from the published literature: http://www.ncbi.nlm.nih.gov/pubmed/25169441

In this review, the prevalence is estimated at 14-39%. The mean would be 31%, but for the sake of giving cycling the benefit of the doubt I used 25%. Using the distribution function above and the number of clean riders, we can generate the distribution of clean riders as a function of power.

As you can see, this model predicts 1-2 riders above 6 W/kg and 1-2 riders below 5 W/kg.
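For readers who want to tabulate this themselves, the sketch below converts the clean distribution into expected rider counts per power bin. The field size of 198 starters and the 0.1 W/kg bin width are my assumptions, not values taken from the post or the spreadsheet, so the exact counts will shift with whatever you plug in.

```python
from statistics import NormalDist

N_RIDERS = 198      # assumption: field size, not stated in the post
PREVALENCE = 0.25   # doping prevalence used in the post
BIN_WIDTH = 0.1     # assumption: bin width in W/kg for the tabulation

clean = NormalDist(mu=5.5, sigma=0.25)
n_clean = N_RIDERS * (1 - PREVALENCE)

def expected_clean_riders(lo, hi):
    """Expected number of clean riders sustaining between lo and hi W/kg."""
    return n_clean * (clean.cdf(hi) - clean.cdf(lo))

# Tabulate 4.5 to 6.5 W/kg in fixed-width bins.
n_bins = round((6.5 - 4.5) / BIN_WIDTH)
for i in range(n_bins):
    lo = 4.5 + i * BIN_WIDTH
    hi = lo + BIN_WIDTH
    print(f"{lo:.1f}-{hi:.1f} W/kg: {expected_clean_riders(lo, hi):5.1f} riders")
```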

Now the performance effect of doping needs to be considered. Ashenden has previously shown that EPO micro-dosing producing an increase of 10% in hemoglobin mass (the equivalent of 2 blood bags) is not flagged by the bio passport. http://www.ncbi.nlm.nih.gov/pubmed/21336951

Since O2 metabolism is the primary determinant of endurance performance, it can be extrapolated from this study that a 10% increase in performance from O2-vector doping alone may still be possible. However, to stay reasonably conservative, I used a 5% benefit from doping.

So now a second probability function is made with the mean increased by 5% (5.5 × 1.05 = 5.775 W/kg). The standard deviation is unchanged. From this probability distribution and the estimated doping prevalence of 25%, we can generate the distribution of doped riders as a function of power and overlay it on the clean rider distribution.
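The overlay can be sketched in a few lines. This is my reading of the construction, with each curve weighted by its share of the peloton. The function names are illustrative and the spreadsheet may be organised differently.

```python
from statistics import NormalDist

PREVALENCE = 0.25       # share of riders assumed to be doping
DOPING_BENEFIT = 0.05   # 5% gain in sustainable power

clean = NormalDist(mu=5.5, sigma=0.25)
# Same spread, mean shifted up by the doping benefit (5.775 W/kg).
doped = NormalDist(mu=clean.mean * (1 + DOPING_BENEFIT), sigma=clean.stdev)

def clean_density(w):
    """Clean-rider density at w W/kg, weighted by the clean share."""
    return (1 - PREVALENCE) * clean.pdf(w)

def doped_density(w):
    """Doped-rider density at w W/kg, weighted by the doping prevalence."""
    return PREVALENCE * doped.pdf(w)

for w in (5.0, 5.5, 5.8, 6.0, 6.2):
    print(f"{w:.1f} W/kg  clean={clean_density(w):.3f}  doped={doped_density(w):.3f}")
```

Under these inputs the clean curve dominates around the middle of the range, while the doped curve is the larger of the two by 6 W/kg.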

This figure should start giving you some clue as to why it is misleading to say that “performance doesn’t prove” anything.

Finally, from the distribution models we can calculate the probability that a given level of performance was produced by a doped rather than a clean rider.
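One natural way to write this calculation is Bayes' rule. I am assuming this is what the spreadsheet does. The symbols are mine: $p$ is the doping prevalence, and $f_c$ and $f_d$ are the clean and doped Gaussian densities evaluated at power $w$.

```latex
P(\text{doped} \mid w) = \frac{p \, f_d(w)}{p \, f_d(w) + (1 - p) \, f_c(w)}
```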

As you can see, based on this model and the assumptions above, the probability of doping at the 6 W/kg performance level on a prototypical climb is about 60%. The probability of doping increases to 80% at 6.2 W/kg and approaches 100% at 7 W/kg.
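These figures can be reproduced with a short sketch that applies Bayes' rule to the two Gaussians. The function name `p_doped` and its parameter names are hypothetical, and I am assuming the spreadsheet works the same way. With the inputs from this post it gives roughly 62% at 6 W/kg, 80% at 6.2 W/kg, and 99% at 7 W/kg.

```python
from statistics import NormalDist

def p_doped(w, prevalence=0.25, benefit=0.05, clean_mean=5.5, sd=0.25):
    """Probability that a rider sustaining w W/kg is doped, under the model.

    Defaults are the inputs used in the post; change them to test
    other assumptions.
    """
    clean = NormalDist(mu=clean_mean, sigma=sd)
    doped = NormalDist(mu=clean_mean * (1 + benefit), sigma=sd)
    doped_weight = prevalence * doped.pdf(w)
    clean_weight = (1 - prevalence) * clean.pdf(w)
    return doped_weight / (doped_weight + clean_weight)

for w in (5.5, 6.0, 6.2, 7.0):
    print(f"{w:.1f} W/kg -> P(doped) = {p_doped(w):.0%}")
```

Changing `prevalence` or `benefit` shows how sensitive the curve is to each assumption.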

**This model/example is meant to be illustrative of the concept**. With better input and more sophisticated modelling, performance could be realistically used as an indirect measure of doping probability. Triangulating across multiple indirect measures including biological, performance, and psychometric measures would likely improve our understanding of the true doping burden on sport.

If you would like to experiment with the model, it is available at the link below. The fields to manipulate are highlighted in yellow.

https://docs.google.com/spreadsheets/d/1-YgGROA3ZPtZ_kycr4UIwdqOL1S85NRTpb3zDTQ3Tuw/edit?usp=sharing