This quantity is often misunderstood and/or misrepresented. Let's assume we have some random variable $Y$ with mean $\bar{Y}$ and some estimator for it, $\hat{Y}$. The commonly accepted fundamental definition of $R^2$ is:

$$R^2 = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}$$
Sometimes people call this the “fraction of explained variance”. But that’s only the right way to look at it under special circumstances. All the equation above shows is that $R^2$ “compares” the model $\hat{Y}$ to the baseline model $\hat{Y} = \bar{Y}$. It’s a little like comparing a random walk model to something someone clever cooked up hoping to beat it. Note immediately that there’s nothing stopping $R^2$ from being negative, and so the square in the notation $R^2$ is unfortunate in that respect. If you choose a bad enough model, say $\hat{Y}_i = c$ for any constant $c$ with $c \neq \bar{Y}$, well then $R^2$ will be negative*.
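To make this concrete, here is a small numpy sketch (not from the original post; the function name `r_squared` is mine) computing $R^2$ directly from the definition. The benchmark $\bar{Y}$ model scores exactly zero, while a worse constant guess goes negative:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot, per the definition above."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=1.0, size=1000)

# guessing the mean everywhere is the benchmark: R^2 is exactly 0
print(r_squared(y, np.full_like(y, np.mean(y))))
# a bad constant guess (0, far from the mean of ~5) drives R^2 deeply negative
print(r_squared(y, np.full_like(y, 0.0)))
```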

But the comparison made here is not simply comparing the errors of the two models. Instead it compares two variances (which harkens back to Principal Component Analysis, and I will update this blog with that discussion) and then subtracts that ratio from one. If the model $\hat{Y}$ has the property that the errors are zero on average, then the numerator is proportional to the variance of the errors. The denominator is proportional to the variance of $Y$. So, if the errors are denoted $e_i = Y_i - \hat{Y}_i$, that ratio is just $\mathrm{Var}(e)/\mathrm{Var}(Y)$. So whenever the variance of the errors is greater than the variance of $Y$, $R^2$ ends up in negative territory.
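The variance-ratio reading can be checked numerically. In this numpy sketch (my own construction, not from the post), the errors are forced to average to zero, and the definition-based $R^2$ coincides with $1 - \mathrm{Var}(e)/\mathrm{Var}(Y)$:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=500)
e = rng.normal(scale=0.5, size=500)
e = e - e.mean()          # force the errors to be zero on average
y_hat = y - e             # a model whose errors are exactly e

# R^2 straight from the definition
r2_def = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# R^2 as one minus the ratio of variances (population variances, ddof=0)
r2_var = 1 - np.var(e) / np.var(y)
print(r2_def, r2_var)     # the two agree when the errors average to zero
```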

In terms of this fraction-of-explained-variance interpretation, that turns out to hold only when the errors average to zero and are uncorrelated with the fitted values, which follows from the simple relation below:

$$\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y} + e) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(e) + 2\,\mathrm{Cov}(\hat{Y}, e)$$

When $\mathrm{Cov}(\hat{Y}, e) = 0$, dividing through by $\mathrm{Var}(Y)$ gives $R^2 = 1 - \mathrm{Var}(e)/\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y})/\mathrm{Var}(Y)$, which genuinely is a fraction of explained variance.
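OLS with an intercept is the standard setting where these conditions hold: the residuals sum to zero and are orthogonal to the fitted values. A quick numpy check (my own sketch, using `np.linalg.lstsq`; not from the original post):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = 2.0 * x + 1.0 + rng.normal(size=300)

# OLS with an intercept, via least squares on [1, x]
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
e = y - y_hat

# residuals are uncorrelated with the fit, so the variances add
print(np.cov(y_hat, e)[0, 1])                       # numerically ~0
print(np.var(y), np.var(y_hat) + np.var(e))         # equal

# hence R^2 equals the fraction of variance the fit explains
r2 = 1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
print(r2, np.var(y_hat) / np.var(y))                # equal
```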
Although that all seems obvious given the simple equations above, it’s actually *not* always handled consistently in mainstream practice. For example, the statsmodels package in Python gets this mixed up. Try running sm.OLS(Y, X) without adding a constant and checking out the reported $R^2$. Better yet, check out the statsmodels VIF calculation, which makes the same mistake. I will add more discussion around that to this blog.
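The issue is that, without a constant, the total sum of squares is often taken as the *uncentered* $\sum_i Y_i^2$ rather than $\sum_i (Y_i - \bar{Y})^2$, which silently changes the benchmark from $\bar{Y}$ to $0$. The following numpy-only sketch (my own; it does not call statsmodels, it just reproduces the two formulas) shows how a poor no-intercept fit can look excellent under the uncentered convention:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=10.0, scale=1.0, size=200)
y = 3.0 + 0.1 * x + rng.normal(scale=0.5, size=200)  # big intercept, weak slope

# no-intercept OLS: a single slope through the origin
beta = np.sum(x * y) / np.sum(x * x)
e = y - beta * x
ss_res = np.sum(e ** 2)

# centered R^2: compares against the y-bar benchmark (the definition above)
r2_centered = 1 - ss_res / np.sum((y - y.mean()) ** 2)
# uncentered R^2: compares against the model y = 0
r2_uncentered = 1 - ss_res / np.sum(y ** 2)
print(r2_centered)    # far below the uncentered number for this data
print(r2_uncentered)  # looks excellent, but only because y = 0 is a strawman
```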

* To see that any constant guess other than the mean itself will result in a negative R squared, consider one such guess $\hat{Y}_i = c$ with $c \neq \bar{Y}$. Then we have,

$$R^2 = 1 - \frac{\sum_i (Y_i - c)^2}{\sum_i (Y_i - \bar{Y})^2} = 1 - \frac{\sum_i (Y_i - \bar{Y})^2 + n(\bar{Y} - c)^2}{\sum_i (Y_i - \bar{Y})^2} = -\,\frac{n(\bar{Y} - c)^2}{\sum_i (Y_i - \bar{Y})^2} < 0,$$

where the middle step expands $(Y_i - c) = (Y_i - \bar{Y}) + (\bar{Y} - c)$ and the cross term vanishes because deviations from the mean sum to zero.
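The closed form in the footnote can be verified numerically; a small numpy sketch of my own:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(loc=2.0, scale=1.5, size=100)
c = 5.0                                    # any constant guess other than the mean

ss_tot = np.sum((y - y.mean()) ** 2)
# R^2 from the definition, for the constant model y_hat = c
r2 = 1 - np.sum((y - c) ** 2) / ss_tot
# the closed form derived in the footnote
closed_form = -len(y) * (y.mean() - c) ** 2 / ss_tot
print(r2, closed_form)   # identical, and strictly negative since c != y.mean()
```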