Appendix for Data Colada [39].

Derivation that if the test-retest correlation for a dependent variable is r < .5, subtracting the baseline lowers power.


By Uri Simonsohn

June 17, 2015


Let’s consider a two-cell design, treatment vs. control, with dependent variable y.

Let y2t and y2c be the means for treatment and control respectively in the after period.

Let y1t and y1c be the means for treatment and control respectively in the before period.



The between-subject difference is

(1) B= y2t - y2c


The mixed-design test subtracts the baseline:

(2) M = y2t - y2c - (y1t - y1c)




The expected difference is the same, E(B) = E(M), because with random assignment E(y1t - y1c) = 0.


This makes sense: we don’t expect differences at baseline, so B and M estimate the same effect.


How about the standard error of B and M?


Let’s make things easy. Assume all variances are the same:

(3) VAR(y2t)=VAR(y1t)=VAR(y2c)=VAR(y1c)=V

(4) COV(y2t, y1t)=COV(y2c, y1c)=C


(note: random assignment makes treatment and control observations independent, so every covariance across cells is 0: COV(y2c, y2t) = COV(y1c, y1t) = COV(y2t, y1c) = COV(y1t, y2c) = 0)


Recall the high-school formula for the variance of a difference of random variables:

(5) VAR(a-b)=VAR(a)+VAR(b)-2COV(a,b)
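Formula (5) is easy to verify numerically. A quick sketch with NumPy (the particular variances and covariance below are arbitrary illustrative choices):

```python
# Numerical check of (5): VAR(a-b) = VAR(a) + VAR(b) - 2*COV(a,b).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Draw correlated (a, b) pairs from a bivariate normal.
cov = np.array([[2.0, 0.8],
                [0.8, 1.5]])
a, b = rng.multivariate_normal([0, 0], cov, size=n).T

lhs = np.var(a - b)
rhs = np.var(a) + np.var(b) - 2 * np.cov(a, b)[0, 1]
print(lhs, rhs)  # the two agree up to sampling error / ddof rounding
```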


We want to compute the variances of the B (between) and M (mixed-design) estimates:


VAR(B) = VAR(y2t - y2c) = VAR(y2t) + VAR(y2c) = 2V



VAR(M) = VAR[y2t - y2c - (y1t - y1c)] = VAR(y2t - y1t) + VAR(y2c - y1c) = (2V - 2C) + (2V - 2C) = 4V - 4C
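The two variance results above can be checked numerically. A minimal simulation sketch (the values of V and C and the number of simulated studies are arbitrary illustrative choices):

```python
# Simulation check that VAR(B) = 2V and VAR(M) = 4V - 4C, under the
# assumptions in (3) and (4). V, C, and n_sims are arbitrary choices.
import numpy as np

rng = np.random.default_rng(1)
V, C = 1.0, 0.6          # common variance and before/after covariance
n_sims = 200_000

# Per simulated "study": one treatment and one control observation,
# each with correlated before (y1) and after (y2) scores.
cov = np.array([[V, C],
                [C, V]])
y1t, y2t = rng.multivariate_normal([0, 0], cov, size=n_sims).T
y1c, y2c = rng.multivariate_normal([0, 0], cov, size=n_sims).T

B = y2t - y2c
M = (y2t - y2c) - (y1t - y1c)
print(np.var(B), 2 * V)          # empirical vs. derived: ~2.0
print(np.var(M), 4 * V - 4 * C)  # empirical vs. derived: ~1.6
```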



The mixed and between-subject designs have the same sample size and the same expected effect, hence the mixed design has more power iff its variance is smaller than the between design’s.


For VAR(B) > VAR(M) we need

(6) 2V > 4V - 4C


Which occurs if

(7) 4C > 2V


Which occurs if

(8) C/V > 1/2


C/V, the covariance over the variance, is the test-retest correlation (both variances equal V, so r = C/sqrt(V*V) = C/V), so:


The mixed design has a smaller variance, and hence greater power, iff r > .5.
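The conclusion can also be seen by simulation: compare the variance of the B and M estimates (computed from cell means) for a test-retest correlation below and above .5. This is a sketch; the function name, effect size, n per cell, and the r values are my own illustrative choices:

```python
# For r < .5 the mixed estimate M is noisier (less power) than the
# between estimate B; for r > .5 the reverse holds.
import numpy as np

def var_of_estimates(r, n=50, n_sims=20_000, effect=0.5, seed=0):
    """Empirical variances of the B and M estimates across simulated studies."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, r], [r, 1.0]])  # V = 1, C = r
    # before/after scores for n subjects per cell, n_sims studies
    t = rng.multivariate_normal([0, effect], cov, size=(n_sims, n))
    c = rng.multivariate_normal([0, 0.0], cov, size=(n_sims, n))
    y1t, y2t = t[..., 0].mean(axis=1), t[..., 1].mean(axis=1)
    y1c, y2c = c[..., 0].mean(axis=1), c[..., 1].mean(axis=1)
    B = y2t - y2c
    M = B - (y1t - y1c)
    return np.var(B), np.var(M)

vB_lo, vM_lo = var_of_estimates(r=0.3)
vB_hi, vM_hi = var_of_estimates(r=0.7)
print(vM_lo > vB_lo)  # r = .3: mixed is noisier -> less power
print(vM_hi < vB_hi)  # r = .7: mixed is less noisy -> more power
```

With V = 1 the derivation predicts VAR(B) = 2/n and VAR(M) = (4 - 4r)/n, so the crossover at r = .5 shows up directly in the simulated variances.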