To fit two straight lines we need to identify the point where one line ends and the other begins.

In the post we consider identifying that point with the quadratic regression that is already the standard tool for testing u-shapes. Instructions follow below.

There are other options for identifying the interruption point that do not involve the quadratic and may have some advantages; we do not consider them here.

Six Steps

Step 1. Plot the raw data.

Step 2. Estimate the quadratic regression y=a*x+b*x^2 (plus an intercept)

Step 3. Verify the results imply a u-shape within the range of observed data (e.g., plot the quadratic)

Step 4. If a u-shape is seen, identify the value of x where the fitted quadratic reaches its maximum (or, for a regular u-shape, its minimum). Call that point xmax. Setting the derivative of the quadratic, a+2*b*x, equal to zero gives

xmax = -a/(2*b)
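Steps 3 and 4 can also be checked numerically rather than only by eye. The following is a minimal sketch in R, assuming simulated data like that in the sample code at the end of this post; the seed and the helper name turning_point_in_range are ours, chosen for illustration only.

```r
# Simulated data (assumption: same setup as the sample code in this post)
set.seed(1)
x  <- sort(runif(n = 300))
x2 <- x * x
y  <- x - x2 + rnorm(n = 300, sd = .05)

# Quadratic regression; lm() adds the intercept automatically
fit <- lm(y ~ x + x2)
a <- unname(coef(fit)["x"])   # linear coefficient
b <- unname(coef(fit)["x2"])  # quadratic coefficient

# Step 4: turning point of the fitted quadratic
xmax <- -a / (2 * b)

# Step 3: hypothetical helper, TRUE if the turning point lies
# strictly inside the range of observed x values
turning_point_in_range <- function(xmax, x) {
  xmax > min(x) && xmax < max(x)
}

in_range <- turning_point_in_range(xmax, x)
shape <- if (b < 0) "inverted u-shape (maximum)" else "u-shape (minimum)"
cat("xmax =", round(xmax, 3), "| within range:", in_range, "|", shape, "\n")
```

If in_range is FALSE, the quadratic does not change direction within the observed data, so there is no u-shape to test and the remaining steps do not apply.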

Step 5. Create new variables to allow an interrupted regression.

We will have a variable for low values of x (xlow), one for high values of x (xhigh) and a new dummy (high).

xlow=x-xmax if x<=xmax, 0 otherwise

xhigh=x-xmax if x>xmax, 0 otherwise

high=1 if x>xmax, 0 otherwise.

That sounds more complicated than it is. Imagine x=1,2,3,4,5,6,7,8,9,10 and xmax=5; the data would look like this:

x  | xlow | xhigh | high |
1  |  -4  |   0   |  0   |
2  |  -3  |   0   |  0   |
3  |  -2  |   0   |  0   |
4  |  -1  |   0   |  0   |
5  |   0  |   0   |  0   |
6  |   0  |   1   |  1   |
7  |   0  |   2   |  1   |
8  |   0  |   3   |  1   |
9  |   0  |   4   |  1   |
10 |   0  |   5   |  1   |
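The table above can be reproduced in a few lines of R. This is only a sketch of the Step 5 definitions applied to the toy values x=1,...,10 and xmax=5; the data frame name tab is ours.

```r
# Toy example from the text: x = 1..10 with the interruption at xmax = 5
x    <- 1:10
xmax <- 5

# Step 5 definitions
xlow  <- ifelse(x <= xmax, x - xmax, 0)  # distance below xmax, 0 above it
xhigh <- ifelse(x >  xmax, x - xmax, 0)  # distance above xmax, 0 below it
high  <- ifelse(x >  xmax, 1, 0)         # dummy for the high segment

tab <- data.frame(x, xlow, xhigh, high)
print(tab, row.names = FALSE)
```

Centering both variables on xmax means each slope is estimated relative to the interruption point, and the high dummy lets the second line start at a different level than the first.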

Step 6. Run the new regression y=c*xlow+d*xhigh+e*high

If c and d have opposite signs, there is a u-shape.

If both c and d are significant (p<.05), the u-shape is statistically significant.
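The two checks in Step 6 can be read off the regression table by hand, or pulled out programmatically. The sketch below assumes simulated data like that in the sample code at the end of this post and, for brevity, fixes xmax at 0.5 rather than estimating it from the quadratic; the seed and variable names such as c_est and d_est are ours.

```r
# Simulated inverted-u data (assumption: mirrors the post's sample code)
set.seed(1)
x <- sort(runif(n = 300))
y <- x - x^2 + rnorm(n = 300, sd = .05)

# Assumption: interruption point fixed for illustration;
# in practice it comes from Step 4
xmax <- 0.5

xlow  <- ifelse(x <= xmax, x - xmax, 0)
xhigh <- ifelse(x >  xmax, x - xmax, 0)
high  <- ifelse(x >  xmax, 1, 0)

# Interrupted regression y = c*xlow + d*xhigh + e*high (plus intercept)
lm1 <- lm(y ~ xlow + xhigh + high)
co  <- summary(lm1)$coefficients

c_est <- co["xlow",  "Estimate"]
d_est <- co["xhigh", "Estimate"]
c_p   <- co["xlow",  "Pr(>|t|)"]
d_p   <- co["xhigh", "Pr(>|t|)"]

opposite    <- sign(c_est) != sign(d_est)  # slopes point in opposite directions
significant <- c_p < .05 && d_p < .05      # each slope individually significant

cat("c =", round(c_est, 3), " d =", round(d_est, 3),
    "| opposite signs:", opposite, "| both p<.05:", significant, "\n")
```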

Sample R Code

#DEMONSTRATION OF FITTING THE TWO LINES TO A QUADRATIC

#Generate x

x=sort(runif(n=300))

x2=x*x

#Generate error term

e=rnorm(n=300,sd=.05)

#Generate y

y=x-x2+e

#Step 1 Plot data

plot(x,y)

#Step 2 - Run the quadratic

bs=summary(lm(y~x+x2))$coefficients

#Step 3 - Find the point where it maxes out, which is -a/(2b) where

# y=ax+bx2

a=bs[2,1] #This is the linear coefficient in the regression results

b=bs[3,1] #This is the quadratic coefficient in the regression results

xmax=-a/(2*b) #This is the point where the (inverted) u-shape takes its (maximum) minimum value

#Step 4- Generate new predictors with interruption at xmax

#xlow=x-xmax when x<=xmax, 0 otherwise

xlow=ifelse(x<=xmax,x-xmax,0)

#xhigh=x-xmax when x>xmax, 0 otherwise

xhigh=ifelse(x>xmax,x-xmax,0)

#high dummy

high=ifelse(x>xmax,1,0)

#Step 5 - Run the interrupted regression

lm1=lm(y~xlow+xhigh+high)

f1=fitted(lm1)

#Plot the two lines

plot(x,y,main="Two Straight lines",cex.main=1.5,cex=.75)

lines(x[x<=xmax],f1[x<=xmax],col='red',lwd=3)

lines(x[x>xmax],f1[x>xmax],col='red',lwd=3)

lines(x,fitted(lm(y~x+x2)),col="blue",lwd=1.5)

abline(v=xmax,col=3,lty=3)

#Step 6 - Verify the slopes are of opposite sign and significant

summary(lm1)