Colada[27]: Step-by-step instructions to fit two lines to an observed u-shape

To fit two straight lines we need to identify the point where one line ends and the other begins.
In the post we consider doing so with the quadratic regression that is commonly used to test for u-shapes. We provide instructions for that approach below.
There are other options for identifying the interruption point that do not involve the quadratic and may have some advantages; we do not consider them here.

Six Steps
Step 1. Plot the raw data.
Step 2. Estimate the quadratic regression y=a*x+b*x2, where x2 is x squared
Step 3. Verify the results imply a u-shape within the range of observed data (e.g., plot the quadratic)
Step 4. If a u-shape is seen, identify the value of x where the u-shape maxes out (or, for a non-inverted u-shape, bottoms out). Call that point xmax. Simple calculus shows that this point is

    xmax = -a/(2*b)
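
The calculus behind Step 4: the slope of a*x+b*x2 is a+2*b*x, and setting it to zero gives x=-a/(2*b). The short sketch below checks that formula against a numerical search and also runs the Step 3 check that the turning point falls inside the observed range. The coefficients a=1 and b=-1 and the range [0,1] are made-up values for illustration, not estimates from any data.

```r
# Hypothetical coefficients, chosen only for illustration
a <- 1     # linear coefficient
b <- -1    # quadratic coefficient (negative, so an inverted u-shape)

# Closed-form turning point: set the slope a + 2*b*x to zero
xmax <- -a / (2 * b)

# Numerical check: search for the maximum of the quadratic on [0, 1]
quad <- function(x) a * x + b * x^2
xnum <- optimize(quad, interval = c(0, 1), maximum = TRUE)$maximum

# Step 3 check: the turning point should lie inside the observed range of x
x_range <- c(0, 1)   # assumed range of the observed data
inside <- xmax > x_range[1] && xmax < x_range[2]

print(c(formula = xmax, numerical = xnum))
print(inside)
```

If the turning point falls outside the observed range, the quadratic does not imply a u-shape in the data and there is no interruption point to use in Step 5.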

Step 5. Create new variables to allow an interrupted regression.
We will have a variable for low values of x (xlow), one for high values of x (xhigh) and a new dummy (high).

xlow  = x-xmax  if x<=xmax,  0 otherwise
xhigh = x-xmax  if x>xmax,   0 otherwise
high  = 1       if x>xmax,   0 otherwise

That sounds more complicated than it is. Imagine x=1,2,3,4,5,6,7,8,9,10 and xmax=5; the data would look like this:
 x  xlow  xhigh  high
 1   -4     0      0
 2   -3     0      0
 3   -2     0      0
 4   -1     0      0
 5    0     0      0
 6    0     1      1
 7    0     2      1
 8    0     3      1
 9    0     4      1
10    0     5      1
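
The table above can be reproduced in a few lines of R. This is a minimal sketch using the same toy values (x from 1 to 10, xmax=5); the data frame name tab is just a label chosen here.

```r
# Toy values from the example above
x <- 1:10
xmax <- 5

# Step 5 variables: distance from xmax on each side, plus a dummy for the high side
xlow  <- ifelse(x <= xmax, x - xmax, 0)
xhigh <- ifelse(x > xmax,  x - xmax, 0)
high  <- ifelse(x > xmax,  1, 0)

# Collect the columns side by side and print them
tab <- data.frame(x, xlow, xhigh, high)
print(tab)
```

Note that xlow is zero for every observation above xmax and xhigh is zero for every observation at or below it, so each slope is estimated only from its own side of the interruption.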


Step 6. Run the new regression y=c*xlow+d*xhigh+e*high

If c and d have opposite signs, the two lines form a u-shape.
If, in addition, both c and d are significant at p<.05, there is a statistically significant u-shape.
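
The two checks in Step 6 can be read directly off the coefficient table. The sketch below is one way to do that, assuming simulated data of the same kind used in the sample code (y=x-x2 plus noise); the seed and the names co, opposite_sign and both_significant are choices made here for illustration.

```r
# Simulated data (arbitrary seed, used only so the illustration is reproducible)
set.seed(1)
x  <- sort(runif(n = 300))
x2 <- x * x
y  <- x - x2 + rnorm(n = 300, sd = .05)

# Steps 2 and 4: quadratic fit and its turning point
q    <- coef(lm(y ~ x + x2))
xmax <- unname(-q["x"] / (2 * q["x2"]))

# Step 5: variables for the interrupted regression
xlow  <- ifelse(x <= xmax, x - xmax, 0)
xhigh <- ifelse(x > xmax,  x - xmax, 0)
high  <- ifelse(x > xmax,  1, 0)

# Step 6: interrupted regression; pull out slopes (c, d) and their p-values
co    <- summary(lm(y ~ xlow + xhigh + high))$coefficients
c_est <- co["xlow",  "Estimate"]
d_est <- co["xhigh", "Estimate"]
c_p   <- co["xlow",  "Pr(>|t|)"]
d_p   <- co["xhigh", "Pr(>|t|)"]

# The two checks: opposite signs, and both slopes significant at p<.05
opposite_sign    <- sign(c_est) != sign(d_est)
both_significant <- c_p < .05 && d_p < .05
print(c(opposite_sign = opposite_sign, both_significant = both_significant))
```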


Sample R Code

#DEMONSTRATION OF FITTING THE TWO LINES TO A QUADRATIC
 
#Generate x
  x=sort(runif(n=300))
  x2=x*x

#Generate error term
  e=rnorm(n=300,sd=.05)
   
#Generate y
  y=x-x2+e
 
#Step 1 Plot data
  plot(x,y)
 
#Step 2 - Run the quadratic
  bs=summary(lm(y~x+x2))$coefficients
 
#Step 3 - Find the point where it maxes out, which is -a/(2b) where
# y=ax+bx2
  a=bs[2,1]      #This is the linear coefficient in the regression results
  b=bs[3,1]      #This is the quadratic coefficient in the regression results
  xmax=-a/(2*b)  #This is the point where the inverted u-shape takes its maximum value (minimum, if the u-shape is not inverted)
 
#Step 4- Generate new predictors with interruption at xmax
  #xlow=x-xmax when x<=xmax, 0 otherwise
    xlow=ifelse(x<=xmax,x-xmax,0)
  #xhigh=x-xmax when x>xmax, 0 otherwise
    xhigh=ifelse(x>xmax,x-xmax,0)
  #high dummy
    high=ifelse(x>xmax,1,0)
 
#Step 5 - Run the interrupted regression
    lm1=lm(y~xlow+xhigh+high)
    f1=fitted(lm1)   
 
#Plot the two lines
  plot(x,y,main="Two Straight lines",cex.main=1.5,cex=.75)
  lines(x[x<=xmax],f1[x<=xmax],col='red',lwd=3)
  lines(x[x>xmax],f1[x>xmax],col='red',lwd=3)
  lines(x,fitted(lm(y~x+x2)),col="blue",lwd=1.5)
  abline(v=xmax,col=3,lty=3)
 
#Step 6 - Verify the slopes of the two lines are of opposite sign and significant
  summary(lm1)