Chapter 11: Further Issues in Using OLS with Time Series Data

Franz X. Mohr, Created: October 7, 2018, Last update: October 7, 2018

library(wooldridge)

Example 11.4

For this regression the lagged values of return are already contained in the dataset. Thus, we do not have to calculate them ourselves and can simply run the regression.

data("nyse")

lm.11.4  <-  lm(return ~ return_1, data = nyse)
summary(lm.11.4)

## 
## Call:
## lm(formula = return ~ return_1, data = nyse)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.261  -1.302   0.098   1.316   8.065 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.17963    0.08074   2.225   0.0264 *
## return_1     0.05890    0.03802   1.549   0.1218  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.11 on 687 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.003481,   Adjusted R-squared:  0.00203 
## F-statistic: 2.399 on 1 and 687 DF,  p-value: 0.1218

Equation 11.17

To estimate this model we have to calculate the lagged values ourselves. First, I converted the returns from the nyse data set into a time series object with ts(). The lag() function from the stats package then shifts the time index of a series: with a negative k, as in lag(return, -1), the value at time t is the return from period t - 1, and lag(return, -2) gives the return from period t - 2. Combining the three series with cbind() aligns them on their time index and fills the positions where a series has no value with NA. R drops these incomplete rows when it estimates the model, which is why the output reports deleted observations. The estimation works as usual.

# Create lagged values
return <- ts(nyse$return)
return1 <- lag(return, -1)
return2 <- lag(return, -2)

return_data <- cbind(return, return1, return2)

# Estimate
lm.e11.17  <-  lm(return ~ return1 + return2, data = return_data)
summary(lm.e11.17)

## 
## Call:
## lm(formula = return ~ return1 + return2, data = return_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2969  -1.3214   0.1099   1.3478   7.9832 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.18575    0.08115   2.289   0.0224 *
## return1      0.06032    0.03818   1.580   0.1146  
## return2     -0.03807    0.03814  -0.998   0.3185  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.112 on 685 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:  0.004819,   Adjusted R-squared:  0.001914 
## F-statistic: 1.659 on 2 and 685 DF,  p-value: 0.1912
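
To see how the lagged series line up, here is a small sketch on a toy series rather than the nyse data. The values and the names x, x_lag1, x_lag2 and manual_lag1 are made up for illustration. It calls stats::lag() explicitly, since packages such as dplyr provide their own lag() that behaves differently, and compares the result with a lag built by hand from a plain vector.

```r
# Toy series; the values are illustrative and not taken from nyse
x <- ts(c(10, 20, 30, 40, 50))

# A negative k moves the series forward in time, so at time t we see x[t - 1]
x_lag1 <- stats::lag(x, -1)
x_lag2 <- stats::lag(x, -2)

# cbind() on ts objects aligns by time index and pads with NA
aligned <- cbind(x, x_lag1, x_lag2)
print(aligned)

# The same first lag built by hand: NA in front, last value dropped
manual_lag1 <- c(NA, head(as.numeric(x), -1))
print(manual_lag1)
```

Only the rows in which all three columns are non-missing can be used in a regression, which mirrors the deleted observations reported by lm() above.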

Example 11.5

The estimation works as usual. The difference in the inflation rate is calculated within the lm() command.

data("phillips")

lm.11.15  <-  lm(I(inf-inf_1) ~ unem, data = phillips)
summary(lm.11.15)

## 
## Call:
## lm(formula = I(inf - inf_1) ~ unem, data = phillips)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.0741 -0.9241  0.0189  0.8606  5.4800 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   2.8282     1.2249   2.309   0.0249 *
## unem         -0.5176     0.2090  -2.476   0.0165 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.307 on 53 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.1037, Adjusted R-squared:  0.08679 
## F-statistic: 6.132 on 1 and 53 DF,  p-value: 0.0165

Natural rate of unemployment

coef(lm.11.15)[1] / -coef(lm.11.15)[2]

## (Intercept) 
##    5.463554
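
The ratio follows from the expectations-augmented Phillips curve behind this regression. As a sketch in the textbook's notation, where \mu_0 stands for the natural rate (a symbol that does not appear elsewhere on this page), the intercept and slope are linked as follows:

```latex
\begin{aligned}
\Delta inf_t &= \beta_1 (unem_t - \mu_0) + e_t
              = \beta_0 + \beta_1\, unem_t + e_t,
              \qquad \beta_0 = -\beta_1 \mu_0 \\
\hat{\mu}_0  &= \frac{\hat{\beta}_0}{-\hat{\beta}_1}
              \approx \frac{2.8282}{0.5176} \approx 5.46
\end{aligned}
```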

Example 11.6

data("fertil3")

cor(fertil3$gfr, fertil3$gfr_1, use = "pairwise.complete.obs")

## [1] 0.9764517

cor(fertil3$pe, fertil3$pe_1, use = "pairwise.complete.obs")

## [1] 0.96358

lm.11.16.1  <- lm(cgfr ~ cpe, data = fertil3)
summary(lm.11.16.1)

## 
## Call:
## lm(formula = cgfr ~ cpe, data = fertil3)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.980 -2.552 -0.377  1.866 14.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.78478    0.50204  -1.563    0.123
## cpe         -0.04268    0.02837  -1.504    0.137
## 
## Residual standard error: 4.221 on 69 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.03176,    Adjusted R-squared:  0.01773 
## F-statistic: 2.263 on 1 and 69 DF,  p-value: 0.137

lm.11.16.2  <-  lm(cgfr ~ cpe + cpe_1 + cpe_2, data = fertil3)
summary(lm.11.16.2)

## 
## Call:
## lm(formula = cgfr ~ cpe + cpe_1 + cpe_2, data = fertil3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8307 -2.1842 -0.1912  1.8442 11.4506 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.96368    0.46776  -2.060  0.04339 *  
## cpe         -0.03620    0.02677  -1.352  0.18101    
## cpe_1       -0.01397    0.02755  -0.507  0.61385    
## cpe_2        0.10999    0.02688   4.092  0.00012 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.859 on 65 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.2325, Adjusted R-squared:  0.1971 
## F-statistic: 6.563 on 3 and 65 DF,  p-value: 0.0006054

Joint significance of cpe and cpe_1

lm.11.16.2res  <-  lm(cgfr ~ cpe_2, data = fertil3)
summary(lm.11.16.2res)

## 
## Call:
## lm(formula = cgfr ~ cpe_2, data = fertil3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.6545 -1.8542 -0.0991  1.9755 13.0087 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.02322    0.46823  -2.185 0.032369 *  
## cpe_2        0.10782    0.02618   4.119 0.000107 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.876 on 67 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.202,  Adjusted R-squared:  0.1901 
## F-statistic: 16.96 on 1 and 67 DF,  p-value: 0.0001069

anova(lm.11.16.2, lm.11.16.2res)

## Analysis of Variance Table
## 
## Model 1: cgfr ~ cpe + cpe_1 + cpe_2
## Model 2: cgfr ~ cpe_2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     65  968.2                           
## 2     67 1006.6 -2   -38.413 1.2894 0.2824
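
The F statistic in this table can be reproduced by hand from the residual sums of squares. The sketch below only uses the rounded numbers printed above, so it should agree with anova() up to rounding; the variable names are my own.

```r
# Values copied from the anova() table above (rounded as printed)
rss_ur  <- 968.2    # residual sum of squares of the unrestricted model
ss_diff <- 38.413   # increase in RSS after dropping cpe and cpe_1
q       <- 2        # number of restrictions
df_ur   <- 65       # residual degrees of freedom, unrestricted model

# F = [(RSS_r - RSS_ur) / q] / [RSS_ur / df_ur]
f_stat  <- (ss_diff / q) / (rss_ur / df_ur)
p_value <- pf(f_stat, df1 = q, df2 = df_ur, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 4))
```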

Example 11.7

data("earns")

lm.11.17.1  <-  lm(lhrwage ~ loutphr + t, data = earns)
summary(lm.11.17.1)

## 
## Call:
## lm(formula = lhrwage ~ loutphr + t, data = earns)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059230 -0.026151  0.002411  0.020322  0.051966 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.328454   0.374449  -14.23  < 2e-16 ***
## loutphr      1.639639   0.093347   17.57  < 2e-16 ***
## t           -0.018230   0.001748  -10.43 1.05e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02854 on 38 degrees of freedom
## Multiple R-squared:  0.9712, Adjusted R-squared:  0.9697 
## F-statistic: 641.2 on 2 and 38 DF,  p-value: < 2.2e-16

Detrend the variables

dtr.lhrwage  <-  lm(lhrwage ~ t, data = earns)$resid
dtr.loutphr  <-  lm(loutphr ~ t, data = earns)$resid
summary(lm(dtr.lhrwage ~ -1 + dtr.loutphr))

## 
## Call:
## lm(formula = dtr.lhrwage ~ -1 + dtr.loutphr)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.059230 -0.026151  0.002411  0.020322  0.051966 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## dtr.loutphr  1.63964    0.09098   18.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02782 on 40 degrees of freedom
## Multiple R-squared:  0.8903, Adjusted R-squared:  0.8876 
## F-statistic: 324.8 on 1 and 40 DF,  p-value: < 2.2e-16
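
The slope on dtr.loutphr equals the slope on loutphr in the regression that includes t, which is the partialling-out (Frisch-Waugh) result. The standard errors differ slightly because the second regression does not account for the two trend parameters estimated in the detrending step (40 instead of 38 degrees of freedom). A minimal sketch with simulated data, where all names and numbers are hypothetical and not taken from earns:

```r
set.seed(1)

# Simulated trending data; sample size and coefficients are arbitrary
n <- 41
t <- 1:n
x <- 0.05 * t + rnorm(n)
y <- 1 + 1.5 * x - 0.02 * t + rnorm(n, sd = 0.5)

# Slope on x when the trend is included as a regressor
b_full <- coef(lm(y ~ x + t))["x"]

# Slope on detrended x when both variables are detrended first
dtr_y <- resid(lm(y ~ t))
dtr_x <- resid(lm(x ~ t))
b_dtr <- coef(lm(dtr_y ~ -1 + dtr_x))["dtr_x"]

print(c(full = unname(b_full), detrended = unname(b_dtr)))
```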

cor(dtr.lhrwage, c(dtr.lhrwage[-1],NA), use = "pairwise.complete.obs")

## [1] 0.9671587

cor(dtr.loutphr, c(dtr.loutphr[-1],NA), use = "pairwise.complete.obs")

## [1] 0.9452925
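
These two numbers are first-order autocorrelations of the detrended series. Pairing each value with its successor, as done above, gives the same correlation as pairing it with its predecessor, because correlation is symmetric. A small sketch on a made-up vector (x is purely illustrative):

```r
# Made-up values, not related to the earns data
x <- c(2, 4, 3, 7, 5, 8, 6, 9)
n <- length(x)

# Pair x[t] with x[t + 1], padding with NA as in the code above
r_lead <- cor(x, c(x[-1], NA), use = "pairwise.complete.obs")

# Pair x[t] with x[t - 1] by trimming both ends instead of padding
r_lag <- cor(x[-1], x[-n])

print(c(lead = r_lead, lag = r_lag))
```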

The diff function calculates differences between elements of a vector. By default (lag = 1, differences = 1) it returns the first differences between consecutive observations, so the result is one element shorter than the input.
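
As a quick illustration of diff() on a made-up vector (x is not taken from earns), the following sketch shows the default behaviour next to the differences and lag arguments:

```r
# Made-up values chosen so the differences are easy to check by hand
x <- c(1, 4, 9, 16, 25)

d1    <- diff(x)                   # first differences: x[t] - x[t - 1]
d2    <- diff(x, differences = 2)  # differences of the first differences
dlag2 <- diff(x, lag = 2)          # x[t] - x[t - 2]

print(d1)
print(d2)
print(dlag2)
```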

lm.11.17.2 <- lm(I(diff(lhrwage)) ~ I(diff(loutphr)), data = earns)
summary(lm.11.17.2)

## 
## Call:
## lm(formula = I(diff(lhrwage)) ~ I(diff(loutphr)), data = earns)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.040921 -0.010165 -0.000383  0.007969  0.040329 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -0.003662   0.004220  -0.868    0.391    
## I(diff(loutphr))  0.809316   0.173454   4.666 3.75e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01695 on 38 degrees of freedom
## Multiple R-squared:  0.3642, Adjusted R-squared:  0.3475 
## F-statistic: 21.77 on 1 and 38 DF,  p-value: 3.748e-05

Example 11.8

data("fertil3")

lm.11.18  <-  lm(cgfr ~ cpe + cpe_1 + cpe_2 + cgfr_1, data = fertil3)
summary(lm.11.18)

## 
## Call:
## lm(formula = cgfr ~ cpe + cpe_1 + cpe_2 + cgfr_1, data = fertil3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7491 -2.2345  0.0776  1.7393  9.2857 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.702159   0.453799  -1.547 0.126724    
## cpe         -0.045472   0.025642  -1.773 0.080926 .  
## cpe_1        0.002064   0.026778   0.077 0.938800    
## cpe_2        0.105135   0.025590   4.108 0.000115 ***
## cgfr_1       0.300242   0.105903   2.835 0.006125 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.666 on 64 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.3181, Adjusted R-squared:  0.2755 
## F-statistic: 7.464 on 4 and 64 DF,  p-value: 5.336e-05

The significant coefficient on cgfr_1 suggests that the distributed lag model without it (lm.11.16.2) is not dynamically complete, so its errors are likely serially correlated.