Chapter 8: Heteroskedasticity

Franz Mohr, Created: December 1, 2018, Last update: December 1, 2018

library(wooldridge)

Example 8.1

After loading the data and generating the dummy variables from chapter 7 estimate the model in the already familiar way.

data("wage1")

marrmale <- as.numeric(wage1$female == 0 & wage1$married == 1)
marrfem <- as.numeric(wage1$female == 1 & wage1$married == 1)
singfem <- as.numeric(wage1$female == 1 & wage1$married == 0)

lm.8.1 <- lm(lwage ~ marrmale + marrfem + singfem + educ + exper + expersq + tenure + tenursq, data = wage1)

summary(lm.8.1)

## 
## Call:
## lm(formula = lwage ~ marrmale + marrfem + singfem + educ + exper + 
##     expersq + tenure + tenursq, data = wage1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89697 -0.24060 -0.02689  0.23144  1.09197 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.3213781  0.1000090   3.213 0.001393 ** 
## marrmale     0.2126757  0.0553572   3.842 0.000137 ***
## marrfem     -0.1982676  0.0578355  -3.428 0.000656 ***
## singfem     -0.1103502  0.0557421  -1.980 0.048272 *  
## educ         0.0789103  0.0066945  11.787  < 2e-16 ***
## exper        0.0268006  0.0052428   5.112 4.50e-07 ***
## expersq     -0.0005352  0.0001104  -4.847 1.66e-06 ***
## tenure       0.0290875  0.0067620   4.302 2.03e-05 ***
## tenursq     -0.0005331  0.0002312  -2.306 0.021531 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3933 on 517 degrees of freedom
## Multiple R-squared:  0.4609, Adjusted R-squared:  0.4525 
## F-statistic: 55.25 on 8 and 517 DF,  p-value: < 2.2e-16

The function coeftest from the lmtest package can be used to obtain the heteroskedasticity robust standard errors. The first argument of the function contains the result of the original estimation, i.e. lm.8.1.. The second argument tells R how to calculate the heteroskedasticity robust standard errors. Basically, you could just enter the first part and R would do the rest. However, since the standard procedure of coeftest would not give the same results as in the book, we have to specify them a bit. Thus, we use the vcovHC function from the sandwich package, which requires the output of the estimated model and a specification of the type of robust standard errors that should be calculated. In our case we want a simple White standard error, which is indicated by type = "HC0". Other, more sophisticated methods are described in the documentation of the function.

library("lmtest")

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library("sandwich")

coeftest(lm.8.1, vcov=vcovHC(lm.8.1, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  0.32137808  0.10852843  2.9612 0.0032049 ** 
## marrmale     0.21267568  0.05665095  3.7541 0.0001937 ***
## marrfem     -0.19826760  0.05826505 -3.4029 0.0007186 ***
## singfem     -0.11035021  0.05662551 -1.9488 0.0518632 .  
## educ         0.07891028  0.00735096 10.7347 < 2.2e-16 ***
## exper        0.02680057  0.00509497  5.2602 2.111e-07 ***
## expersq     -0.00053525  0.00010543 -5.0770 5.360e-07 ***
## tenure       0.02908752  0.00688128  4.2270 2.800e-05 ***
## tenursq     -0.00053314  0.00024159 -2.2068 0.0277670 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 8.2

data("gpa3")

lm.8.2 <- lm(cumgpa ~ sat + hsperc + tothrs + female + black + white,
             data = gpa3, subset = (term == 2))

summary(lm.8.2)

## 
## Call:
## lm(formula = cumgpa ~ sat + hsperc + tothrs + female + black + 
##     white, data = gpa3, subset = (term == 2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.54320 -0.29104 -0.02252  0.28348  1.24872 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.4700648  0.2298031   6.397 4.94e-10 ***
## sat          0.0011407  0.0001786   6.389 5.20e-10 ***
## hsperc      -0.0085664  0.0012404  -6.906 2.27e-11 ***
## tothrs       0.0025040  0.0007310   3.426 0.000685 ***
## female       0.3034333  0.0590203   5.141 4.50e-07 ***
## black       -0.1282837  0.1473701  -0.870 0.384616    
## white       -0.0587217  0.1409896  -0.416 0.677295    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4693 on 359 degrees of freedom
## Multiple R-squared:  0.4006, Adjusted R-squared:  0.3905 
## F-statistic: 39.98 on 6 and 359 DF,  p-value: < 2.2e-16

coeftest(lm.8.2, vcov = vcovHC(lm.8.2, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  1.47006477  0.21855969  6.7261 6.888e-11 ***
## sat          0.00114073  0.00018969  6.0136 4.468e-09 ***
## hsperc      -0.00856636  0.00140430 -6.1001 2.744e-09 ***
## tothrs       0.00250400  0.00073353  3.4136 0.0007142 ***
## female       0.30343329  0.05856959  5.1807 3.693e-07 ***
## black       -0.12828368  0.11809549 -1.0863 0.2780880    
## white       -0.05872173  0.11032164 -0.5323 0.5948631    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For a heteroskedasticity robust F-test we perform a Wald-test. The procedure is similar to obtaining the coefficients’ standard errors. For the usual F-test estimate the restricted and unrestricted models and put their results into the anova function, which will print the F-statistic. For the Wald test we do the same, but this time we put the estimation results into the waldtest command and, as a third part of the function, you specify the method used to model heteroskedasticity.

lm.8.2.2 <- lm(cumgpa ~ sat + hsperc + tothrs + female, data = gpa3, subset = (term==2))

# Usual F-statistic
anova(lm.8.2, lm.8.2.2)

## Analysis of Variance Table
## 
## Model 1: cumgpa ~ sat + hsperc + tothrs + female + black + white
## Model 2: cumgpa ~ sat + hsperc + tothrs + female
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    359 79.062                           
## 2    361 79.362 -2  -0.29934 0.6796 0.5075

Heteroskedasticity robust F-statistic

waldtest(lm.8.2, lm.8.2.2, vcov = vcovHC(lm.8.2, type = "HC0"))

## Wald test
## 
## Model 1: cumgpa ~ sat + hsperc + tothrs + female + black + white
## Model 2: cumgpa ~ sat + hsperc + tothrs + female
##   Res.Df Df      F Pr(>F)
## 1    359                 
## 2    361 -2 0.7478 0.4741

Example 8.3

The first part of the example is similar to above. Estimate the model and print its summary and the heteroskedasticity robust standard errors.

# Load data and calculate the squared variable
data("crime1")

avgsensq <- crime1$avgsen^2

lm.8.3 <- lm(narr86 ~ pcnv + avgsen + avgsensq + ptime86 + qemp86 +
               inc86 + black + hispan, data = crime1)

summary(lm.8.3)

## 
## Call:
## lm(formula = narr86 ~ pcnv + avgsen + avgsensq + ptime86 + qemp86 + 
##     inc86 + black + hispan, data = crime1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0008 -0.4515 -0.2391  0.2686 11.5307 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.5670128  0.0360573  15.725  < 2e-16 ***
## pcnv        -0.1355954  0.0403699  -3.359 0.000794 ***
## avgsen       0.0178411  0.0096960   1.840 0.065872 .  
## avgsensq    -0.0005163  0.0002970  -1.738 0.082265 .  
## ptime86     -0.0393600  0.0086935  -4.528 6.23e-06 ***
## qemp86      -0.0505072  0.0144345  -3.499 0.000474 ***
## inc86       -0.0014797  0.0003405  -4.345 1.44e-05 ***
## black        0.3246024  0.0454188   7.147 1.14e-12 ***
## hispan       0.1933800  0.0397035   4.871 1.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8284 on 2716 degrees of freedom
## Multiple R-squared:  0.0728, Adjusted R-squared:  0.07007 
## F-statistic: 26.66 on 8 and 2716 DF,  p-value: < 2.2e-16

coeftest(lm.8.3, vcov = vcovHC(lm.8.3, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  0.56701278  0.04020901 14.1016 < 2.2e-16 ***
## pcnv        -0.13559539  0.03356622 -4.0396 5.502e-05 ***
## avgsen       0.01784106  0.01010659  1.7653 0.0776275 .  
## avgsensq    -0.00051633  0.00020735 -2.4902 0.0128281 *  
## ptime86     -0.03935998  0.00621327 -6.3348 2.772e-10 ***
## qemp86      -0.05050717  0.01417805 -3.5624 0.0003739 ***
## inc86       -0.00147966  0.00022913 -6.4576 1.256e-10 ***
## black        0.32460243  0.05841683  5.5567 3.017e-08 ***
## hispan       0.19338004  0.04023169  4.8067 1.618e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the LM-test estimate the restricted model and regress its residuals on the variables of the unrestricted model. Obtain R-Squared and multiply it by the number of used observations to get the LM-value. Last, get the p-value.

lm.8.3.2 <- lm(narr86 ~ pcnv + ptime86 + qemp86 + inc86 +
                 black + hispan, data = crime1)

lm.8.3.2u <- lm(lm.8.3.2$residuals ~ pcnv + avgsen + avgsensq + 
                  ptime86 + qemp86 + inc86 + black + hispan, data = crime1)

summary(lm.8.3.2u)$r.squared * 2725

## [1] 3.462601

1 - pchisq(3.4626, 2)

## [1] 0.1770541

Example 8.4

For the Breusch-Pagan test estimate the model and obtain its squared residuals. Regress the latter on the variables of the model.

data("hprice1")

lm.e8.17 <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)
summary(lm.e8.17)

## 
## Call:
## lm(formula = price ~ lotsize + sqrft + bdrms, data = hprice1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -120.026  -38.530   -6.555   32.323  209.376 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.177e+01  2.948e+01  -0.739  0.46221    
## lotsize      2.068e-03  6.421e-04   3.220  0.00182 ** 
## sqrft        1.228e-01  1.324e-02   9.275 1.66e-14 ***
## bdrms        1.385e+01  9.010e+00   1.537  0.12795    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59.83 on 84 degrees of freedom
## Multiple R-squared:  0.6724, Adjusted R-squared:  0.6607 
## F-statistic: 57.46 on 3 and 84 DF,  p-value: < 2.2e-16

# Squared residuals
usq.17 <- (summary(lm.e8.17)$residual)^2 
lm.e8.17u <- lm(usq.17 ~ lotsize + sqrft + bdrms, data = hprice1)
summary(lm.e8.17u)

## 
## Call:
## lm(formula = usq.17 ~ lotsize + sqrft + bdrms, data = hprice1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9044  -2212  -1256    -97  42582 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -5.523e+03  3.259e+03  -1.694  0.09390 . 
## lotsize      2.015e-01  7.101e-02   2.838  0.00569 **
## sqrft        1.691e+00  1.464e+00   1.155  0.25128   
## bdrms        1.042e+03  9.964e+02   1.046  0.29877   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6617 on 84 degrees of freedom
## Multiple R-squared:  0.1601, Adjusted R-squared:  0.1301 
## F-statistic: 5.339 on 3 and 84 DF,  p-value: 0.002048

# LM-test
lm.e8.17u.2 <- lm(usq.17 ~ 1, data = hprice1)
lm.e8.17u.2u <- lm(summary(lm.e8.17u.2)$residuals ~ lotsize + sqrft + bdrms, data = hprice1)

summary(lm.e8.17u.2u)$r.squared * 88

## [1] 14.09239

1 - pchisq(14.09239, 3)

## [1] 0.002782054

# Using logarithms
lm.e8.18 <- lm(lprice ~ llotsize + lsqrft + bdrms, data = hprice1)
summary(lm.e8.18)

## 
## Call:
## lm(formula = lprice ~ llotsize + lsqrft + bdrms, data = hprice1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.68422 -0.09178 -0.01584  0.11213  0.66899 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.29704    0.65128  -1.992   0.0497 *  
## llotsize     0.16797    0.03828   4.388 3.31e-05 ***
## lsqrft       0.70023    0.09287   7.540 5.01e-11 ***
## bdrms        0.03696    0.02753   1.342   0.1831    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1846 on 84 degrees of freedom
## Multiple R-squared:  0.643,  Adjusted R-squared:  0.6302 
## F-statistic: 50.42 on 3 and 84 DF,  p-value: < 2.2e-16

# Breusch-Pagan test
usq.18 <- (summary(lm.e8.18)$residual)^2
lm.e8.18u <- lm(usq.18 ~ llotsize + lsqrft + bdrms, data = hprice1)
summary(lm.e8.18u)

## 
## Call:
## lm(formula = usq.18 ~ llotsize + lsqrft + bdrms, data = hprice1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.05601 -0.03011 -0.01687  0.00523  0.40978 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.509994   0.257857   1.978   0.0512 .
## llotsize    -0.007016   0.015156  -0.463   0.6446  
## lsqrft      -0.062737   0.036767  -1.706   0.0916 .
## bdrms        0.016841   0.010900   1.545   0.1261  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07309 on 84 degrees of freedom
## Multiple R-squared:  0.04799,    Adjusted R-squared:  0.01399 
## F-statistic: 1.412 on 3 and 84 DF,  p-value: 0.2451

# LM-test
summary(lm.e8.18u)$r.squared * 88

## [1] 4.223248

1 - pchisq(4.223248, 3)

## [1] 0.2383446

Example 8.5

To obtain the test statistic of the the White test, estimate the model, obtain its squared residuals, fitted values and squared fitted values and regress the first on the latter ones. Then, get the R-Squared and multiply it by the number of used observations to get the LM-statistic and the p-value.

data("hprice1")

lm.e8.18 <- lm(lprice ~ llotsize + lsqrft + bdrms, data = hprice1)
ressq <- lm.e8.18$residuals^2
fitted <- lm.e8.18$fitted.values
fittedsq <- lm.e8.18$fitted.values^2
rsq <- summary(lm(ressq ~ fitted + fittedsq))$r.squared

rsq * 88

## [1] 3.447286

1 - pchisq(3.447286, 2)

## [1] 0.178415

Example 8.6

For weighted least squares you can still use R’s lm command, but you have to tell R how to weigh the observations with respect to the variance of the coefficients. This is achieved by specifying the argument weights as 1/h, where in our example h is income. Now the model can be estimated and since we have controlled for heteroskedasticity by specifying weights, we can use the usual summary command to get heteroskedasticity robust results and do not need to use coeftest.

data("k401ksubs")

lm.8.6.1 <- lm(nettfa ~ inc, data = k401ksubs, subset = (fsize == 1))
coeftest(lm.8.6.1, vcov = vcovHC(lm.8.6.1, type = "HC0"))

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -10.57095    2.52902 -4.1799 3.042e-05 ***
## inc           0.82068    0.10354  7.9261 3.711e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lm.8.6.2 <- lm(nettfa ~ inc, weights = 1/inc, data = k401ksubs, subset = (fsize == 1))
summary(lm.8.6.2)

## 
## Call:
## lm(formula = nettfa ~ inc, data = k401ksubs, subset = (fsize == 
##     1), weights = 1/inc)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.469  -2.339  -1.086   0.352 178.220 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -9.58070    1.65328  -5.795 7.91e-09 ***
## inc          0.78705    0.06348  12.398  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.219 on 2015 degrees of freedom
## Multiple R-squared:  0.07088,    Adjusted R-squared:  0.07042 
## F-statistic: 153.7 on 1 and 2015 DF,  p-value: < 2.2e-16

age.25sq <- (k401ksubs$age - 25)^2
lm.8.6.3 <- lm(nettfa ~ age.25sq + male + e401k,
               data = k401ksubs, subset = (fsize == 1))

coeftest(lm.8.6.3, vcov = vcovHC(lm.8.6.3, type = "HC0"))

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -2.3360343  1.8725857 -1.2475   0.21236    
## age.25sq     0.0258052  0.0044892  5.7483 1.039e-08 ***
## male         5.3883568  2.2771296  2.3663   0.01806 *  
## e401k       13.1199818  2.4341139  5.3900 7.870e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

lm.8.6.4 <- lm(nettfa ~ inc + age.25sq + male + e401k, weights = 1 / inc,
               data = k401ksubs, subset = (fsize == 1))
summary(lm.8.6.4)

## 
## Call:
## lm(formula = nettfa ~ inc + age.25sq + male + e401k, data = k401ksubs, 
##     subset = (fsize == 1), weights = 1/inc)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.613  -2.491  -0.803   0.934 178.052 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -16.702521   1.957995  -8.530  < 2e-16 ***
## inc           0.740384   0.064303  11.514  < 2e-16 ***
## age.25sq      0.017537   0.001931   9.080  < 2e-16 ***
## male          1.840529   1.563587   1.177  0.23929    
## e401k         5.188281   1.703426   3.046  0.00235 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.065 on 2012 degrees of freedom
## Multiple R-squared:  0.1115, Adjusted R-squared:  0.1097 
## F-statistic: 63.13 on 4 and 2012 DF,  p-value: < 2.2e-16

Example 8.7

data("smoke")

lm.8.7 <- lm(cigs ~ lincome + lcigpric + educ + age + agesq + restaurn, data = smoke)
summary(lm.8.7)

## 
## Call:
## lm(formula = cigs ~ lincome + lcigpric + educ + age + agesq + 
##     restaurn, data = smoke)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.819  -9.381  -5.975   7.922  70.221 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.639841  24.078660  -0.151  0.87988    
## lincome      0.880268   0.727783   1.210  0.22682    
## lcigpric    -0.750859   5.773343  -0.130  0.89655    
## educ        -0.501498   0.167077  -3.002  0.00277 ** 
## age          0.770694   0.160122   4.813 1.78e-06 ***
## agesq       -0.009023   0.001743  -5.176 2.86e-07 ***
## restaurn    -2.825085   1.111794  -2.541  0.01124 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.4 on 800 degrees of freedom
## Multiple R-squared:  0.05274,    Adjusted R-squared:  0.04563 
## F-statistic: 7.423 on 6 and 800 DF,  p-value: 9.499e-08

# Breusch-Pagan
summary(lm.8.7u <-lm(lm.8.7$residuals^2 ~ lincome + lcigpric +
                       educ + age + agesq + restaurn, data = smoke))

## 
## Call:
## lm(formula = lm.8.7$residuals^2 ~ lincome + lcigpric + educ + 
##     age + agesq + restaurn, data = smoke)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -270.1 -127.5  -94.0  -39.1 4667.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -636.30306  652.49456  -0.975   0.3298    
## lincome       24.63848   19.72180   1.249   0.2119    
## lcigpric      60.97655  156.44869   0.390   0.6968    
## educ          -2.38423    4.52753  -0.527   0.5986    
## age           19.41748    4.33907   4.475 8.75e-06 ***
## agesq         -0.21479    0.04723  -4.547 6.27e-06 ***
## restaurn     -71.18138   30.12789  -2.363   0.0184 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 363.2 on 800 degrees of freedom
## Multiple R-squared:  0.03997,    Adjusted R-squared:  0.03277 
## F-statistic: 5.552 on 6 and 800 DF,  p-value: 1.189e-05

summary(lm.8.7u)$r.squared * 807

## [1] 32.25842

1 - pchisq(summary(lm.8.7u)$r.squared*807, 6)

## [1] 1.455779e-05

For feasible GLS you obtain the logarithm of squared residuals from the basic model. Then you regress it on the model’s independent variables and calculate the exponential of fitted values from that regression which gives hhat. Then you estimate the basic model again, but with the weights 1/hhat.

lres.u <- log(lm.8.7$residuals^2)
lm.8.7gls.u <- lm(lres.u ~ lincome + lcigpric + educ + age +
                    agesq + restaurn, data = smoke)
hhat <- exp(lm.8.7gls.u$fitted.values)

lm.8.7gls <- lm(cigs ~ lincome + lcigpric + educ + age + 
                  agesq + restaurn, weights = 1 / hhat, data = smoke)
summary(lm.8.7gls)

## 
## Call:
## lm(formula = cigs ~ lincome + lcigpric + educ + age + agesq + 
##     restaurn, data = smoke, weights = 1/hhat)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9036 -0.9532 -0.8099  0.8415  9.8556 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.6354618 17.8031385   0.317 0.751673    
## lincome      1.2952399  0.4370118   2.964 0.003128 ** 
## lcigpric    -2.9403123  4.4601445  -0.659 0.509930    
## educ        -0.4634464  0.1201587  -3.857 0.000124 ***
## age          0.4819479  0.0968082   4.978 7.86e-07 ***
## agesq       -0.0056272  0.0009395  -5.990 3.17e-09 ***
## restaurn    -3.4610641  0.7955050  -4.351 1.53e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.579 on 800 degrees of freedom
## Multiple R-squared:  0.1134, Adjusted R-squared:  0.1068 
## F-statistic: 17.06 on 6 and 800 DF,  p-value: < 2.2e-16

Table 8.2

# Recall the first column of the table
summary(lm.8.6.4)

## 
## Call:
## lm(formula = nettfa ~ inc + age.25sq + male + e401k, data = k401ksubs, 
##     subset = (fsize == 1), weights = 1/inc)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -26.613  -2.491  -0.803   0.934 178.052 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -16.702521   1.957995  -8.530  < 2e-16 ***
## inc           0.740384   0.064303  11.514  < 2e-16 ***
## age.25sq      0.017537   0.001931   9.080  < 2e-16 ***
## male          1.840529   1.563587   1.177  0.23929    
## e401k         5.188281   1.703426   3.046  0.00235 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.065 on 2012 degrees of freedom
## Multiple R-squared:  0.1115, Adjusted R-squared:  0.1097 
## F-statistic: 63.13 on 4 and 2012 DF,  p-value: < 2.2e-16

# Column 2
coeftest(lm.8.6.4, vcov = vcovHC(lm.8.6.4, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept) -16.7025205   2.2402025 -7.4558 1.320e-13 ***
## inc           0.7403843   0.0749626  9.8767 < 2.2e-16 ***
## age.25sq      0.0175373   0.0025817  6.7930 1.442e-11 ***
## male          1.8405293   1.3091836  1.4059  0.159920    
## e401k         5.1882807   1.5699057  3.3048  0.000967 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 8.8

With respect to linear probability models the procedure is rather similar to the first examples: estimate the model and use coeftest to get heteroskedasticity robust estimates of the standard errors.

data("mroz")

lm.8.8 <- lm(inlf ~ nwifeinc + educ + exper + expersq +
               age + kidslt6 + kidsge6, data = mroz)

summary(lm.8.8)

## 
## Call:
## lm(formula = inlf ~ nwifeinc + educ + exper + expersq + age + 
##     kidslt6 + kidsge6, data = mroz)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.93432 -0.37526  0.08833  0.34404  0.99417 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.5855192  0.1541780   3.798 0.000158 ***
## nwifeinc    -0.0034052  0.0014485  -2.351 0.018991 *  
## educ         0.0379953  0.0073760   5.151 3.32e-07 ***
## exper        0.0394924  0.0056727   6.962 7.38e-12 ***
## expersq     -0.0005963  0.0001848  -3.227 0.001306 ** 
## age         -0.0160908  0.0024847  -6.476 1.71e-10 ***
## kidslt6     -0.2618105  0.0335058  -7.814 1.89e-14 ***
## kidsge6      0.0130122  0.0131960   0.986 0.324415    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4271 on 745 degrees of freedom
## Multiple R-squared:  0.2642, Adjusted R-squared:  0.2573 
## F-statistic: 38.22 on 7 and 745 DF,  p-value: < 2.2e-16

coeftest(lm.8.8, vcov = vcovHC(lm.8.8, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept)  0.58551922  0.15144889  3.8661 0.0001202 ***
## nwifeinc    -0.00340517  0.00151681 -2.2450 0.0250635 *  
## educ         0.03799530  0.00722734  5.2572 1.913e-07 ***
## exper        0.03949239  0.00577907  6.8337 1.722e-11 ***
## expersq     -0.00059631  0.00018899 -3.1552 0.0016683 ** 
## age         -0.01609081  0.00238623 -6.7432 3.108e-11 ***
## kidslt6     -0.26181047  0.03161391 -8.2815 5.626e-16 ***
## kidsge6      0.01301223  0.01346085  0.9667 0.3340215    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 8.9

Since linear probability models automatically suffer from heteroskedasticity, we have to somehow control for that. The first part of the estimation is already familiar.

data("gpa1")

# Generate dummy variable vector which is 1 when either the 
# father or the mother or both were at college.
parcoll <- as.numeric(gpa1$fathcoll == 1 | gpa1$mothcoll)

lm.8.9 <- lm(PC ~ hsGPA + ACT + parcoll, data = gpa1)

summary(lm.8.9)

## 
## Call:
## lm(formula = PC ~ hsGPA + ACT + parcoll, data = gpa1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4915 -0.4494 -0.2437  0.5375  0.8223 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.0004322  0.4905358  -0.001   0.9993  
## hsGPA        0.0653943  0.1372576   0.476   0.6345  
## ACT          0.0005645  0.0154967   0.036   0.9710  
## parcoll      0.2210541  0.0929570   2.378   0.0188 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.486 on 137 degrees of freedom
## Multiple R-squared:  0.04153,    Adjusted R-squared:  0.02054 
## F-statistic: 1.979 on 3 and 137 DF,  p-value: 0.1201

coeftest(lm.8.9, vcov = vcovHC(lm.8.9, type = "HC0"))

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) -0.00043220  0.48879587 -0.0009  0.99930  
## hsGPA        0.06539435  0.13946545  0.4689  0.63989  
## ACT          0.00056451  0.01584133  0.0356  0.97163  
## parcoll      0.22105405  0.08678002  2.5473  0.01196 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

To deal with heteroskedasticity obtain the fitted values from the basic model and check whether all of them are neither negative nor above unity. I plotted a histogram for that. You might also lock at min and max or functions like that. Then calculate hhat as described in the book and use its inverse (1/hhat) for the weights specification in the final regression.

hist(lm.8.9$fitted.values) # Fitted values are neither negative nor above unity.

hhat <- lm.8.9$fitted.values * (1 - lm.8.9$fitted.values) # Calculate hhat (h = yhat * (1 - yhat))
lm.8.9wls <- lm(PC ~ hsGPA + ACT + parcoll, weights = 1 / hhat, data = gpa1)
summary(lm.8.9wls)

## 
## Call:
## lm(formula = PC ~ hsGPA + ACT + parcoll, data = gpa1, weights = 1/hhat)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0015 -0.9029 -0.5576  1.0800  2.0429 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 0.026210   0.476650   0.055   0.9562  
## hsGPA       0.032703   0.129882   0.252   0.8016  
## ACT         0.004272   0.015453   0.276   0.7826  
## parcoll     0.215186   0.086292   2.494   0.0138 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.016 on 137 degrees of freedom
## Multiple R-squared:  0.04644,    Adjusted R-squared:  0.02556 
## F-statistic: 2.224 on 3 and 137 DF,  p-value: 0.08816

For a further very interesting article on dealing with heteroskedasticity in R I recommend this page.