This study compared various biased estimation and subset selection regression techniques over a wide range of realistic situations. In an attempt to make the results applicable to the class of all practical regression problems, the parameters relevant to a comparison of the techniques involved were varied systematically. An outline of the design follows. (1) The residual variance, $\sigma^2$, was set equal to one for simplicity. (2) The overall signal, $r^2 = \beta'\beta$, took on values from 10 to 10,000. (3) The number of independent variables took on the values 6, 10, and 19.
(4) The expected proportion of the independent variables that were superfluous varied from 0 to 0.8.
(5) For certain combinations of the above parameters, additional runs were made that varied the degree of collinearity, the sample size ($n$), or the significance level of the subset tests.
For each combination, $\epsilon$ and $\beta$ were randomly generated 100 or 500 times, and the dependent variable $Y$ was obtained from $Y = X\beta + \epsilon$, where $X$ is the matrix of the independent variables and $\epsilon$ is random noise. For each replication, measures of estimation and prediction accuracy were computed for all techniques studied.
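The replication scheme just described can be illustrated with a minimal sketch of a single design cell. This is not the code used in the study. The function name `simulate_cell`, the standard-normal design matrix with uncorrelated columns, the way superfluous coefficients are zeroed, the fixed ridge constant `ridge_k`, and the use of squared estimation error as the accuracy measure are all assumptions made for illustration; the study itself also varied collinearity and compared many more techniques.

```python
import numpy as np


def simulate_cell(p=6, n=30, r2=100.0, prop_zero=0.4, reps=100,
                  ridge_k=1.0, seed=0):
    """Simulate one hypothetical design cell.

    Returns the mean squared estimation error, averaged over `reps`
    replications, for least squares and for a fixed-k ridge estimator.
    """
    rng = np.random.default_rng(seed)

    # Assumption: a fixed design with uncorrelated standard-normal columns.
    X = rng.standard_normal((n, p))
    XtX = X.T @ X

    err_ols = 0.0
    err_ridge = 0.0
    for _ in range(reps):
        # Assumption: each coefficient is superfluous (exactly zero)
        # with probability prop_zero, so prop_zero is the expected proportion.
        beta = rng.standard_normal(p)
        beta[rng.random(p) < prop_zero] = 0.0

        # Rescale so that the overall signal beta'beta equals r2.
        norm = np.linalg.norm(beta)
        if norm > 0:
            beta *= np.sqrt(r2) / norm

        # Residual variance sigma^2 = 1, as in the design outline.
        eps = rng.standard_normal(n)
        y = X @ beta + eps

        # Least squares and ridge estimates from the normal equations.
        b_ols = np.linalg.solve(XtX, X.T @ y)
        b_ridge = np.linalg.solve(XtX + ridge_k * np.eye(p), X.T @ y)

        err_ols += np.sum((b_ols - beta) ** 2)
        err_ridge += np.sum((b_ridge - beta) ** 2)

    return err_ols / reps, err_ridge / reps


if __name__ == "__main__":
    mse_ols, mse_ridge = simulate_cell()
    print("least squares:", mse_ols, "ridge:", mse_ridge)
```

Looping such a function over a grid of `r2`, `p`, and `prop_zero` values would mimic the systematic variation of parameters in the design outline; a prediction accuracy measure could be added in the same loop.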
The major conclusions of the simulation follow. (1) The Efroymson stepwise technique was very erratic relative to least squares, especially for large $r^2$, a small percentage of extraneous variables, and small significance levels. (2) Excluding principal components and stepwise, neither biased estimation nor subset selection was superior to the other in general. The relative effectiveness of the two strategies depended heavily on $r^2$ and the percentage of extraneous variables. (3) Principal component regression fared badly, demonstrating little or no advantage over least squares. (4) Subset selection methods employing a single ridge solution proved to be quick, effective, and stable alternatives to stepwise regression. (5) Significance levels below 0.20 resulted in erratic performance for all subset methods, stepwise in particular. (6) The results concerning the biased estimation techniques studied agreed strongly with the results of major simulations previously reported in the statistical literature.