An important problem in modeling a response variable is detecting lack of fit of a model of the form $E(Y) = X\beta$ when the true model is of the form $E(Y) = X\beta + \delta$. The lack of fit procedure for replicated data is well known. When replicate measurements are not available, the "near neighbor" procedure can be used as a generalization of the usual test for lack of fit. It forms groups of response observations that are "near" one another in the space of the predictor variables and uses these groups to obtain an estimate of $\sigma^2$.
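As a rough illustration of the near neighbor idea, the Python sketch below handles a single predictor and a straight-line model. It compares the fit of the line with the fit of the same line plus one indicator per near neighbor cell. This is one common way to build such a test and is not necessarily the statistic studied in this work. The function name `near_neighbor_f`, the cell labels, and the example data are assumptions made purely for illustration.

```python
def _centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy) about the means of xs and ys."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    syy = sum((b - my) ** 2 for b in ys)
    return sxx, sxy, syy


def near_neighbor_f(x, y, labels):
    """F ratio for a straight line versus the line plus cell indicators.

    x, y   : observations of one predictor and the response
    labels : near neighbor cell label of each observation (hypothetical input)
    Assumes x varies inside at least one cell and that the larger model
    does not fit the data exactly.
    """
    n = len(x)

    # Reduced model: E(Y) = b0 + b1 * x.
    sxx, sxy, syy = _centered_sums(x, y)
    sse_reduced = syy - sxy ** 2 / sxx

    # Full model: a separate intercept per cell and a common slope.
    cells = {}
    for xi, yi, lab in zip(x, y, labels):
        xs, ys = cells.setdefault(lab, ([], []))
        xs.append(xi)
        ys.append(yi)
    wxx = wxy = wyy = 0.0
    for xs, ys in cells.values():
        a, b, c = _centered_sums(xs, ys)
        wxx += a
        wxy += b
        wyy += c
    if wxx <= 1e-12 * sxx:
        raise ValueError("x must vary within at least one cell")
    sse_full = wyy - wxy ** 2 / wxx

    g = len(cells)
    df_num = g - 1        # extra parameters in the full model
    df_den = n - g - 1    # residual degrees of freedom of the full model
    f = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f, df_num, df_den


# Example: a curved response that a straight line cannot follow,
# with cells made of three consecutive x values each.
x = [float(i) for i in range(12)]
y = [xi ** 2 for xi in x]
labels = [i // 3 for i in range(12)]
f_stat, df1, df2 = near_neighbor_f(x, y, labels)
print(f_stat, df1, df2)
```

Under normally distributed errors, the ratio would be referred to an F distribution with the returned degrees of freedom. How the cells are chosen changes both sums of squares, which is why the grouping matters for the power of the test.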
The effect that different groupings have on a near neighbor lack of fit testing procedure and its power is investigated. A clustering algorithm is used to form near neighbor cells. Most clustering algorithms stop when a limit on the similarity measure, such as the distance between two observations, is reached or when a specified number of combinations has been made. A method of selecting a grouping that maximizes the power of the test against certain important alternatives is examined. Different ways of forming near neighbor groups are also suggested. The question of whether to form near neighbor cells using only the X-space or the (X,Y) space is investigated.
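The two stopping rules mentioned above can be sketched with a small agglomerative routine. The version below uses complete linkage and Euclidean distance in the predictor space. Both are assumptions, since the abstract does not say which clustering algorithm or distance is used, and the function name `agglomerate` and its parameters are hypothetical.

```python
import math


def agglomerate(points, max_distance=None, max_merges=None):
    """Merge the closest clusters until a stopping rule applies.

    points       : list of tuples, one per observation, in the predictor space
    max_distance : stop when the closest pair of clusters is farther apart
    max_merges   : stop after this many combinations have been made
    Returns one cell label per observation.
    """
    clusters = [[i] for i in range(len(points))]
    merges = 0

    def linkage(a, b):
        # Complete linkage: largest distance between members of a and b.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        if max_merges is not None and merges >= max_merges:
            break
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if max_distance is not None and d > max_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        merges += 1

    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for idx in members:
            labels[idx] = lab
    return labels


# Hypothetical one-dimensional predictor values.
pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (9.0,)]
by_distance = agglomerate(pts, max_distance=0.5)
by_count = agglomerate(pts, max_merges=2)
print(by_distance, by_count)
```

Passing (x, y) pairs instead of x values alone would form cells in the (X,Y) space with the same routine, which is the comparison raised in the last sentence above.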
A new lack of fit testing procedure using near neighbors is proposed, along with a grouping procedure that is consistent with the test statistic used.