© 1972 by Biometrika Trust
On using an estimated regression line in a second sample
Medical Research Council's Social Medicine Unit
One use of a multiple regression equation is to predict values of the dependent variable. The equation is usually estimated by minimizing the residual (error) sum of squares in the sample. When the equation is applied to a second sample it will not usually satisfy this criterion there, and a measure is proposed of the excess of the sum of squares about this equation over that about the least squares equation fitted to the second sample. Exact distributional properties are derived when the regressor variables take the same values in each sample, and the first two moments are derived for the multivariate normal case.
Key Words: Multiple regression; Prediction; Distribution of quadratic forms and ratios; Inverse Wishart matrices
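
The comparison described in the summary can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a single regressor is used instead of a multiple regression, the data are made up, the function names (fit_line, rss, excess_sum_of_squares) are hypothetical, and the excess is reported both as a plain difference and relative to the second sample's own residual sum of squares, since the exact form of the proposed measure is not stated here.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rss(x, y, intercept, slope):
    """Sum of squared deviations of y about a given line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def excess_sum_of_squares(x1, y1, x2, y2):
    """Excess, in sample 2, of the sum of squares about the sample-1 line
    over the sum of squares about sample 2's own least squares line."""
    a1, b1 = fit_line(x1, y1)        # line estimated from the first sample
    a2, b2 = fit_line(x2, y2)        # least squares line in the second sample
    rss_applied = rss(x2, y2, a1, b1)
    rss_own = rss(x2, y2, a2, b2)
    return rss_applied - rss_own, rss_own


# Made-up data; the regressor takes the same values in both samples.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
y1 = [2.1, 3.9, 6.2, 7.8, 10.1]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
y2 = [1.8, 4.3, 5.9, 8.4, 9.7]

excess, rss_own = excess_sum_of_squares(x1, y1, x2, y2)
print("excess sum of squares:", excess)
print("relative to own residual sum of squares:", excess / rss_own)
```

Because the second sample's own least squares line minimizes the sum of squares in that sample, the excess cannot be negative, apart from floating point rounding, and it is zero when the two fitted lines coincide.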