© 1998 by Biometrika Trust
Bandwidth selection for the smoothing of distribution functions
Department of Statistics, University of Glasgow, Glasgow G12 8QQ, U.K. adrian{at}stats.gla.ac.uk
Centre for Mathematics and its Applications, Australian National University, Canberra ACT 0200, Australia. halpstat{at}pretty.anu.edu.au
School of Mathematics and Statistics, University of Canberra, Canberra, ACT 2601, Australia. prevan{at}ise.canberra.edu.au
Several approaches can be made to the choice of bandwidth in the kernel smoothing of distribution functions. Recent proposals by Sarda (1993) and by Altman & Leger (1995) are analogues of the leave-one-out and plug-in methods which have been widely used in density estimation. In contrast, a method of crossvalidation appropriate to the smoothing of distribution functions is proposed. Selection of the bandwidth parameter is based on unbiased estimation of a mean integrated squared error curve whose minimising value defines an optimal smoothing parameter. This procedure is shown to lead to asymptotically optimal bandwidth choice, not just in the usual first-order sense but also in the second-order sense in which kernel methods improve on the standard empirical distribution function. Some general theory on the performance of optimal, data-based methods of bandwidth choice is also provided, leading to results which do not have analogues in the context of density estimation. The numerical performances of all the methods discussed in the paper are compared. A bandwidth based on a simple reference distribution is also included. Simulations suggest that the crossvalidatory proposal works well, although the simple reference bandwidth is also quite effective.
Key Words: Crossvalidation; Empirical distribution function; Kernel; Smoothing
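To make the abstract's idea of crossvalidatory bandwidth choice for a smoothed distribution function concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian kernel, so that the smoothed estimate is an average of normal distribution functions, and it assumes a leave-one-out criterion of the form (1/n) Σ_i ∫ {I(x ≥ X_i) − F_{−i}(x)}² dx, where F_{−i} is the kernel estimate computed without the i-th observation. The integral is approximated by a Riemann sum on an evenly spaced grid, and the bandwidth is chosen from a finite candidate list. The function names, the grid padding, and the candidate values are all hypothetical choices made for illustration.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function (the integrated Gaussian kernel)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf(x, data, h):
    """Kernel estimate of F(x): average of Phi((x - X_j) / h) over the sample."""
    return sum(norm_cdf((x - xj) / h) for xj in data) / len(data)


def cv_score(data, h, grid):
    """Leave-one-out crossvalidation score for bandwidth h.

    Assumed form: (1/n) * sum_i integral {I(x >= X_i) - F_{-i}(x)}^2 dx,
    with the integral replaced by a Riemann sum over an evenly spaced grid.
    """
    n = len(data)
    dx = grid[1] - grid[0]
    total = 0.0
    for x in grid:
        # Compute every kernel contribution once, then remove one at a time.
        phis = [norm_cdf((x - xj) / h) for xj in data]
        phi_sum = sum(phis)
        for xi, phi_i in zip(data, phis):
            leave_one_out = (phi_sum - phi_i) / (n - 1)
            indicator = 1.0 if x >= xi else 0.0
            total += (indicator - leave_one_out) ** 2
    return total * dx / n


def select_bandwidth(data, candidates, grid_size=200):
    """Return the candidate bandwidth with the smallest score, plus all scores."""
    # Pad the grid by a few bandwidths so the integrand is near zero at both ends.
    pad = 4.0 * max(candidates)
    lo, hi = min(data) - pad, max(data) + pad
    grid = [lo + (hi - lo) * k / (grid_size - 1) for k in range(grid_size)]
    scores = [cv_score(data, h, grid) for h in candidates]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores


# Illustrative usage on simulated standard normal data (hypothetical settings).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(40)]
candidates = [0.05 * k for k in range(1, 21)]
h_cv, cv_scores = select_bandwidth(sample, candidates)
print("selected bandwidth:", h_cv)
print("smoothed estimate of F(0):", kernel_cdf(0.0, sample, h_cv))
```

A grid search over a fixed candidate list keeps the sketch short; a continuous optimiser could be substituted, and the grid resolution should be fine relative to the smallest bandwidth considered.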